A ksh bug on "wait"

Nov 9, 2009

WTF?

$ cat kshbug
{ return 0; } &
evil=$(/bin/true)   # XXX: works fine without this line
wait $!
echo $?

$ ksh kshbug
127

$ ksh --version
version         sh (AT&T Labs Research) 1993-12-28 r

The correct exit status is 0. Without the line “evil=$(/bin/true)”, everything works fine. The problem occurs only when all of the following hold:

  1. A function or a brace group ({ ...; }) is run in the background, and
  2. A subshell (such as a command substitution) runs between the background job and the “wait”, and
  3. That subshell executes an external command.
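To check which of the three conditions matter on a given ksh build, a small harness like the following can run the reproduction in three variants. It is a sketch, not part of the original report. It uses a function f instead of the bare { return 0; } block so the same script also runs under bash, which rejects return outside a function. The temp-file plumbing and variant labels are illustrative additions. Per the conditions above, only the last variant should print 127 on the affected ksh.

```shell
#!/bin/sh
# Reproduce the "wait" bug in three variants under the shell named in $1
# (default: ksh). Each variant prints the status reported by "wait $!";
# the correct value is 0 every time.
SHELL_UNDER_TEST=${1:-ksh}

run_variant() {
    # $1: label; $2: command line run between the background job and "wait"
    script=$(mktemp) || return 1
    cat > "$script" <<EOF
f() { return 0; }
f &
$2
wait \$!
echo \$?
EOF
    printf '%s: ' "$1"
    "$SHELL_UNDER_TEST" "$script"
    rm -f "$script"
}

if command -v "$SHELL_UNDER_TEST" >/dev/null 2>&1; then
    run_variant 'no subshell' ':'
    run_variant 'builtin-only subshell' 'evil=$(echo hi)'
    run_variant 'external command in subshell' 'evil=$(/bin/true)'
else
    echo "$SHELL_UNDER_TEST not found" >&2
fi
```

Run it as `sh harness.sh ksh`, or pass `bash` or `pdksh` to compare shells.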

I searched for a while and found no existing ksh bug report. A possible workaround is to have the background job print its result as text and check that output instead of the return code. There is a similar report for ksh on Solaris, but it is not the same issue.
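One way to apply that workaround is sketched below, under the assumption that the job's own exit status is the text worth reporting. The background job writes its status into a FIFO and the caller reads it from there, so nothing depends on the value returned by "wait $!". Reading from the FIFO also blocks until the job has finished, in case the buggy wait returns early. The names task and bg_status are hypothetical.

```shell
#!/bin/sh
# Workaround sketch: the background job reports its own exit status as text
# through a FIFO, so the caller never relies on the value of "wait $!".
# "task" is a hypothetical stand-in for the real background work.
task() {
    return 0
}

bg_status() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/status" || { rm -rf "$dir"; return 1; }
    { "$@"; echo "$?" > "$dir/status"; } &
    pid=$!

    evil=$(/bin/true)             # other work running a subshell, as in the bug report

    read -r rc < "$dir/status"    # blocks until the job has written its status
    wait "$pid"                   # reap the job; its return value is not used
    rm -rf "$dir"
    echo "$rc"
}

bg_status task
```

If the background job is killed before it writes its status, the read blocks forever. A real script would want a timeout or a trap around it.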

pdksh (public domain ksh) doesn’t have this problem. (See another bug.)

I couldn’t figure out where to report this bug, so I gave up.

Update: this issue doesn’t occur with Ubuntu’s ksh, version sh (AT&T Research) 93s+ 2008-01-31.

Connect to your corp VPN with vpnc

May 13, 2009

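Vpnc is an open-source VPN client that can connect to Cisco VPN gateways. Running vpnc on a VPS to connect to the office network lets you manage the VPS from inside an office network with strict port restrictions. My VPS is a 32-bit Debian system from vpsvillage. I recently spent some time getting this working, and these are my notes.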

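Step 1. First, make sure the system has the tun module. Unfortunately, all the kernel modules on my VPS were 64-bit builds, which is probably a problem with the OS installation script. After contacting support, I learned that they provide a 32-bit kernel module package along with a script that installs it: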

wget ftp://ftp.grokthis.net/pub/linux/modules/install_modules.sh
mv /lib/modules/`uname -r` /lib/modules/`uname -r`.orig
sh install_modules.sh
depmod -a
modprobe tun
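Since the problem in this step was 64-bit modules on a 32-bit system, a quick way to confirm such a mismatch is to read the ELF class byte of the module file. This is a minimal sketch. It assumes modinfo can locate the module and that the module is an uncompressed .ko. The helper names elf_class and kernel_bits are hypothetical, and kernel_bits only knows the x86 machine names.

```shell
#!/bin/sh
# Report whether a file is a 32-bit or 64-bit ELF object by checking the
# "\177ELF" magic and then EI_CLASS, byte 4 of the header (1 = 32, 2 = 64).
elf_class() {
    [ "$(dd if="$1" bs=4 count=1 2>/dev/null)" = "$(printf '\177ELF')" ] ||
        { echo unknown; return; }       # not ELF, unreadable, or compressed
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Word size of the running kernel; only x86 machine names are handled here.
kernel_bits() {
    case $(uname -m) in
        x86_64) echo 64 ;;
        i?86) echo 32 ;;
        *) echo unknown ;;
    esac
}

tun_ko=$(modinfo -n tun 2>/dev/null)
if [ -n "$tun_ko" ]; then
    echo "tun module: $(elf_class "$tun_ko")-bit, kernel: $(kernel_bits)-bit"
else
    echo "modinfo cannot find the tun module"
fi
```

If the two numbers differ, modprobe will refuse the module. That is the situation that the replacement package above fixes.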

Step 2. Install vpnc

apt-get install vpnc

Step 3. Export the VPN gateway configuration file

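The company machines all run Windows with the Cisco VPN Client installed, and the profiles can usually be found in C:\Program Files\<company>\VPN Client\profiles. vpnc ships a tool that converts a .pcf profile directly into a vpnc configuration file. It is installed by default as /usr/share/vpnc/pcf2vpnc, a Perl script that depends on the LWP::Simple module. My VPS didn't have that module, and perl -MCPAN -e 'install LWP::Simple' ran for a long time without finishing, so I gave up. In fact, you can simply write a configuration file by hand from the contents of the .pcf file, using /etc/vpnc/example.conf as a reference; all you need is …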
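Writing the file by hand can also be scripted without any Perl modules. The sketch below assumes the .pcf profile is an INI-style text file, usually with CRLF line endings. It assumes the profile contains keys named Host, GroupName, GroupPwd, enc_GroupPwd and Username, and it ignores a leading "!" on a key name, which the Cisco client uses to mark read-only fields. The function names are hypothetical, and the output covers only the fields shown in the example configuration in Step 5.

```shell
#!/bin/sh
# Sketch of a pcf -> vpnc.conf converter that needs no Perl modules.
pcf_value() {
    # Print the value of key $1 from pcf file $2 (first match, CR stripped).
    sed -n "s/^!\{0,1\}$1=//p" "$2" | tr -d '\r' | head -n 1
}

pcf_to_vpnc() {
    pcf=$1
    echo "IPSec gateway $(pcf_value Host "$pcf")"
    echo "IPSec ID $(pcf_value GroupName "$pcf")"
    secret=$(pcf_value GroupPwd "$pcf")
    if [ -n "$secret" ]; then
        echo "IPSec secret $secret"
    else
        # Only the obfuscated enc_GroupPwd is present: decode it (Step 4)
        # and replace the placeholder below by hand.
        echo "# enc_GroupPwd $(pcf_value enc_GroupPwd "$pcf")"
        echo "IPSec secret CHANGE_ME"
    fi
    echo "Xauth username $(pcf_value Username "$pcf")"
}
```

Usage: `pcf_to_vpnc corp.pcf > corp.conf`, then review the result and add the Xauth password and the Step 5 settings.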

Step 4. Decode enc_GroupPwd in the configuration file

The vpnc homepage provides a tool, cisco-decode, and you can decode the value right there.

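Step 5. By default, once vpnc has set up the VPN tunnel, it replaces the default gateway and /etc/resolv.conf. If you are doing all this over SSH from outside, your SSH connection will drop and you won't be able to reconnect. Two settings prevent this: Target networks and DNSUpdate. A complete example vpnc configuration file: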

IPSec gateway 12.34.56.78
IPSec ID groupName
IPSec secret groupPassword
Xauth username myUsername
Xauth password myPassword
Target networks 10.0.0.0/8 192.168.0.0/16
DNSUpdate no

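Step 6. vpnc needs root privileges. Making it setuid root (chmod u+s) does not help, because libgcrypt, a library it uses, gives up the root euid during initialization. After that, vpnc can no longer perform ioctl calls on the tun0 device.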

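Step 7. From inside the port-restricted office network, you can get around the port restrictions by putting a CGI program on the VPS that brings up the VPN tunnel. However, the web server normally runs as an unprivileged user (such as www-data or nobody), so calling vpnc requires a wrapper: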

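/* VPNC.c */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Fixed minimal environment: don't pass the caller's env to a root process. */
    char *envp[] = { "PATH=/usr/sbin:/usr/bin:/sbin:/bin", NULL };

    /* libgcrypt (gcry_control in vpnc.c) drops privileges when the real uid
     * differs from the effective uid, so make the real uid root as well. */
    if (setuid(0) != 0) {
        perror("setuid");
        return 1;
    }
    execve("/usr/sbin/vpnc", argv, envp);
    perror("execve");   /* execve only returns on failure */
    return 1;
}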

Then:

gcc -o VPNC VPNC.c
cp VPNC /usr/sbin/
chown root:www-data /usr/sbin/VPNC
chmod 4710 /usr/sbin/VPNC

Calling VPNC from the CGI program is what gives it root privileges. vpnc-disconnect works the same way. (VPNC.c, VPNC-disconnect.c)

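Step 8. How you write the CGI program is up to you. My CGI program is a bash script: URL/vpnc/q queries the IP, URL/vpnc/<key> establishes the connection, and URL/vpnc/d disconnects. The relevant lighttpd configuration (works for any script):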

$HTTP["url"] =~ "^/cgi-bin/.*" {
    cgi.assign = ( "" => "" )
}
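A minimal sketch of such a CGI dispatcher is shown below. It is not the author's script, and several details are assumptions. The action is assumed to arrive in PATH_INFO, and the key is compared with a VPNC_KEY variable. The post does not say which IP "q" reports, so the sketch prints REMOTE_ADDR as a placeholder. The wrappers from Step 7 are assumed to be installed as /usr/sbin/VPNC and /usr/sbin/VPNC-disconnect.

```shell
#!/bin/sh
# Hypothetical CGI dispatcher for URL/vpnc/q, URL/vpnc/d and URL/vpnc/<key>.
# Assumptions: action in PATH_INFO, key in VPNC_KEY, wrappers at these paths.
VPNC_KEY=${VPNC_KEY:-change-me}
VPNC_BIN=${VPNC_BIN:-/usr/sbin/VPNC}
VPNC_DISCONNECT_BIN=${VPNC_DISCONNECT_BIN:-/usr/sbin/VPNC-disconnect}

handle_request() {
    action=${PATH_INFO#/}                      # "/q" -> "q"
    printf 'Content-Type: text/plain\r\n\r\n'
    case $action in
        q) echo "${REMOTE_ADDR:-unknown}" ;;   # placeholder: the caller's IP
        d) "$VPNC_DISCONNECT_BIN" 2>&1 ;;      # tear down the tunnel
        "$VPNC_KEY") "$VPNC_BIN" 2>&1 ;;       # bring up the tunnel
        *) echo "unknown request" ;;
    esac
}

handle_request
```

Quoting "$VPNC_KEY" in the case pattern makes it match literally, so a key containing * or ? cannot act as a wildcard.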

Reference: vpnc-howto.xml

TETware Infinite Loop: The ksh93 Bug That Filled My Hard Drive

Mar 15, 2009

While using TETware as our automated testing framework, I’ve increasingly found it to be incredibly frustrating. The ksh API portion, in particular, feels severely outdated. Despite making numerous local modifications, it remained clunky. Today, however, I uncovered an infinite loop hiding within one of its core logging interfaces. After diving deep into the issue, it turned out to be a native bug in ksh93. If this interface hadn’t been written so poorly in the first place, this shell bug might have remained hidden forever.

Here is the breakdown of the bug.

In ksh, the ${parameter%pattern} syntax strips a suffix from a string, while ${parameter#pattern} strips a prefix. These are commonly used to extract directory names and filenames from paths. However, when the parameter is a multi-line string and the pattern contains both a newline and parentheses (for example, a suffix of the form \n.*(.*).*), ksh93 silently fails to strip anything:

$ cat ksh93bug
NL=$'\n'

PAT="$1"

A="Hello $NL$PAT"
echo "${A%$NL$PAT}"

A="$PAT$NL world"
echo "${A#$PAT$NL}"

$ ksh ksh93bug '()'
Hello
()
()
 world

$ ksh ksh93bug 'a(b)c'
Hello
a(b)c
a(b)c
 world

This bug is specific to AT&T’s ksh93, including the latest version available at the time of writing. Both bash and the public domain ksh (pdksh) handle it correctly. Here is bash:

$ bash ksh93bug '()'
Hello
 world

$ bash ksh93bug 'a(b)c'
Hello
 world
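A script that must run under ksh93 builds that may have this bug could probe for it at start-up and switch to a safer code path. This is a sketch under that assumption. The function name strip_ok is hypothetical, and the test strings mirror the 'a(b)c' case above.

```shell
#!/bin/sh
# Return 0 if this shell strips a "newline + parentheses" suffix and prefix
# correctly, 1 if it shows the ksh93 behaviour described above.
strip_ok() {
    nl='
'
    pat='a(b)c'
    a="Hello $nl$pat"
    [ "${a%$nl$pat}" = "Hello " ] || return 1
    a="$pat$nl world"
    [ "${a#$pat$nl}" = " world" ]
}

if strip_ok; then
    echo "suffix/prefix stripping OK"
else
    echo "this shell has the ksh93 %/# stripping bug" >&2
fi
```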

In our case, the code ran out=$(mount) followed by tet_infoline "$out", which immediately hung. (On Linux, each line of mount output ends with the mount options in parentheses, such as (rw,relatime), so this text contains both newlines and parentheses.) TETware’s tetapi.ksh relies on %% inside a loop in the tet_output function to process multi-line text. Because the bug meant the suffix was never actually removed, the loop never terminated. When I tried to debug the hang with set -x, the infinite loop produced trace output so fast that it filled my hard drive to 100% within seconds! :P
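A defensive version of such a line-splitting loop can refuse to continue when a strip makes no progress. A failed pattern match then ends in an error message instead of filling a disk. This is a sketch, not TETware's tet_output. The function name print_lines and its prefix argument are hypothetical. The only pattern it uses is a quoted newline followed by *, which keeps the data itself out of the pattern.

```shell
#!/bin/sh
# Print a multi-line string one line at a time, each line preceded by a prefix,
# stopping with an error if a strip ever fails to shorten the remaining text.
NL='
'
print_lines() {
    prefix=$1
    rest=$2
    while :; do
        line=${rest%%"$NL"*}          # text before the first newline
        printf '%s%s\n' "$prefix" "$line"
        case $rest in
            *"$NL"*) ;;               # more lines follow
            *) break ;;               # that was the last line
        esac
        next=${rest#*"$NL"}           # drop the first line and its newline
        if [ "${#next}" -ge "${#rest}" ]; then
            echo "print_lines: strip made no progress, giving up" >&2
            return 1
        fi
        rest=$next
    done
}
```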

I initially intended to report this upstream, but after struggling to find a proper bug tracker on the ksh93 homepage, I gave up. For now, I’ve just patched our instance of TETware directly.