Unix Expect script: parsing HTTP output

Date: 2012-11-14 08:01:29

Tags: expect

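I have an Expect script that tries to fetch the home page of the whatismyip website. I need to capture two things: the site's IP address and the HTTP return code: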

#!/usr/bin/expect -f
set timeout -1
spawn telnet www.whatismyip.com 80
expect "Connected to www.whatismyip.com*"
set output $expect_out(0,string)
regexp {Connected to www\.whatismyip\.com.*?(\d+\.\d+\.\d+\.\d+)} $output match ip
send -- "GET / HTTP/1.0\n"
send -- "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4\n"
send -- "Host: www.whatismyip.com\n"
send -- "\n"
send -- "\n"
set output $expect_out(buffer)
regexp {.*HTTP/1.1 200 OK.*} $output match ret
puts $ip
puts $ret
expect eof
exit 0

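There are two problems. First, the IP comes out with its last character cut off; second, I get an error saying that the variable ret does not exist: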

spawn telnet www.whatismyip.com 80
Trying 108.162.200.37...
Connected to www.whatismyip.com (108.162.200.37).
Escape character is '^]'.
108.162.200.3
can't read "ret": no such variable
    while executing
"puts $ret"
    (file "./t2" line 15)

I have tried every approach I could think of, but I cannot fix either problem. Please let me know how to correct this.

1 answer:

Answer 0 (score: 0)

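First problem: you cannot control how much input the trailing * in your glob pattern matches, and therefore what ends up in $expect_out(0,string). Imagine the characters arriving slowly: "Connected to www.whatismyip.com*" already matches "Connected to www.whatismyip.com (108.16". Instead, use a regular expression with an explicit terminal condition: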

set myexpr {Connected to www\.whatismyip\.com.*?(\d+\.\d+\.\d+\.\d+)[^0-9]}; # Note the terminal condition!
expect {
    -re $myexpr {
        # Now $expect_out(0,string) contains the right data to dig into
        # ($expect_out(1,string) already holds the captured IP as well).
        regexp $myexpr $expect_out(0,string) match ip
    }
}
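The effect of the terminal condition can be checked without a network connection. The following plain-Tcl sketch runs the question's pattern and the pattern above against a complete line and a partially received one; the sample strings are made up from the log in the question, and extract_ip is a hypothetical helper. It also shows a second, related cause of the truncation: in Tcl's regular expressions a branch takes the greediness of its first quantified atom, so the leading .*? makes the whole match as short as possible and the last \d+ stops after a single digit even when the whole line is available. The trailing [^0-9] forces the last octet to run up to a non-digit, which covers both cases.

```tcl
# The question's pattern, and the same pattern with a terminal condition.
set loose  {Connected to www\.whatismyip\.com.*?(\d+\.\d+\.\d+\.\d+)}
set strict {Connected to www\.whatismyip\.com.*?(\d+\.\d+\.\d+\.\d+)[^0-9]}

# Hypothetical helper: return the captured IP, or an empty string
# when the pattern does not match (yet).
proc extract_ip {pattern text} {
    if {[regexp $pattern $text -> ip]} {
        return $ip
    }
    return ""
}

# Sample buffers: one fully received line, one cut off mid-address.
set complete "Connected to www.whatismyip.com (108.162.200.37)."
set partial  "Connected to www.whatismyip.com (108.162.200.3"

foreach {label text} [list complete $complete partial $partial] {
    puts "$label: loose=[extract_ip $loose $text] strict=[extract_ip $strict $text]"
}
```

With these inputs the question's pattern returns 108.162.200.3 for both strings (the same value the question printed), while the stricter pattern returns 108.162.200.37 for the complete line and does not match the partial one, so expect simply keeps waiting for more input.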

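Second problem: note that the expression in regexp {.*HTTP/1.1 200 OK.*} $output match ret contains no parentheses, so even when $output does contain that string, ret is only ever set to an empty string, never to the return code. The "no such variable" error, however, means that regexp did not match at all, which suggests that $output does not contain the response. Why?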

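The cause is the same as in the first problem. The characters arrive slowly: when you execute set output $expect_out(buffer), they have not been received yet (the script itself is usually much faster than the data travelling over the network, and you read the buffer immediately after sending the request, without waiting for the response). Moreover, expect_out(buffer) is only updated when an expect command matches, so at that point it still holds whatever the previous expect left there. Again, use expect: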

expect {
    "HTTP/1.1 200 OK" {
        # do some stuff here ...
    }
}
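To actually capture the return code, the pattern needs a capture group. The sketch below uses plain Tcl on a made-up response string (an assumption, not real server output) to show three cases: the question's pattern without parentheses, a pattern with a group around the three-digit status, and an empty buffer. In the real script the same grouped pattern can be passed to expect -re, and the status is then available as $expect_out(1,string).

```tcl
# Sample data: what the buffer might hold once the response has arrived.
set response "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"

# The question's pattern has no parentheses, so on a match "ret" is set to "".
unset -nocomplain ret
set matched [regexp {.*HTTP/1.1 200 OK.*} $response match ret]
puts "no group: matched=$matched ret='$ret'"

# With a capture group, the status code itself lands in "ret".
unset -nocomplain ret
regexp {HTTP/1\.[01] (\d{3})} $response -> ret
puts "with group: ret=$ret"

# If the response has not arrived yet, regexp returns 0 and leaves "ret" unset,
# which is exactly the "no such variable" error from the question.
unset -nocomplain ret
set matched [regexp {HTTP/1\.[01] (\d{3})} "" -> ret]
puts "empty buffer: matched=$matched ret exists=[info exists ret]"
```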