Web scraping into AppleScript variables

Date: 2015-11-25 05:50:37

Tags: java curl applescript

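I live within 2 minutes of a bridge over a canal that I cross every day. There is a website that shows the ship schedules, but I am having a hard time getting those times into variables. I plan to use the variables in my Indigo home automation system with the iFindStuff plugin to estimate whether I will have to wait at a bridge, so that I know when to take another route.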

My question is how to define AppleScript variables for the N and S times of each bridge listed on this site: http://www.greatlakes-seaway.com/R2/jsp/mNiaBrdgStatus_mb.jsp?language=E

I know there are many different ways to do this, but I would prefer a method that runs in the background. This is what I have tried:

 do shell script "curl 'http://www.greatlakes-seaway.com/R2/jsp/mNiaBrdgStatus_mb.jsp?language=E' | sed -n '/0-9/,/NewPP/p' | sed -n '/^<tr/ s/^.*title=.\\([^\"]*\\).*$/\\1/p' | perl -n -mHTML::Entities -e ' ; print HTML::Entities::decode_entities($_);'" 

I can't get any results from it, and I don't know how to turn the output into variables. Thanks in advance for your help.

1 Answer:

Answer 0: (score: 1)

This was great fun to solve :-) Try this and read the comments in the code:

on run {}
    set resultSet to bridgeStatus()
end run

on bridgeStatus()
    -- Empty return list
    set bridgeStatusList to {}

    -- Getting the page content
    -- The web site had problems answering plain curl requests! Pretending to be Safari works :-)
    set webContent to paragraphs of (do shell script "curl -A 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/xxx.x (KHTML like Gecko) Safari/12x.x' 'http://www.greatlakes-seaway.com/R2/jsp/mNiaBrdgStatus_mb.jsp?language=E'")

    -- Parsing the page content
    -- Stop two lines before the end, because each match also reads items (i + 1) and (i + 2)
    repeat with i from 2 to (count webContent) - 2
        -- Only work with lines that contain the N (Northbound) or S (Southbound) info
        -- Also only collect the info if the line before contains the needed bridge info
        if (item i of webContent contains "&nbsp;&nbsp;S</td>" or item i of webContent contains "&nbsp;&nbsp;N</td>") and item (i - 1) of webContent contains "bridge" then
            -- work with the four lines of info we want and strip HTML from it
            -- and collect all info in a dictionary {bridge:xxx, nextArrival:xxx, bridgeStatus:xxx, subsequentArrival:xxx}
            set foundStatus to {bridge:stripHTML(item (i - 1) of webContent), nextArrival:stripHTML(item i of webContent), bridgeStatus:stripHTML(item (i + 1) of webContent), subsequentArrival:stripHTML(item (i + 2) of webContent)}
            -- fill the return list with the found info
            copy foundStatus to end of bridgeStatusList
        end if
    end repeat
    return bridgeStatusList
end bridgeStatus

on stripHTML(anyText)
    -- easy way to trash HTML-Code, thanks to http://stackoverflow.com/a/33771977/4081207
    return (do shell script "echo " & quoted form of ("<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & anyText) & " | sed 's#<br />##' | sed 's#&nbsp;&nbsp;# #' | textutil  -convert txt  -stdin -stdout | xargs")
end stripHTML

I ran this script a few minutes ago and it returned this list:

{
 {bridge:"Lakeshore Rd St. Catharines (Bridge 1)", 
  nextArrival:"06:55 N", 
  bridgeStatus:"Available", 
  subsequentArrival:"07:15 S"}, 
 {bridge:"Carlton St. St. Catharines (Bridge 3A)", 
  nextArrival:"05:45 N", 
  bridgeStatus:"Unavailable (Fully Raised)", 
  subsequentArrival:"05:50 S"}, 
 {bridge:"Queenston St. St. Catharines (Bridge 4)", 
  nextArrival:"05:27 N", 
  bridgeStatus:"Available", 
  subsequentArrival:"06:20 S"}, 
 {bridge:"Glendale Ave. St. Catharines (Bridge 5)", 
  nextArrival:"06:25 S", 
  bridgeStatus:"Available", 
  subsequentArrival:"07:21 S"}, 
 {bridge:"Highway 20 Thorold (Bridge 11)", 
  nextArrival:"10:17 S", 
  bridgeStatus:"Available", 
  subsequentArrival:"10:59 N"}, 
 {bridge:"Main St. Port Colborne (Bridge 19)", 
  nextArrival:"08:35 N", 
  bridgeStatus:"Unavailable (--Work in Progress--)", 
  subsequentArrival:"09:25 N"}, 
 {bridge:"Mellanby Ave. Port Colborne (Bridge 19A)", 
  nextArrival:"06:17 S", 
  bridgeStatus:"Available", 
  subsequentArrival:"08:00 N"}, 
 {bridge:"Clarence St. Port Colborne (Bridge 21)", 
  nextArrival:"06:40 S", 
  bridgeStatus:"Available", 
  subsequentArrival:"07:53 N"}
}
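
To get from that list to individual variables for one bridge, something along these lines might work. It is only a sketch: `findBridge` and `splitArrival` are hypothetical helper names I made up for illustration, the two sample records are copied from the list above so the snippet runs without a network connection, and it assumes every arrival string looks like "06:55 N" (a time, one space, a direction). In real use you would replace the sample data with `set resultSet to bridgeStatus()`.

```applescript
-- Hypothetical helper: return the first record whose bridge name contains namePart
on findBridge(statusList, namePart)
	repeat with aStatus in statusList
		if (bridge of aStatus) contains namePart then return contents of aStatus
	end repeat
	return missing value
end findBridge

-- Hypothetical helper: split "06:55 N" into {"06:55", "N"}
-- Assumes exactly one space between the time and the direction
on splitArrival(arrivalText)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to space
	set parts to text items of arrivalText
	set AppleScript's text item delimiters to savedDelimiters
	return parts
end splitArrival

-- Sample data copied from the result list above
set resultSet to {}
set end of resultSet to {bridge:"Lakeshore Rd St. Catharines (Bridge 1)", nextArrival:"06:55 N", bridgeStatus:"Available", subsequentArrival:"07:15 S"}
set end of resultSet to {bridge:"Carlton St. St. Catharines (Bridge 3A)", nextArrival:"05:45 N", bridgeStatus:"Unavailable (Fully Raised)", subsequentArrival:"05:50 S"}

-- Pick one bridge and break its record into plain variables
set myBridge to findBridge(resultSet, "(Bridge 3A)")
if myBridge is not missing value then
	set bridgeIsAvailable to (bridgeStatus of myBridge) starts with "Available"
	set {nextTime, nextDirection} to splitArrival(nextArrival of myBridge)
	set {laterTime, laterDirection} to splitArrival(subsequentArrival of myBridge)
end if
```

After this runs, `nextTime`, `nextDirection`, `laterTime`, `laterDirection` and `bridgeIsAvailable` are ordinary AppleScript variables. If the site ever shows a bridge with no scheduled arrival, the two-item assignment would fail, so you may want to wrap it in a `try` block.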

I hope it helps... I hate being stuck in traffic ;-) Michael / Hamburg