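Using curl and grep / sed to extract values

Time: 2014-06-10 16:15:03

Tags: bash curl sed grep

I wrote a script that generates an array of URLs. I want to open these URLs and extract the lowest price from each. I tried: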

curl http://www.orbitz.com/shop/home?type=air&ar.rt.numAdult=1&ar.rt.numChild=0&_ar.rt.narrowSel=0&search=Search+Flights&ar.rt.child[2]=&ar.rt.leaveSlice.orig.key=las&strm=true&ar.rt.child[6]=&ar.rt.numSenior=0&ar.rt.narrow=airlines&ar.rt.carriers[2]=&ar.rt.cabin=C&_ar.rt.nonStop=0&ar.rt.child[3]=&ar.rt.child[7]=&_ar.rt.leaveSlice.originRadius=0&ar.rt.carriers[1]=&ar.rt.returnSlice.time=Anytime&ar.rt.child[4]=&ar.rt.child[0]=&_ar.rt.leaveSlice.destinationRadius=0&ar.rt.leaveSlice.time=Anytime&ar.rt.carriers[0]=&ar.rt.returnSlice.date=09%2F24%2F14&ar.rt.leaveSlice.date=09%2F23%2F14&ar.rt.leaveSlice.dest.key=lax&_ar.rt.flexAirSearch=0&ar.type=roundTrip&ar.rt.child[5]=&ar.rt.child[1]=|grep \"div class='basePrice '\"

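But I always get the entire page content back. I also tried various sed combinations, and none of them worked either. How can I get the lowest price, or at least a list of all the prices?

2 answers: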

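Answer 0: (score: 0)

First, you need to quote the URL properly. Unquoted, the shell treats every & as a command terminator that runs the preceding command in the background, so curl receives only the part of the URL before the first & and its output never goes through grep: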

curl 'http://www.orbitz.com/shop/home?type=air&ar.rt.numAdult=1&ar.rt.numChild=0&_ar.rt.narrowSel=0&search=Search+Flights&ar.rt.child[2]=&ar.rt.leaveSlice.orig.key=las&strm=true&ar.rt.child[6]=&ar.rt.numSenior=0&ar.rt.narrow=airlines&ar.rt.carriers[2]=&ar.rt.cabin=C&_ar.rt.nonStop=0&ar.rt.child[3]=&ar.rt.child[7]=&_ar.rt.leaveSlice.originRadius=0&ar.rt.carriers[1]=&ar.rt.returnSlice.time=Anytime&ar.rt.child[4]=&ar.rt.child[0]=&_ar.rt.leaveSlice.destinationRadius=0&ar.rt.leaveSlice.time=Anytime&ar.rt.carriers[0]=&ar.rt.returnSlice.date=09%2F24%2F14&ar.rt.leaveSlice.date=09%2F23%2F14&ar.rt.leaveSlice.dest.key=lax&_ar.rt.flexAirSearch=0&ar.type=roundTrip&ar.rt.child[5]=&ar.rt.child[1]=' | \
    grep "div class='basePrice '"
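Since the question mentions an array of URLs, the same quoting rule applies inside a loop. The following is only a sketch: fetch and grep_each_url are hypothetical helper names, not anything from the original script, and fetch is wrapped in a function purely so that it can be swapped out when trying the loop offline.

```shell
# fetch: hypothetical wrapper around curl; -s hides the progress meter.
fetch() {
    curl -s "$1"
}

# grep_each_url PATTERN URL...
# Downloads each URL and prints only the lines that match PATTERN.
grep_each_url() {
    local pattern url
    pattern=$1
    shift
    for url in "$@"; do
        # The double quotes keep & and ? in the URL away from the shell.
        fetch "$url" | grep -- "$pattern"
    done
}
```

With a bash array called urls (an assumed name), this could be invoked as grep_each_url 'div class="basePrice' "${urls[@]}". The quotes around "${urls[@]}" matter for the same reason as the quotes around the single URL above.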

Your grep command should perhaps really be:

grep 'div class="basePrice'
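Once grep matches the right lines, the numbers still have to be pulled out and compared. Below is a rough sketch of one way to do that. It assumes each price sits on the same line as its tag, in a form such as <div class="basePrice ">$310.20</div>; the real page markup may well differ, in which case the pattern needs adjusting. The names list_prices and lowest_price are made up for illustration.

```shell
# list_prices: reads HTML on stdin, prints one bare number per price.
# Assumption: markup looks like <div class="basePrice ...">$1,234.56</div>
# with the price on the same line as the opening tag.
list_prices() {
    grep -o "class=[\"']basePrice[^>]*>[^<]*" |
        sed -e 's/.*>//' -e 's/[^0-9.]//g' |
        grep -v '^$'
}

# lowest_price: reads HTML on stdin, prints the smallest price found.
lowest_price() {
    list_prices | sort -n | head -n 1
}
```

A possible use would be curl -s 'URL' | lowest_price, with the URL in single quotes as shown earlier. The sed step strips the currency sign and thousands separators so that sort -n compares plain numbers.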

Answer 1: (score: 0)

You should probably use an HTML parser rather than sed and grep.

http://blog.codinghorror.com/parsing-html-the-cthulhu-way/