Question

On Windows 7, using a batch file and xidel, I'm testing against a paginated website like this example:

LINK1

LINK2

LINK3

1 2 3 4 5 6 7 8 9 10 Next

I found a way to get the first 10 links:

xidel.exe https://www.website.es/search?q=xidel+follow+pagination^&start=0 --extract "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']"

But when I try to follow page 2 or page (n) using

-f "<A class="fl">{.}</A>"

or

--follow "//a/[@class='nav']"

it doesn't work. Could you give me some help or some examples?

Thanks.

Answer 1

xidel-0.9.5.4998.exe -s ^
                     "https://encrypted.google.com/search?q=xidel+follow+pagination&start=0" ^
                     -e "//a/extract(@href,'url\?q=(.+?)&',1)[.]" ^
                     -f "(//td/a/@href)[last()]" ^
                     -e "//a/extract(@href,'url\?q=(.+?)&',1)[.]"

or

xidel-0.9.5.4998.exe -s --user-agent "Xidel" ^
                     "https://encrypted.google.com/search?q=xidel+follow+pagination&start=0" ^
                     -e "//h3[@class='r']/a/extract(@href,'=(.+?)&',1)" ^
                     -f "//td[@class='b']/a/@href" ^
                     -e "//h3[@class='r']/a/extract(@href,'=(.+?)&',1)"

will do the trick.
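To make the two pieces of these commands easier to see, here is a rough Python sketch of the same idea: pull capture group 1 of `url\?q=(.+?)&` out of every link's href, drop the links that don't match (what the trailing `[.]` does), then follow the last pagination link. Everything in it is an assumption for illustration: the in-memory `PAGES`, the `example.com` URLs, and the helper names are hypothetical, and "any link inside a td" only approximates `//td/a/@href`. Unlike the commands above, which follow one page, the sketch loops until no pagination link is left.

```python
import re
from html.parser import HTMLParser

# Hypothetical two-page site kept in memory, so no network access is needed.
PAGES = {
    "/search?start=0": (
        '<a href="/url?q=http://example.com/a&amp;sa=U">LINK1</a>'
        '<a href="/url?q=http://example.com/b&amp;sa=U">LINK2</a>'
        '<table><tr><td><a href="/search?start=10">Next</a></td></tr></table>'
    ),
    "/search?start=10": (
        '<a href="/url?q=http://example.com/c&amp;sa=U">LINK3</a>'
    ),
}


class LinkCollector(HTMLParser):
    """Collects every <a href> and, separately, those found inside a <td>."""

    def __init__(self):
        super().__init__()
        self.hrefs = []      # roughly //a/@href
        self.td_hrefs = []   # approximation of //td/a/@href
        self._td_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._td_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
                if self._td_depth:
                    self.td_hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "td" and self._td_depth:
            self._td_depth -= 1


def parse_page(html):
    """Return (result URLs, pagination hrefs) for one page."""
    parser = LinkCollector()
    parser.feed(html)
    matches = (re.search(r"url\?q=(.+?)&", href) for href in parser.hrefs)
    # Keep only links where the pattern matched, like the [.] filter.
    return [m.group(1) for m in matches if m], parser.td_hrefs


def crawl(start, max_pages=5):
    """Extract result URLs page by page, following the last pagination link."""
    urls, page = [], start
    for _ in range(max_pages):  # hard cap so a "Previous" link cannot loop forever
        html = PAGES.get(page)
        if html is None:
            break
        found, nav = parse_page(html)
        urls.extend(found)
        if not nav:
            break
        page = nav[-1]  # same idea as (//td/a/@href)[last()]
    return urls


print(crawl("/search?start=0"))
# ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
```

The `max_pages` cap is there because on a real site the last link on the final page may point backwards, so "always follow the last link" needs some stopping rule.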

Answer 2

Reino is right. But querying Google can also be done like this:

xidel -s "https://www.google.com" ^
      -f "form(//form,{'q':'xidel follow pagination','num':'25'})" ^
      -e "//a/extract(@href,'url\?q=(.+?)&',1)[.]"
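As a rough picture of what the `form(...)` step amounts to here: it takes the search form on the page and fills in the given fields, which for a simple GET form ends up as a URL with an encoded query string. The Python sketch below only shows that last part. The `/search` action and the helper name `build_search_url` are assumptions for illustration, and any hidden fields a real form carries are ignored.

```python
from urllib.parse import urlencode, urljoin


def build_search_url(base, action, fields):
    """Join the form action onto the base URL and append the encoded fields."""
    return urljoin(base, action) + "?" + urlencode(fields)


# Field values taken from the command above; "/search" is an assumed form action.
url = build_search_url(
    "https://www.google.com",
    "/search",
    {"q": "xidel follow pagination", "num": "25"},
)
print(url)
# https://www.google.com/search?q=xidel+follow+pagination&num=25
```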

How can xidel follow paginated HTML and extract URLs?
