I am trying to extract the href links under a tag.
Please see the attachment. I want to save all links under the label "PDF".
http://tinypic.com/r/2n9erdj/8
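Sorry, I am not yet allowed to upload images.
Specifically, the href detail appears as arnumber = 60940cc, as shown by the red circle.
Can anyone suggest how to achieve this? I plan to use a userscript or a bash command.
The HTML element for a single PDF looks like this: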
<a aria-label="Download or View the PDF: IEEE Transactions on Power Electronics publication information" href="/stampPDF/getPDF.jsp?tp=&arnumber=6094072"><img class="button" src="http://staticieeexplore.ieee.org/assets/img/iconPdf.png" alt="PDF file icon" title="Download or View the PDF">PDF</a>
The web page I am testing against is
http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6088512
The goal is to filter out the entries labelled "PDF" together with their URLs.
Answer 0 (score: 0)
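Try this. If you don't want the "http://ieeexplorer.ieee.org" prefix in front of the stamp path, adjust the sed part. Note that the page itself is served from ieeexplore.ieee.org (no "r" after "explore"), so change the prefix in the sed expression to that host if the resulting links need to resolve.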
wget 'http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6088512' -O file.html
grep -o "href.*stamp.*\"><" file.html |sed 's#"#"http://ieeexplorer.ieee.org#;s#><##'
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6094070"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6094072"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6094110"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6088513"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5680978"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5985544"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5723758"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5716681"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5936741"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5934597"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5734858"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5756244"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5759746"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958614"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5999721"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6021380"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5961632"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5951783"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5983448"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5934423"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5957306"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5898425"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5959991"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5776690"
href="http://ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5953525"
Or, to drop the href="..." wrapper:
$ grep -o "href.*stamp.*\"><" file.html |sed 's#href="#ieeexplorer.ieee.org#;s#"><##'
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6094070
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6094072
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6094110
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6088513
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5680978
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5985544
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5723758
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5716681
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5936741
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5934597
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5734858
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5756244
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5759746
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5958614
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5999721
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=6021380
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5961632
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5951783
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5983448
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5934423
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5957306
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5898425
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5959991
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5776690
ieeexplorer.ieee.org/stamp/stamp.jsp?tp=&arnumber=5953525
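
The greedy `.*` in the grep pattern above can swallow several links when more than one anchor sits on the same line of HTML. A minimal sketch of a stricter variant follows. It assumes every PDF link has an href beginning with `/stamp` (which covers both `/stamp/stamp.jsp` and the `/stampPDF/getPDF.jsp` form shown in the question) and that the host from the wget URL is the right prefix. The names `BASE_URL` and `extract_pdf_links` are made up for illustration, and the sketch does not decode HTML entities such as `&amp;`.

```shell
#!/usr/bin/env bash
# Sketch: print one absolute URL per PDF link found in HTML read from stdin.
# BASE_URL and extract_pdf_links are illustrative names, not from the answer above.
BASE_URL="http://ieeexplore.ieee.org"   # assumption: host taken from the wget URL

extract_pdf_links() {
    # 1. grep -o prints each href="..." attribute on its own line; [^"]* stops
    #    at the closing quote, so neighbouring links are never merged.
    # 2. The second grep keeps only hrefs whose path starts with /stamp.
    # 3. sed strips the href="..." wrapper and prepends the base URL.
    grep -o 'href="[^"]*"' \
        | grep '^href="/stamp' \
        | sed -e 's#^href="##' -e 's#"$##' -e "s#^#${BASE_URL}#"
}

# Possible usage (needs network access, so it is left commented out):
#   wget -q -O file.html 'http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6088512'
#   extract_pdf_links < file.html > pdf_links.txt
```

Redirecting the output to a file, as in the commented usage, gives one URL per line, which is a convenient shape for saving the links or feeding them to another tool.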