Given the HTML snippet below, how can I get the desired output shown after it in Python?
<td width="10" class="data1"><a class="datalink" href="m01_detail.asp?key=002396653&itemNumber=0">></a></td>
<td class="data1"><a class="datalink" href="m01_detail.asp?key=002396653&itemNumber=0">002396653</a></td>
<td class="data1">IMPORT EXPRESS RECYCLE</td>
<td class="data1">961879066</td>
<td class="data1">11/23/2016</td>
<td class="data1"></td> <!--SARA-->
<td class="data1" align="center">CN</td>
<td class="data1" align="center">PVG</td>
Desired output:
961879066 | CN
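What I have so far:
def reading():
    # Print each line of the file with tabs and surrounding whitespace removed.
    with open("C:\\Users\\John\\Desktop\\test.txt") as f:
        for line in f:
            print(line.replace("\t", "").strip())
    # No f.close() needed: the with block closes the file automatically.

reading()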
Thanks,
Answer 0 (score: 0)
You can try the following code to get the desired output:
import lxml.html
html = lxml.html.fromstring("""<td width="10" class="data1"><a class="datalink" href="m01_detail.asp?key=002396653&itemNumber=0">></a></td>
<td class="data1"><a class="datalink" href="m01_detail.asp?key=002396653&itemNumber=0">002396653</a></td>
<td class="data1">IMPORT EXPRESS RECYCLE</td>
<td class="data1">961879066</td>
<td class="data1">11/23/2016</td>
<td class="data1"></td> <!--SARA-->
<td class="data1" align="center">CN</td>
<td class="data1" align="center">PVG</td>""")
output = html.xpath('concat(//td[4], "|", //td[7])')
print(output) # '961879066|CN'
Pass your original HTML code to lxml.html.fromstring(), as is done for the html variable above.
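If lxml is not available, roughly the same result can be sketched with the standard library's html.parser module. This is an alternative technique, not the XPath approach above. The names CellCollector and extract_fields are hypothetical helpers made up for illustration, and the sketch assumes the wanted values are always the 4th and 7th <td> cells, as in the sample:

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one string per <td> encountered
        self._in_td = False   # True while between <td> and </td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still appended to the current cell.
        if self._in_td:
            self.cells[-1] += data


def extract_fields(html_text, indexes=(3, 6), sep=" | "):
    """Join the stripped text of the chosen cells (0-based indexes) with sep."""
    parser = CellCollector()
    parser.feed(html_text)
    parser.close()
    return sep.join(parser.cells[i].strip() for i in indexes)


# Same snippet as in the question.
sample = """<td width="10" class="data1"><a class="datalink" href="m01_detail.asp?key=002396653&itemNumber=0">></a></td>
<td class="data1"><a class="datalink" href="m01_detail.asp?key=002396653&itemNumber=0">002396653</a></td>
<td class="data1">IMPORT EXPRESS RECYCLE</td>
<td class="data1">961879066</td>
<td class="data1">11/23/2016</td>
<td class="data1"></td> <!--SARA-->
<td class="data1" align="center">CN</td>
<td class="data1" align="center">PVG</td>"""

print(extract_fields(sample))  # 961879066 | CN
```

To run it against the file from the question, read the whole file into a string (for example with f.read() inside the with block) and pass that string to extract_fields instead of sample.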