I am unable to scrape data from the following checkbox and an address field.
<table width=900 cellspacing=0 border=0 cellpadding=5 style='border-top:1px solid silver;border-left:1px solid silver;border-right:1px solid silver;'>
<tr id='row618534' >
<td style='border-bottom:1px solid silver;background:#ffffff;' padding-bottom :10px;>
<div id='r618534'>
<div style='color:red; font-weight:bold; '>
Warning... Duplicate Found!
</div>
<table width=100% border=0 cellpadding=2 cellspacing=0 style='margin-top:15px;border:4px #70797a; border-radius: 5px;'>
<tr>
<td style='background:lightgreen; width:55px;' valign=top>
<img src='../images/checkwhite.png' style='width:30px;'>
</td>
<td style='background:lightgreen;'>
<input checked type=checkbox name=jobs[] value='618534'>
<strong>2 Colonial Dr Newport Beach CA 92660</strong>
<td style='background:lightgreen;' align=right><input type='hidden' id='miles618534'><span style='margin-left:0px;' onclick="sub618534()" class='button_input'> Process this order</span></span></td>
<tr>
<td>Your Input</td>
<td style='padding-left:28px;'>2 COLONIAL DR NEWPORT BEACH CA 92660</td>
<td align=right><a href='customer_multi_jobs_review.php?del=1&djob=NjE4NTM0' style='color:blue;'><b><img title='Remove / Delete Order' src='../images/deletorder.png' style='width:30px;'></b></a></td>
</tr>
</table>
<div style=' margin-left:40px;'>
Exterior BPO - Light Photo Set (3 photos*) <br>$9.00 We found a rep 4.6 miles from order. <span style='color:silver'> Resolution 640x480 GPS REQUIRED: Yes <span style='margin-left:10px;'>Datestamped </span> </span><br clear=all>
<div style=float:left;'>
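I need the ID from <input checked type=checkbox name=jobs[] value='618534'>
and the address that follows the text "Your Input".
I tried several approaches, but I only get the ID and cannot capture the address details. Please find my code below.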
for input_node in response.xpath('//input[@name="jobs[]"]'):
    id = input_node.xpath('./@value').extract_first()
    address = input_node.xpath('./following-sibling::table[1]//td[.="Your Input"]/following-sibling::td[1]/text()').extract_first()
Answer 0 (score: 0)
Try the following approach. It should fetch the fields you need.
from scrapy import Selector
htmldoc = """
<table width=900 cellspacing=0 border=0 cellpadding=5 style='border-top:1px solid silver;border-left:1px solid silver;border-right:1px solid silver;'><tr id='row618534' ><td style='border-bottom:1px solid silver;background:#ffffff;' padding-bottom :10px;><div id='r618534'><div style='color:red; font-weight:bold; '>Warning... Duplicate Found!</div> <table width=100% border=0 cellpadding=2 cellspacing=0 style='margin-top:15px;border:4px #70797a; border-radius: 5px;'><tr><td style='background:lightgreen; width:55px;' valign=top><img src='../images/checkwhite.png' style='width:30px;'></td><td style='background:lightgreen;'><input checked type=checkbox name=jobs[] value='618534'> <strong>2 Colonial Dr Newport Beach CA 92660</strong> <td style='background:lightgreen;' align=right><input type='hidden' id='miles618534'><span style='margin-left:0px;' onclick="sub618534()" class='button_input'> Process this order</span></span></td><tr><td>Your Input</td><td style='padding-left:28px;'>2 COLONIAL DR NEWPORT BEACH CA 92660</td><td align=right><a href='customer_multi_jobs_review.php?del=1&djob=NjE4NTM0' style='color:blue;'><b><img title='Remove / Delete Order' src='../images/deletorder.png' style='width:30px;'></b></a></td></tr></table><div style=' margin-left:40px;'> Exterior BPO - Light Photo Set (3 photos*) <br>$9.00 We found a rep 4.6 miles from order. <span style='color:silver'> Resolution 640x480 GPS REQUIRED: Yes <span style='margin-left:10px;'>Datestamped </span> </span><br clear=all><div style=float:left;'>
"""
sel = Selector(text=htmldoc)
for input_node in sel.xpath('//tr//input[@name="jobs[]"]'):
    id_num = input_node.xpath('./@value').extract_first()
    address = input_node.xpath('.//following::td[contains(text(),"Your Input")]//following-sibling::td//text()').extract_first().strip()
    print(f'{id_num}\n{address}')
Output it produces:
618534
2 COLONIAL DR NEWPORT BEACH CA 92660