This is my HTML file:
<tr>
<td>1</td>
<td style="font-weight: bold;"><a href="#" onclick="javascript:TollPlazaPopup(272);"> Kherki Daula </a></td>
<td style="font-weight: bold;">60 <a onclick="return popitup(" https:="" www.google.co.in="" maps="" @28.395604,76.98176,17.52z="" data="!5m1!1e1?hl=en')'" href="https://www.google.co.in/maps/@28.395604,76.98176,17.52z/data=!3m1!1e3!5m1!1e1?hl=en" target="_Blank"> (Live Traffic)</a> - <a href="#" title="Click here to get estimated travel time." id="0-232X" onclick="javascript:TollPlazaTrafficTime(272,this);">ET</a>
</td>
</tr>
<tr>
<td>2</td>
<td style="font-weight: bold;"><a href="#" onclick="javascript:TollPlazaPopup(213);"> Shahjahanpur </a></td>
<td style="font-weight: bold;">125 <a onclick="return popitup(" https:="" www.google.co.in="" maps="" @27.99978,76.430522,17.52z="" data="!5m1!1e1?hl=en')'" href="https://www.google.co.in/maps/@27.99978,76.430522,17.52z/data=!3m1!1e3!5m1!1e1?hl=en" target="_Blank"> (Live Traffic)</a> - <a href="#" title="Click here to get estimated travel time." id="1-179X" onclick="javascript:TollPlazaTrafficTime(213,this);">ET</a>
</td>
</tr>
I am scraping it now, and the result looks like this:
Sr No. Toll Plaza Car/Jeep/Van(Rs.)
1 Kherki Daula 60 (Live Traffic) - ET
2 Shahjahanpur 125 (Live Traffic) - ET
Total Charges(Rs.) 90
I want to skip the text "(Live Traffic) - ET" in each row. My Python code is:
tbody = soup('table', {"class": "tollinfotbl"})[0].find_all('tr')[3:]
for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text.contents[0] for ele in cols]
    if cols:
        sno = str(cols[0])
        Toll_plaza = str(cols[1])
        cost = str(cols[2])
        query = "INSERT INTO tryroute (sno,Toll_plaza, cost) VALUES (%s, %s, %s);"
When I use .contents[0], I get this error:
cols = [ele.text.content[0] for ele in cols]
AttributeError: 'str' object has no attribute 'content'
Any help would be appreciated.
Answer 0 (score: 2)
You get this error because you are calling "contents" on a str object, namely ele.text:
ele.text # returns a string object (which in your case contains the whole text in that particular tag)
To get the contents of the tag, use this instead:
ele.contents # inside your list comprehension, this will return a list of all the children of that particular tag
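As a minimal sketch of how this could be applied to the question's table: the first child of each cell is either a text node or a nested tag, so both cases need handling. The trimmed HTML below and the helper name leading_text are assumptions made for illustration, not part of the original page or code, and the map URL is a placeholder.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of one row from the question (attributes shortened).
html = """
<table class="tollinfotbl">
<tr>
<td>1</td>
<td><a href="#"> Kherki Daula </a></td>
<td>60 <a href="https://example.invalid/map"> (Live Traffic)</a> - <a href="#">ET</a>
</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("table", {"class": "tollinfotbl"}).find("tr")
cells = row.find_all("td", recursive=False)


def leading_text(cell):
    """Return the text of the first child of a cell (hypothetical helper)."""
    if not cell.contents:
        return ""  # empty cell: nothing to extract
    first = cell.contents[0]
    # A text node (NavigableString) is a str subclass; a nested tag is not.
    text = first if isinstance(first, str) else first.get_text()
    return text.strip()


values = [leading_text(cell) for cell in cells]
print(values)
```

For the third cell, only the text before the first link is kept, so "(Live Traffic) - ET" is dropped without any string post-processing.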
Answer 1 (score: 1)
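You can use re to extract the value from the raw text. You don't need contents[], which is error-prone because it relies on a hard-coded index and is not flexible.
Before copying the code below, add import re at the top.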
for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text for ele in cols]
    if cols:
        sno = str(cols[0])
        Toll_plaza = str(cols[1])
        cost_raw = str(cols[2])
        # Capture the leading digits that come right before "(Live Traffic)"
        compiled = re.compile(r'^(\d+)\s*\(', flags=re.IGNORECASE | re.DOTALL)
        match = re.search(compiled, cost_raw)
        # Fall back to the stripped cell text if the pattern does not match
        cost = match.group(1) if match else cost_raw.strip()
        query = "INSERT INTO tryroute (sno,Toll_plaza, cost) VALUES (%s, %s, %s);"
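To check the idea in isolation, here is a small sketch that runs a leading-digits pattern against strings shaped like the text of the cost cell. The sample strings are an assumption about what .text returns for those cells, and the pattern is a slightly looser variant (it does not require the opening parenthesis), so it also handles a bare number. The name extract_cost is hypothetical.

```python
import re

# Leading digits, optionally preceded by whitespace.
LEADING_NUMBER = re.compile(r"^\s*(\d+)")


def extract_cost(cell_text):
    """Return the leading integer of a cell's text, or None if there is none."""
    match = LEADING_NUMBER.match(cell_text)
    return int(match.group(1)) if match else None


# Assumed shape of the cell text, based on the HTML in the question.
samples = ["60  (Live Traffic) - ET\n", "125  (Live Traffic) - ET\n", "90"]
costs = [extract_cost(text) for text in samples]
print(costs)
```

Returning None for non-matching text makes it explicit when a row has no usable cost, rather than leaving the variable unset.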
Let me know if you need any clarification.