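I'm trying to learn how to use Beautiful Soup to pull data out of a website that runs the key strings together into one block. I'm competent at Googling and have had some success, but I'm stuck at this point. I seem to be missing something basic, so I'm forced to ask for help. I'm hoping someone can point me in the right direction or give me feedback on where I went wrong.
First, I'm giving a simplified version of the problem because I don't want to post a book. If anyone is willing to dig into the problem and the actual mistakes I made, I can attach the script and the real code as separate files. I believe this is a small conceptual error in how I handle strings and lists. Without further delay: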
<html>
<head>
<center>
<font face="arial" size="5">
<table border="0" cellpadding="0" cellspacing="0" width="100%" bgcolor="#000066">
<tr>
<td align="left" valign="top" bgcolor="#000066">
<a href="/"><img height="50" width="540" src="/leftbar-quote.gif" border="0" usemap="#leftbar10b39c7"></a>
<map name="leftbar10b39c7"><area href="/outside/multi.htm" coords="328,5,390,36" shape="rect">
<area href="/index.htm" coords="254,5,322,37" shape="rect">
<area href="#" coords="185,5,251,35" shape="rect" onclick="history.back(); return false;">
<area href="/cgi-bin/quoteForm.cgi?type=q&sEmail=&part=Engine&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&name=AutoPartex.net&int=-1&uIMS=&userSearch=exact&seqNum=600000000000000000456918622&ref=&userid=1000&email=&userClaim=&userLang=&userZip=&selleruserid=1000" coords="400,5,460,36" shape="rect">
<area href="/buyerfaq.htm" coords="470,5,530,36" shape="rect">
</map>
</td>
<td valign=top><div align="right"><img height="50" width="36" src="/result-rs.gif"></div></td>
</tr>
<tr>
<td COLSPAN=2><table WIDTH="100%"><tr>
<td width="10" valign="top"><img height="30" width="10" src="/trans4.gif"></td>
<td width="90%">
<b>
<div style='font-size:18pt; font-style: italic; color: white;'><b>Results sorted by <u>PRICE</u></b> <span class="small"><b>(Click on heading to re-sort)</b></span><br /></div><font color='#FFFFFF' face='Arial,Helvetica,Geneva,Swiss,SunSans-Regular' size='2'>Click back to modify your previous choice.<br>Most prices do not include extended warranties or shipping.<br>Not all displayed parts are interchangeable. Please verify with the recycler that the part fits your auto.<br /></font></b></td><td valign=bottom align=center><table bgcolor="#e4e4e4"width=350 cellpadding=3 border=1 cellspacing=0><tr><td align=center><form method="post" action="/cgi-bin/search.cgi" style="display: inline"><input type= hidden name=userDate value="2005"><input type= hidden name=userModel value="Ford Focus"><input type= hidden name=userLocation value="USA"><input type= hidden name=userPreference value="price"><input type= hidden name=userZip value=""><input type="hidden" name="userPage" value="1"><input type="hidden" name="userInterchange" value="None"><input type="hidden" name="userDate2" value="Ending Year"><input type="hidden" name="userSearch" value="int"><input type="hidden" NAME="userClaim" VALUE="">
<input type="hidden" NAME="userClaimer" VALUE="">
<input type="hidden" NAME="userLang" VALUE="">
<input type="hidden" NAME="userLat" VALUE="">
<input type="hidden" NAME="userLong" VALUE="">
<input type="hidden" NAME="userCSA" VALUE="">
<input type="hidden" NAME="userMCO" VALUE="">
<input type="hidden" NAME="userAdjuster" VALUE="">
<input type="hidden" NAME="userItem" VALUE="">
<input type="hidden" NAME="hpsDate" VALUE="">
<input type="hidden" NAME="hpsGroup" VALUE="">
<input type="hidden" NAME="reqId" VALUE="">
<input type="hidden" NAME="thirdMapType" VALUE="">
<input type="hidden" NAME="vendUrl" VALUE="">
<input type="hidden" NAME="iCN" VALUE="">
<input type='hidden' name='limitYears' value=''>
<input type='hidden' name='userIntSelect' value='711575'>
<input type='hidden' name='userVIN' value=''>
<input type='hidden' name='vinSearch' value='0'>
<input type='hidden' name='userVINModelID' value=''>
<input type="hidden" name="uID" value=""><input type="hidden" name="uPass" value=""><table bgcolor="#e4e4e4" width=350 cellpadding=3 border=1 cellspacing=0><tr><td colspan=2 align=center>2005 Ford Focus<br>Engine<br></td></tr><tr>
<td align=center>
<font style="font-size: 10pt">Non-Interchange search for year:<br></font>
<font style="font-size: 10pt"><b>2005</b><br><br></font>
<br>
<br><font style="font-size: 8pt"><a style="color:blue" href="/cgi-bin/search.cgi?userDate=2005&userModel=Ford%20Focus&userPart=Engine&origPart=&userPreference=price&userZip=&userLat=&userLong=&userVIN=&dbPart=300.1&userIntSelect=711575&userClaimer=&userClaim=&uID=&uPass=&userLocation=USA&userSearch=int">Click Here</a> to see All Interchange Choices </font>
</td>
</table></table></form>
</td></tr></table></td></tr></table><table width="100%" border="1" cellspacing="0" cellpadding="4">
<tr align=center>
<td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=year&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Year</a><br>Part<br>Model</td>
<td>Description</td>
<td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=miles&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Miles</a></td>
<td><a href='/cgi-bin/search.cgi?userSearch=exact&userPID=1000&userLocation=USA&userIMS=&userInterchange=%5B%7C%7Br&userSide=&userDate=2005&userDate2=2005&dbModel=27.20&userModel=Ford%20Focus&dbPart=300.1&userPart=Engine&sessionID=600000000000000000456918622&userPreference=grade&userIntSelect=711575&userUID=0&userBroker=&userPage=1&iKey='>Part <br> Grade</a></td> <td>Stock#</td>
<td>US<br>Price</td>
<td>Dealer Info</td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td><a href=""><img width="100" hspace="3" align="middle" onclick="return popupImg('seller=2013&partGUID=2013-1-282435&vehicleGUID=2013-1-V18432&display=2005%20Ford%20Focus%20Engine%20Assembly-Stock%23%2010286')" src="http://wsimgoh.autopartex.net/2013/2015/10286/2013_18432_05_thumb.jpg"></img></a>ZX4,2.0,EFI,FATO,FWDRUNSGREAT</td><td align=right> </td><td align=center> </td><td>10286</td><td align=center>$350550</td><td><A HREF="http://www.LaPointAuto.com" target="_top">LaPoint Discount MIDW</A> USA-OH(Holland) <A HREF="/cgi-bin/quoteForm.cgi?type=g&sEmail=shawn@LaPointAuto.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=10286&price=350550&desc=ZX4%2C2.0%2CEFI%2CFATO%2CFWDRUNSGREAT&name=LaPoint%20Discount%20MIDW&url=http://www.LaPointAuto.com&int=-1&broker=0&recycler=0&selleruserid=2013&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 419-865-2329 / 800-845-0270 <A HREF="/cgi-bin/quoteForm.cgi?type=i&sEmail=shawn@LaPointAuto.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=10286&price=350550&desc=ZX4%2C2.0%2CEFI%2CFATO%2CFWDRUNSGREAT&name=LaPoint%20Discount%20MIDW&url=http://www.LaPointAuto.com&int=-1&broker=0&recycler=0&selleruserid=2013&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=2013&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=350550&pst=10286&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr><tr><td>2005<br>Engine Assembly<br>Ford 
Focus</td><td>TESTED,2.3L,5MT,08/04,FWD,+CORE</td><td align=right> </td><td align=center> </td><td>E94764</td><td align=center>$1500</td><td><A HREF="http://www.ParadiseAutoParts.com" target="_top">Paradise Auto Parts-ELITE</A> USA-MD(Elkton) <A HREF="/cgi-bin/quoteForm.cgi?type=g&sEmail=mdriver@complete-recycle.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=E94764&price=1500&desc=TESTED%2C2.3L%2C5MT%2C08%2F04%2CFWD%2C%2BCORE&name=Paradise%20Auto%20Parts-ELITE&url=http://www.ParadiseAutoParts.com&int=-1&broker=0&recycler=0&selleruserid=2843&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 888-811-5051/410-620-5051 <A HREF="/cgi-bin/quoteForm.cgi?type=i&sEmail=mdriver@complete-recycle.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=E94764&price=1500&desc=TESTED%2C2.3L%2C5MT%2C08%2F04%2CFWD%2C%2BCORE&name=Paradise%20Auto%20Parts-ELITE&url=http://www.ParadiseAutoParts.com&int=-1&broker=0&recycler=0&selleruserid=2843&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=2843&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=1500&pst=E94764&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr><tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>175-175</td><td align=right>38,916</td><td align=center>A</td><td>FC6555</td><td align=center>$1250</td><td><A HREF="http://www.DonsSportcar.com" target="_top">Don's Sportcar</A> USA-CO(Pueblo) <A 
HREF="/cgi-bin/quoteForm.cgi?type=g&sEmail=parts@DonsSportcar.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=FC6555&price=1250&desc=175-175&name=Don's%20Sportcar&url=http://www.DonsSportcar.com&int=-1&broker=0&recycler=0&selleruserid=3776&miles=38.916&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</A> 800-332-3649 <A HREF="/cgi-bin/quoteForm.cgi?type=i&sEmail=parts@DonsSportcar.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=FC6555&price=1250&desc=175-175&name=Don's%20Sportcar&url=http://www.DonsSportcar.com&int=-1&broker=0&recycler=0&selleruserid=3776&miles=38.916&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</A><br><a target=_blank href="http://appcgi.autopartex.net/cgi-bin/applet.cgi?sid=3776&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=1250&pst=FC6555&pgr=A&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick='window.open(this.href,this.target,getPrm()); return false'><img src='/images/LiveChat_space.gif' border=0></a></b></td></tr>
</table>
</div>
</body> </html>
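That is the HTML text and structure. Here is where I actually need help with the approach:
Since there are no CSS decorators (classes or ids) to select on, I could not find a conventional example using XPath or something like Selenium. I may be wrong, though, as I'm a noob.
I need to separate the text in the cells into individual strings.
With BeautifulSoup I tried several ways of getting the text.
After trying something like this, I got the error shown further down: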
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("./test.html"), "lxml")
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    try:
        result = str(tds[0].get_text())
    except:
        adjust = ' '
        continue
    result = result.split(" ")
    result = str.replace('2005Engine', "2005Engine", "2005 ") + str.replace('AssemblyFord', "AssemblyFord", "Engine Assembly ") + str.repl$
    strresult = ''.join(result)
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    tds[0] = strresult
    tds.get_text()
    print(tds)
Error message:
Traceback (most recent call last):
  File "carpartbs5.find.td.py", line 33, in <module>
    tds.get_text()
  File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
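The message is pointing at the fact that find_all() returns a ResultSet, which behaves like a list of Tag objects, so get_text() has to be called on each element rather than on the list itself. A minimal sketch of that idea, assuming a one-row inline HTML string as a stand-in for test.html and the built-in html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Inline stand-in for test.html (assumption: one data row shaped like the sample).
html = (
    "<table><tr>"
    "<td>2005<br>Engine Assembly<br>Ford Focus</td>"
    "<td>10286</td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, i.e. a list of Tag objects.
tds = soup.find("tr").find_all("td")

# tds.get_text() would raise AttributeError: the list has no such method.
# Each Tag inside the list does, so call it per element.
cell_texts = [td.get_text(" ", strip=True) for td in tds]
print(cell_texts)
```

Passing " " as the separator keeps the pieces on either side of each <br> apart, so the first cell does not collapse into "2005Engine AssemblyFord Focus".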
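Here is the flip side:
When I print tds, the first td has been replaced with my string as intended. However, whenever I try to use BeautifulSoup's get_text() method to return the text, it throws that error. The error seems to indicate that I am calling the method on something that does not support it.
So I'm not very clear on lists versus strings. I tried converting my list into an actual string, but that did not work. I think it cannot get the text because I'm working with a list. If so, is there a better way with BeautifulSoup to achieve the goal stated at the end of this post?
Hope this helps. I don't have enough points to post images or upload files. The last block of text below is what my program spits out if I don't try to call a BeautifulSoup method on the tds variable. Thanks in advance!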
My code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("./test.html"), "lxml")
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    try:
        result = str(tds[0].get_text())
    except:
        adjust = ' '
        continue
    result = result.split(" ")
    result = str.replace('2005Engine', "2005Engine", "2005 ") + str.replace('AssemblyFord', "AssemblyFord", "Engine Assembly ") + str.repl$
    strresult = ''.join(result)
trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    tds[0] = strresult
    print(tds)
What it returns (example):
['2005 Engine Assembly Ford Focus ', <td>139K</td>, <td align="right">\xa0</td>, <td align="center">\xa0</td>, <td>0232</td>, <td align="center">$800</td>, <td><a href="http://someurl.com" target="_top">Chads Part </a> USA-FL(Jacksonville) <a href="/cgi-bin/quoteForm.cgi?type=g&sEmail=chadsparts@someplace.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=0232&price=800&desc=139K&name=Chads%20Parts&url=http://someurl.com&int=-1&broker=0&recycler=0&selleruserid=3566&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Quote</a> 1-510-569-4845 <a href="/cgi-bin/quoteForm.cgi?type=i&sEmail=chadsparts@someplace.com&email=&part=Engine%20Assembly&dbPart=300.1&dbSubPart=&model=Ford%20Focus&dbModel=27.20&year=2005&stockNum=0232&price=800&desc=139K&name=Chads%20Parts=rs&url=http://someurl.com&int=-1&broker=0&=0&selleruserid=3566&miles=-1&condition=-1&userid=1000&uIMS=&seqNum=600000000000000000456918622&userClaim=&userLang=">Request_Insurance_Quote</a><br/><a href="http://someurl.com/cgi-bin/applet.cgi?sid=3566&brf=&bds=&bsr=price&pin=&pyr=2005&pmd=Ford%20Focus&ppt=Engine%20Assembly&ppr=800&pst=0232&pgr=&bty=WEB&bem=&bzp=&ses=600000000000000000456918622" onclick="window.open(this.href,this.target,getPrm()); return false" target="_blank"><img border="0" src="/images/LiveChat_space.gif"/></a></td>]
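Just to reiterate:
I only want the text from these elements, separated by commas into a single string that I can reuse when I get around to writing a CSV file:
Year, Part, Make, Model, Description, Miles, Part Grade, Stock#, Price, Dealer Name, Country, State, City, Phone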
Answer 0 (score: 0)
If you want the three "2005 Engine Assembly Ford Focus" cells (as in your HTML sample), you can do this:
table = soup.findAll('table')[-1]
tr = table.findAll('tr')[1:]
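tr will be a list of the data rows (the header row is skipped). You can then loop over the rows and take the td tag from each one. I'll do it for the first row only:
td = tr[0].td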
<td>2005<br/>Engine Assembly<br/>Ford Focus</td>
Unfortunately, I don't know how you want to process this string. For example, you can use this:
td = tr[0].td.children
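This gives you an iterator over all the strings and tags inside the cell (wrap it in list() if you need a list), which you can then process as needed.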
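Building on that, here is a rough sketch of how each data row could be turned into the comma-separated fields the question lists. It is only one possible approach, not a tested scraper for the real site: the inline HTML is a cut-down stand-in for test.html, the FIELDS names and the regular expression for "USA-CO(Pueblo)" are my own assumptions, and it relies on stripped_strings, which yields the text pieces that the <br> tags separate.

```python
import csv
import io
import re

from bs4 import BeautifulSoup

# Cut-down stand-in for test.html: header row plus one data row (assumption).
html = """
<table>
<tr><td>Year<br>Part<br>Model</td><td>Description</td><td>Miles</td>
<td>Part Grade</td><td>Stock#</td><td>US<br>Price</td><td>Dealer Info</td></tr>
<tr><td>2005<br>Engine Assembly<br>Ford Focus</td><td>175-175</td>
<td align=right>38,916</td><td align=center>A</td><td>FC6555</td>
<td align=center>$1250</td>
<td><A HREF="http://www.DonsSportcar.com">Don's Sportcar</A> USA-CO(Pueblo)
<A HREF="/cgi-bin/quoteForm.cgi">Request_Quote</A> 800-332-3649
<A HREF="/cgi-bin/quoteForm.cgi">Request_Insurance_Quote</A></td></tr>
</table>
"""

# Hypothetical column names, taken from the field list in the question.
FIELDS = ["Year", "Part", "Make", "Model", "Description", "Miles", "Part Grade",
          "Stock#", "Price", "Dealer Name", "Country", "State", "City", "Phone"]

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table")[-1]           # the results table is the last one

rows = []
for tr in table.find_all("tr")[1:]:          # skip the header row
    tds = tr.find_all("td")
    first = list(tds[0].stripped_strings) if tds else []
    if len(tds) < 7 or len(first) < 3:
        continue                             # not a result row
    year, part, make_model = first[:3]
    # Assumes the make is the first word, e.g. "Ford Focus" -> "Ford", "Focus".
    make, _, model = " ".join(make_model.split()).partition(" ")
    description, miles, grade, stock, price = (
        td.get_text(strip=True) for td in tds[1:6]
    )
    # Dealer cell pieces: name, "USA-CO(Pueblo)", "Request_Quote", phone, ...
    dealer = list(tds[6].stripped_strings)
    name = dealer[0] if dealer else ""
    location = dealer[1] if len(dealer) > 1 else ""
    phone = dealer[3] if len(dealer) > 3 else ""
    match = re.match(r"(\w+)-(\w+)\((.*)\)", location)
    country, state, city = match.groups() if match else ("", "", "")
    rows.append([year, part, make, model, description, miles, grade,
                 stock, price, name, country, state, city, phone])

# Write to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)
writer.writerows(rows)
print(buffer.getvalue())
```

To write an actual file, the buffer can be replaced with open("parts.csv", "w", newline="") (the file name is just an example). Using the csv module rather than joining with commas by hand matters here because values such as "38,916" contain a comma and need quoting.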