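I'm using Python 3.6 and PyCharm 2016.2 and trying to scrape a website.
In the "보험사고이력정보:내차피해" (insurance accident history: damage to my car) section, which contains the fifth table, I want to scrape the data whenever one of the p tags contains " - 사고일자" (accident date).
Here is my code. It keeps returning nothing.
Please help.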
from bs4 import BeautifulSoup
import urllib.request
from urllib.parse import urlparse
import re
popup_insurance = "http://www.bobaedream.co.kr/mycar/popup/mycarChart_B.php?car_number=35%EB%91%908475&tbl=cyber&cno=651451"
res = urllib.request.urlopen(popup_insurance)
html = res.read()
soup_insurance = BeautifulSoup(html, 'html.parser')
insurance_content_table = soup_insurance.find_all('table')
elem = soup_insurance.find("p", text="보험사고이력 정보 : 내차 피해")
while elem.string != "보험사고이력 정보 : 타차 가해":
    if "사고일자" in elem.next_sibling:
        print(elem.next_sibling)
    elem = elem.next_sibling
    if elem is None:
        break
Answer (score: 0)
You should iterate over elem.next_sibling, because NavigableStrings can sometimes behave strangely:
from bs4 import BeautifulSoup
import urllib.request

popup_insurance = "http://www.bobaedream.co.kr/mycar/popup/mycarChart_B.php?car_number=35%EB%91%908475&tbl=cyber&cno=651451"
res = urllib.request.urlopen(popup_insurance)
html = res.read()
soup_insurance = BeautifulSoup(html, 'html.parser')

# Start at the "내차 피해" (damage to my car) heading
elem = soup_insurance.find("p", text="보험사고이력 정보 : 내차 피해")
# Walk forward sibling by sibling until the "타차 가해" (damage to other cars) heading
while elem is not None and elem.string != "보험사고이력 정보 : 타차 가해":
    sibling = elem.next_sibling
    if sibling is None:
        break
    # Iterating a Tag yields its children; iterating a NavigableString yields characters
    for string in sibling:
        if "사고일자" in string:
            # .string is None when a tag has several children, so use get_text()
            print(sibling.get_text().strip())
            break
    elem = sibling
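To make the "NavigableStrings can behave strangely" remark concrete: the `in` operator means different things on a Tag and on a NavigableString, which is a plausible reason the original `"사고일자" in elem.next_sibling` test never fires. The markup below is hypothetical (not taken from the real page); the point is only the membership semantics.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the page described in the question
html = "<div><p>보험사고이력 정보 : 내차 피해</p><p> - 사고일자 : 2015-01-01</p></div>"
soup = BeautifulSoup(html, "html.parser")

heading = soup.find("p")
tag = heading.next_sibling   # a Tag: the second <p> (no whitespace between the tags)
text = tag.string            # a NavigableString: " - 사고일자 : 2015-01-01"

# On a NavigableString (a str subclass), `in` is a substring test
print("사고일자" in text)             # True
# On a Tag, `in` checks whether the value is one of its direct children,
# so a partial string never matches
print("사고일자" in tag)              # False
# get_text() flattens the tag to plain text, so the substring test works again
print("사고일자" in tag.get_text())   # True
```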
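I'm assuming (since you didn't provide the expected output) that you want the accident dates / repair costs.
This is far from perfect, let alone elegant, and I'm almost certain it could be done with just a single for loop.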
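A minimal sketch of that single-loop idea, assuming the headings and the accident lines are sibling p elements (the real page may nest them inside tables, in which case find_all_next would be needed instead). The HTML sample and the accident_lines helper are hypothetical. It uses find_next_siblings("p"), which skips the whitespace NavigableStrings between tags, and string=, the newer name for the text= argument used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real popup page; the real structure may differ
HTML = """
<div>
  <p>보험사고이력 정보 : 내차 피해</p>
  <p> - 사고일자 : 2015-01-01</p>
  <p>수리비용 : 100,000원</p>
  <p> - 사고일자 : 2016-02-02</p>
  <p>보험사고이력 정보 : 타차 가해</p>
  <p> - 사고일자 : 2017-03-03</p>
</div>
"""

START = "보험사고이력 정보 : 내차 피해"
STOP = "보험사고이력 정보 : 타차 가해"


def accident_lines(soup):
    """Hypothetical helper: collect <p> texts mentioning 사고일자 between START and STOP."""
    start = soup.find("p", string=START)
    if start is None:  # heading text did not match exactly
        return []
    found = []
    # Only Tag siblings named "p" are visited, so no NavigableString surprises
    for p in start.find_next_siblings("p"):
        text = p.get_text().strip()
        if text == STOP:  # reached the next section
            break
        if "사고일자" in text:
            found.append(text)
    return found


soup = BeautifulSoup(HTML, "html.parser")
for line in accident_lines(soup):
    print(line)
```

With this sample the accident line under the 타차 가해 heading is not collected, because the loop stops at that heading.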