I have the following input file.
<td align="right">
<img alt="inflation rates india" src="http://www.inflation.eu/images/country_icons/round_icons_36/india.jpg">
</img></td>,
<td align="right" style="width:20%;">inflation</td>,
<td align="right" style="width:20%;">inflation </td>,
<td align="right">-0.69 %</td>,
<td align="right">4.00 % </td>,
<td align="right">0.35 %</td>,
<td align="right">3.97 % </td>,
<td align="right">0.70 %</td>,
<td align="right">3.24 % </td>,
<td align="right">0.00 %</td>,
<td align="right">2.89 % </td>,
<td align="right">0.00 %</td>,
<td align="right">2.52 % </td>,
<td align="right">1.79 %</td>,
<td align="right">1.79 % </td>,
<td align="right">0.72 %</td>,
<td align="right">1.08 % </td>,
<td align="right">0.36 %</td>,
<td align="right">1.09 % </td>,
<td align="right">0.73 %</td>,
<td align="right">2.21 % </td>,
<td align="right">0.36 %</td>,
<td align="right">2.61 % </td>,
<td align="right">0.00 %</td>,
<td align="right">2.62 % </td>,
<td align="right">-0.36 %</td>,
<td align="right">1.86 % </td>,
<td align="right">
<a class="footer" href="http://www.inflation.eu/" target="blank">inflation.eu</a> is an initiative of Triami Media BV in cooperation with <a class="footer" href="http://www.homefinance.nl/" target="blank">HomeFinance</a> - © 2010 - 2018 Copyright
</td>
My Python code is:
table = page_soup.findAll('td',{"align":"right"})
list_of_rows = []
for row in table.find_all('tr'):
    list_of_cells = []
    for cell in table.findAll('td'):
        list_of_cells.append(cell.text)
    list_of_rows.append(list_of_cells)
I want all the td values that end with a percent sign. I tried using the find_all function, but it raised an error:
AttributeError: 'ResultSet' object has no attribute 'find_all'
Answer 0 (score: 1)
from bs4 import BeautifulSoup
d = """<table><tr><td align="right">
<img alt="inflation rates india" src="http://www.inflation.eu/images/country_icons/round_icons_36/india.jpg">
</img></td>,
<td align="right" style="width:20%;">inflation</td>,
<td align="right" style="width:20%;">inflation </td>,
<td align="right">-0.69 %</td>,
<td align="right">4.00 % </td>,
<td align="right">0.35 %</td>,
<td align="right">3.97 % </td>,
<td align="right">0.70 %</td>,
<td align="right">3.24 % </td>,
<td align="right">0.00 %</td>,
<td align="right">2.89 % </td>,
<td align="right">0.00 %</td>,
<td align="right">2.52 % </td>,
<td align="right">1.79 %</td>,
<td align="right">1.79 % </td>,
<td align="right">0.72 %</td>,
<td align="right">1.08 % </td>,
<td align="right">0.36 %</td>,
<td align="right">1.09 % </td>,
<td align="right">0.73 %</td>,
<td align="right">2.21 % </td>,
<td align="right">0.36 %</td>,
<td align="right">2.61 % </td>,
<td align="right">0.00 %</td>,
<td align="right">2.62 % </td>,
<td align="right">-0.36 %</td>,
<td align="right">1.86 % </td>,
<td align="right">
<a class="footer" href="http://www.inflation.eu/" target="blank">inflation.eu</a> is an initiative of Triami Media BV in cooperation with <a class="footer" href="http://www.homefinance.nl/" target="blank">HomeFinance</a> - © 2010 - 2018 Copyright
</td></tr></table>"""
soup = BeautifulSoup(d, "html.parser")
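for table in soup.find_all("table"):
    for td in table.find_all("td"):
        text = td.text.strip()
        # Skip the header cells (they carry a style attribute) and any
        # cell whose text is not a percentage, such as the footer.
        if not td.attrs.get('style') and text.endswith('%'):
            print(text)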
Output:
-0.69 %
4.00 %
0.35 %
3.97 %
0.70 %
3.24 %
0.00 %
2.89 %
0.00 %
2.52 %
1.79 %
1.79 %
0.72 %
1.08 %
0.36 %
1.09 %
0.73 %
2.21 %
0.36 %
2.61 %
0.00 %
2.62 %
-0.36 %
1.86 %
Answer 1 (score: 0)
If you just want the td elements anywhere in the document, regardless of the table, you can do the following:
list_of_cells = []
for cell in page_soup.find_all('td'):
    text = cell.text.strip()
    if text.endswith('%'):
        list_of_cells.append(text)
As for the error message about table: a ResultSet is like a list, so you have to work with the individual items in it, for example with a for loop.
list_of_cells = []
for tab in table:
    for cell in tab.find_all('td'):
        text = cell.text.strip()
        if text.endswith('%'):
            list_of_cells.append(text)
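Note that this may skip the top-level tags, because find_all only searches the descendants of each item, not the item itself.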
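That note matters for the table variable in the question, whose items are already td elements. A minimal sketch of reading those items directly, assuming a shortened copy of the question's markup (the html string here is made up for illustration, not the real page):

```python
from bs4 import BeautifulSoup

# Shortened, illustrative copy of the question's markup (assumption:
# the real page has the same shape).
html = """
<table><tr>
<td align="right" style="width:20%;">inflation</td>
<td align="right">-0.69 %</td>
<td align="right">4.00 % </td>
<td align="right"><a class="footer" href="http://www.inflation.eu/">inflation.eu</a> - Copyright</td>
</tr></table>
"""

page_soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which behaves like a list of Tag objects.
table = page_soup.find_all("td", {"align": "right"})

# Each item is already a <td>, so read its text directly instead of
# calling find_all on it again.
list_of_cells = []
for cell in table:
    text = cell.text.strip()
    if text.endswith("%"):
        list_of_cells.append(text)

print(list_of_cells)  # ['-0.69 %', '4.00 %']
```

The header cell and the footer cell are dropped because their text does not end with a percent sign.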
Answer 2 (score: 0)
Here is another way you can try to get the desired output:
from bs4 import BeautifulSoup

# content is assumed to hold the HTML text of the page
soup = BeautifulSoup(content, "lxml")
items = '\n'.join([item.text for item in soup.find_all("td") if "%" in item.text])
print(items)
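If the goal is the list_of_rows from the question, the collected strings can be turned into numbers and grouped afterwards. This is only a sketch: it assumes the cells alternate between two columns, as the two "inflation" header cells suggest, and to_float is a hypothetical helper rather than part of BeautifulSoup.

```python
# Percentage strings as collected by any of the snippets above
# (the first four values from the question).
cells = ["-0.69 %", "4.00 %", "0.35 %", "3.97 %"]


def to_float(text):
    """Convert a string such as '-0.69 %' to the float -0.69."""
    return float(text.replace("%", "").strip())


values = [to_float(c) for c in cells]

# Assumption: the cells alternate between two columns, so group them in pairs.
list_of_rows = [values[i:i + 2] for i in range(0, len(values), 2)]

print(list_of_rows)  # [[-0.69, 4.0], [0.35, 3.97]]
```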