How to get all td values from HTML code in Python

Asked: 2018-02-21 06:53:02

Tags: python html web-scraping

I have the following input file.

<td align="right">
 <img alt="inflation rates india" src="http://www.inflation.eu/images/country_icons/round_icons_36/india.jpg">
 </img></td>,
 <td align="right" style="width:20%;">inflation</td>,
 <td align="right" style="width:20%;">inflation </td>,
 <td align="right">-0.69 %</td>,
 <td align="right">4.00 % </td>,
 <td align="right">0.35 %</td>,
 <td align="right">3.97 % </td>,
 <td align="right">0.70 %</td>,
 <td align="right">3.24 % </td>,
 <td align="right">0.00 %</td>,
 <td align="right">2.89 % </td>,
 <td align="right">0.00 %</td>,
 <td align="right">2.52 % </td>,
 <td align="right">1.79 %</td>,
 <td align="right">1.79 % </td>,
 <td align="right">0.72 %</td>,
 <td align="right">1.08 % </td>,
 <td align="right">0.36 %</td>,
 <td align="right">1.09 % </td>,
 <td align="right">0.73 %</td>,
 <td align="right">2.21 % </td>,
 <td align="right">0.36 %</td>,
 <td align="right">2.61 % </td>,
 <td align="right">0.00 %</td>,
 <td align="right">2.62 % </td>,
 <td align="right">-0.36 %</td>,
 <td align="right">1.86 % </td>,
 <td align="right">
 <a class="footer" href="http://www.inflation.eu/" target="blank">inflation.eu</a> is an initiative of Triami Media BV in cooperation with <a class="footer" href="http://www.homefinance.nl/" target="blank">HomeFinance</a> - © 2010 - 2018 Copyright 
 </td>

My Python code is:

table = page_soup.findAll('td',{"align":"right"})

list_of_rows = []
for row in table.find_all('tr'):
    list_of_cells = []
    for cell in table.findAll('td'):
        list_of_cells.append(cell.text)
    list_of_rows.append(list_of_cells)

I want all the td values that are followed by a percent sign. Earlier I tried using the find_all function, but it raised this error:

AttributeError: 'ResultSet' object has no attribute 'find_all'

3 answers:

Answer 0 (score: 1)

from bs4 import BeautifulSoup


d = """<table><tr><td align="right">
 <img alt="inflation rates india" src="http://www.inflation.eu/images/country_icons/round_icons_36/india.jpg">
 </img></td>,
 <td align="right" style="width:20%;">inflation</td>,
 <td align="right" style="width:20%;">inflation </td>,
 <td align="right">-0.69 %</td>,
 <td align="right">4.00 % </td>,
 <td align="right">0.35 %</td>,
 <td align="right">3.97 % </td>,
 <td align="right">0.70 %</td>,
 <td align="right">3.24 % </td>,
 <td align="right">0.00 %</td>,
 <td align="right">2.89 % </td>,
 <td align="right">0.00 %</td>,
 <td align="right">2.52 % </td>,
 <td align="right">1.79 %</td>,
 <td align="right">1.79 % </td>,
 <td align="right">0.72 %</td>,
 <td align="right">1.08 % </td>,
 <td align="right">0.36 %</td>,
 <td align="right">1.09 % </td>,
 <td align="right">0.73 %</td>,
 <td align="right">2.21 % </td>,
 <td align="right">0.36 %</td>,
 <td align="right">2.61 % </td>,
 <td align="right">0.00 %</td>,
 <td align="right">2.62 % </td>,
 <td align="right">-0.36 %</td>,
 <td align="right">1.86 % </td>,
 <td align="right">
 <a class="footer" href="http://www.inflation.eu/" target="blank">inflation.eu</a> is an initiative of Triami Media BV in cooperation with <a class="footer" href="http://www.homefinance.nl/" target="blank">HomeFinance</a> - © 2010 - 2018 Copyright 
 </td></tr></table>"""

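soup = BeautifulSoup(d, "html.parser")
for table in soup.find_all("table"):
    for td in table.find_all("td"):
        # Skip the header cells (they carry a style attribute) and any
        # cell whose text has no percent sign (the image and footer cells).
        if not td.attrs.get("style") and "%" in td.text:
            print(td.text)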

Output:

-0.69 %
4.00 % 
0.35 %
3.97 % 
0.70 %
3.24 % 
0.00 %
2.89 % 
0.00 %
2.52 % 
1.79 %
1.79 % 
0.72 %
1.08 % 
0.36 %
1.09 % 
0.73 %
2.21 % 
0.36 %
2.61 % 
0.00 %
2.62 % 
-0.36 %
1.86 % 

Answer 1 (score: 0)

If you just want the td elements in the document, regardless of the table, you can do the following:

list_of_cells = []
for cell in page_soup.find_all('td'):
    text = cell.text.strip()
    if text.endswith('%'):
        list_of_cells.append(text)

As for the error message about table: a ResultSet behaves like a list, so you have to work with the individual items in it, for example through a for loop.

list_of_cells = []
for tab in table:
    for cell in tab.find_all('td'):
        text = cell.text.strip()
        if text.endswith('%'):
            list_of_cells.append(text)

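Note that this may skip the top-level tags: find_all on an item searches only that item's descendants, so if table already holds the td elements themselves, those elements are not returned.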
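
The points above can be checked with a small sketch. The markup here is a hypothetical sample shortened from the question's input, and the names cells, nested and percent_values are made up for illustration. It assumes Beautiful Soup 4 is installed and uses the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shortened from the question's input.
html = """<table><tr>
<td align="right">-0.69 %</td>
<td align="right">4.00 % </td>
<td align="right">inflation</td>
</tr></table>"""

page_soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which behaves like a list of Tag objects.
cells = page_soup.find_all("td", {"align": "right"})

# The ResultSet itself has no find_all, so calling it raises AttributeError.
try:
    cells.find_all("td")
    raised = False
except AttributeError:
    raised = True

# Each item is a Tag, but find_all on a Tag searches only its descendants,
# so looking for td elements inside each td finds nothing here.
nested = [inner for cell in cells for inner in cell.find_all("td")]

# Filtering the items directly keeps the top-level cells.
percent_values = [
    cell.text.strip() for cell in cells if cell.text.strip().endswith("%")
]

print(raised)
print(nested)
print(percent_values)
```

If the ResultSet already contains the cells you want, as in the question, filtering its items directly is probably the simpler route.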

Answer 2 (score: 0)

Here is another way you can try to get the desired output:

from bs4 import BeautifulSoup

# content is assumed to hold the HTML source as a string
soup = BeautifulSoup(content, "lxml")
items = '\n'.join([item.text for item in soup.find_all("td") if "%" in item.text])
print(items)
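
The snippet above relies on a content variable and on the lxml parser being installed. A minimal self-contained sketch of the same idea follows, with a hypothetical stand-in for content shortened from the question's input, the built-in html.parser swapped in for lxml, and strip() added to drop trailing whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for `content`, shortened from the question's input.
content = """<table><tr>
<td align="right" style="width:20%;">inflation</td>
<td align="right">0.35 %</td>
<td align="right">3.97 % </td>
<td align="right"><a class="footer" href="http://www.inflation.eu/">inflation.eu</a></td>
</tr></table>"""

# "html.parser" ships with Python; "lxml" needs a separate install.
soup = BeautifulSoup(content, "html.parser")

# The check looks at the cell text only, so the "20%" inside the
# style attribute of the header cell does not count as a match.
items = "\n".join(
    item.text.strip() for item in soup.find_all("td") if "%" in item.text
)
print(items)
```

Because the test is "%" in item.text rather than endswith("%"), a cell that merely mentions a percentage somewhere in a longer sentence would also be kept, so the stricter check may be preferable on messier pages.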