Appending rows to a pandas DataFrame inside a loop

Time: 2020-10-27 12:05:05

Tags: python pandas beautifulsoup

I scraped a table from a website and need to convert it into a DataFrame. Its HTML DOM looks like this:

<tbody>
    <tr>
        <td>value1</td>
        <td>value2</td>
        <td>&nbsp;</td>
        ...
    <tr>
        <td>value1</td>
        <td>&nbsp;</td>
        <td>&nbsp;</td>
        ...

I am using BeautifulSoup to scrape the page:

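table = soup.find('tbody')
# Search within the table body, not the whole page
for row in table.find_all('tr'):
    # Visit every cell in the row, not only the first one
    for value in row.find_all('td'):
        print(value.text)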

I want to append each value.text to a row of the DataFrame, with the &nbsp; cells stored as NaN.

Here is sample output of print(value.text) (the blank entries are the &nbsp; values):

20Q4 FDLR WW Event Webinar 13 FixIssues - Didn't Attend
205
204
0
0.00%
1
0.49%
1
0.49%
179
87.75%
65
31.86%
3
1.47%
3
1.47%
3
 
4.62%
1
0.49%
1
0
0.00%
0
0.00%
0 

The first value is the table's title. How can I do this? Thanks a bunch! :)

1 Answer:

Answer 0: (score: 0)

You can simply use the pd.read_html function to convert the HTML into a DataFrame. Here is how:

import pandas as pd

# Important: pass the entire <table> element to pd.read_html, not just the <tbody>.
table = soup.find('table')

# pd.read_html returns a list of DataFrames, one per table it finds.
dfs = pd.read_html(str(table))

# Only one table was passed in, so it is the first (and only) element of the list.
df = dfs[0]
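
If you would rather keep the explicit loop from the question, one possible approach is to collect each row's cell texts in a plain Python list and build the DataFrame once after the loop, instead of appending to the DataFrame row by row. The sketch below is only an illustration: the sample markup is made up to match the structure shown in the question, the names html, rows and cells are hypothetical, and it assumes every row has the same number of cells.

```python
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the structure in the question.
html = """
<table><tbody>
  <tr><td>value1</td><td>value2</td><td>&nbsp;</td></tr>
  <tr><td>value1</td><td>&nbsp;</td><td>&nbsp;</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find("tbody").find_all("tr"):
    cells = []
    for td in tr.find_all("td"):
        # &nbsp; is parsed as the character U+00A0, which str.strip()
        # removes along with ordinary whitespace.
        text = td.get_text().strip()
        # Treat cells that are empty after stripping as missing values.
        cells.append(text if text else np.nan)
    rows.append(cells)

# Build the DataFrame once, after the loop has collected every row.
df = pd.DataFrame(rows)
print(df)
```

Building the frame from a list of rows avoids copying the whole DataFrame on every iteration, which is what repeated appends tend to do.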