I have a table that I scraped from a website and need to convert it into a DataFrame. Its HTML DOM looks like this:
<tbody>
<tr>
<td>value1</td>
<td>value2</td>
<td> </td>
...
<tr>
<td>value1</td>
<td> </td>
<td> </td>
...
I am using BeautifulSoup to scrape the page:
table = soup.find('tbody')
for row in soup.find_all('tr'):
    value = row.find('td')
    print(value.text)
I want to append this value.text to a row of a DataFrame, with the empty cells stored as missing values (such as NaN).
Here is sample output of print(value.text) (the blanks stand for the empty values):
20Q4 FDLR WW Event Webinar 13 FixIssues - Didn't Attend
205
204
0
0.00%
1
0.49%
1
0.49%
179
87.75%
65
31.86%
3
1.47%
3
1.47%
3
4.62%
1
0.49%
1
0
0.00%
0
0.00%
0
The first line contains the title of the table. How can I do this? Thanks a lot! :)
Answer 0 (score: 0)
You can simply use the pd.read_html function to convert the HTML into a DataFrame. Here is how to do it:
from io import StringIO
import pandas as pd
# pd.read_html needs the whole <table> element, not just its <tbody>.
table = soup.find('table')
# Wrapping the markup in StringIO avoids the deprecation warning recent pandas versions raise for literal HTML strings.
dfs = pd.read_html(StringIO(str(table)))
# pd.read_html returns a list of DataFrames; with a single table it is the only element.
df = dfs[0]
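If pd.read_html is not an option, the rows can also be collected by hand with BeautifulSoup. What follows is a minimal sketch, not a definitive implementation: the html string is a made-up stand-in for the page in the question, and treating whitespace-only cells as NaN is an assumption about the desired result.

```python
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped page; the real markup may differ.
html = """
<table>
  <tbody>
    <tr><td>value1</td><td>value2</td><td> </td></tr>
    <tr><td>value1</td><td> </td><td> </td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find("tbody").find_all("tr"):
    # find_all("td") returns every cell in the row; find("td") returns only the first.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Assumption: a cell that is empty after stripping whitespace is a missing value.
    rows.append([cell if cell else np.nan for cell in cells])

# Each <tr> becomes one DataFrame row; columns are numbered 0, 1, 2, ...
df = pd.DataFrame(rows)
print(df)
```

Each <tr> ends up as one row of df, so a row whose first cell holds the title keeps that title in column 0.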