Scraping table data with Python BeautifulSoup

Posted: 2017-08-17 09:24:01

Tags: python beautifulsoup bs4

I can't figure out how to scrape only the first table cell (<td>) of each row instead of both.

<tr>
<td>WheelDust
</td>
<td>A large puff of barely visible brown dust
</td></tr>

I only want WheelDust, but I get both WheelDust and "A large puff of barely visible brown dust".

import requests
from bs4 import BeautifulSoup


r = requests.get("https://wiki.garrysmod.com/page/Effects")

soup = BeautifulSoup(r.content, "html.parser")

for table in soup.find_all("table"):
    #--print(table)
    for row in table.find_all("tr"):
        print(row.text)
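The reason both values show up is that row.text (Tag.text in BeautifulSoup) joins the text of every descendant of the <tr>, so both <td> cells end up in one string. A minimal offline sketch, assuming the row from the question is wrapped in a <table> so no network request is needed:

```python
from bs4 import BeautifulSoup

# The row from the question, wrapped in a <table> so it parses as a table.
html = """
<table>
<tr>
<td>WheelDust
</td>
<td>A large puff of barely visible brown dust
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# row.text concatenates the text of *every* descendant, so both cells appear.
print(repr(row.text))

# Looking at each <td> separately shows the two values that were merged.
cells = [td.get_text(strip=True) for td in row.find_all("td")]
print(cells)  # ['WheelDust', 'A large puff of barely visible brown dust']
```

To keep only the first value, work with individual <td> elements rather than the row's combined text.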

2 answers:

Answer 0 (score: 1)

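I'm still not sure what you're asking, but I believe you're saying you only want the first one, right? If so, wouldn't this work? I tried it, but it says I can't access the site.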

import requests
from bs4 import BeautifulSoup


r = requests.get("https://wiki.garrysmod.com/page/Effects")

soup = BeautifulSoup(r.content, "html.parser")

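for table in soup.find_all("table"):
    for row in table.find_all("tr"):
        cell = row.find("td")  # first <td> in the row only
        if cell is not None:   # header rows contain <th>, not <td>
            print(cell.get_text(strip=True))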
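Since the answer above could not be tested against the live site, here is a minimal offline sketch of the same idea (take row.find("td"), skip rows without one, strip the text). The markup is hypothetical: a header row plus the row from the question and an invented "HypotheticalEffect" row; it is not copied from the real wiki page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup shaped like the question's rows (not the real page).
html = """
<table class="wikitable">
<tr><th>Name</th><th>Description</th></tr>
<tr><td>WheelDust
</td><td>A large puff of barely visible brown dust
</td></tr>
<tr><td>HypotheticalEffect
</td><td>Invented second row for illustration
</td></tr>
</table>
"""

def first_column(table):
    """Return the stripped text of the first <td> in every row that has one."""
    names = []
    for row in table.find_all("tr"):
        cell = row.find("td")  # first <td> only, or None for header rows
        if cell is not None:
            names.append(cell.get_text(strip=True))
    return names

soup = BeautifulSoup(html, "html.parser")
print(first_column(soup.find("table")))  # ['WheelDust', 'HypotheticalEffect']
```

Printing row.find("td") directly would show the whole tag (including <td> and the trailing newline) and None for header rows, which is why get_text(strip=True) and the None check are used here.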

Answer 1 (score: 1)

Try this. It gives you all the data in that table, one list per row.

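import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://wiki.garrysmod.com/page/Effects").text, "html.parser")

# Changing the index number selects a different "wikitable" table on the page
table = soup.find_all('table', attrs={'class': 'wikitable'})[0]

# One list of cell texts per row; header rows (<th> only) yield an empty list
list_of_rows = [[t_data.get_text(strip=True) for t_data in item.find_all('td')]
                for item in table.find_all('tr')]

for data in list_of_rows:
    if data:  # skip header rows, which have no <td> cells
        print(data)  # data[0] is the first column, e.g. WheelDust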
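If only the first column is needed, a CSS selector can do the row and cell filtering in one step. A minimal sketch, assuming hypothetical offline markup with two "wikitable" tables (the second one is invented) and a BeautifulSoup version whose select()/select_one() are backed by Soup Sieve:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables with class "wikitable" (not the real page).
html = """
<table class="wikitable">
<tr><th>Name</th><th>Description</th></tr>
<tr><td>WheelDust</td><td>A large puff of barely visible brown dust</td></tr>
</table>
<table class="wikitable">
<tr><td>OtherTable</td><td>Invented row in a second table</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Pick the first table with class "wikitable" (same choice as index [0] above),
# then keep only the <td> that is the first element child of each row.
table = soup.select_one("table.wikitable")
names = [td.get_text(strip=True) for td in table.select("tr > td:first-child")]
print(names)  # ['WheelDust']
```

The header row is skipped automatically because its first child is a <th>, so td:first-child does not match anything in it.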