How to extract table columns and rows with BeautifulSoup in Python

Time: 2017-04-25 06:24:40

Tags: python beautifulsoup

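BeautifulSoup noob here. Just for practice, I'm trying to extract the package and version columns from this page. I tried table = soup.find('table', attrs={'class': 'listing sortable'}) to get the table contents, but I didn't really get any useful data. I'm quite lost.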

3 answers:

Answer 0 (score: 2)

import requests
import bs4

url = 'https://launchpad.net/~openshot.developers/+archive/ubuntu/ppa'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")
tbody = soup.find_all(id='packages_list')[0].tbody

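for tr in tbody.find_all('tr'):
    cells = tr.find_all('td')
    # The package name is the third child node of the first cell;
    # the version is the first child node of the second cell.
    package = cells[0].contents[2].strip()
    version = cells[1].contents[0].strip()
    print('{0} - {1}'.format(package, version))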

Answer 1 (score: 1)

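# `soup` is a BeautifulSoup object for the page, built as in the answer above.
table = soup.find("table", id="packages_list")
row_data = []
for row in table.find_all("tr"):
    cols = row.find_all("td")
    cols = [ele.text.strip() for ele in cols]
    if cols:  # skip rows without <td> cells, such as a header row of <th> cells
        row_data.append(cols)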

I'm not sure what result you're getting right now, but give this a try!
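
For anyone who wants to try the row-by-row approach offline, here is a minimal sketch that runs against an inline HTML string instead of the live page. The markup in SAMPLE_HTML and the helper name extract_rows are assumptions made for illustration (only the packages_list id, the class names, and the version strings come from this thread), so the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's HTML may differ.
SAMPLE_HTML = """
<table id="packages_list" class="listing sortable">
  <thead>
    <tr><th>Package</th><th>Version</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>libopenshot</td>
      <td>0.1.4+0+588+107+201703310338+daily~ubuntu17.04.1</td>
    </tr>
    <tr>
      <td>libopenshot</td>
      <td>0.1.4+0+588+107+201703310338+daily~ubuntu15.04.1</td>
    </tr>
  </tbody>
</table>
"""


def extract_rows(html, table_id="packages_list"):
    """Return (package, version) tuples from the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header rows use <th>, so they produce no <td> cells and are skipped.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            rows.append((cells[0], cells[1]))
    return rows


if __name__ == "__main__":
    for package, version in extract_rows(SAMPLE_HTML):
        print("{} - {}".format(package, version))
```

Returning an empty list when the table is missing avoids the AttributeError you would otherwise get from calling find_all on None, which is a common stumbling block when a selector does not match.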

Answer 2 (score: 1)

You can iterate over the tr tags and extract the package and version:

# `soup` is a BeautifulSoup object for the page.
table = soup.find('table', attrs={'class': 'listing sortable'})
for i in table.select('tr'):
    data = i.select('td')
    if data:  # skip rows without <td> cells
        package = data[0].text.strip()
        # collapse newlines and repeated spaces inside the version cell
        version = ' '.join(data[1].text.strip().split())
        print('{} : {} '.format(package, version))

#output
libopenshot : 0.1.4+0+588+107+201703310338+daily~ubuntu17.04.1 
libopenshot : 0.1.4+0+588+107+201703310338+daily~ubuntu15.04.1 
libopenshot : 0.1.4+0+588+107+201703310337+daily~ubuntu16.10.1 
libopenshot : 0.1.4+0+588+107+201703310337+daily~ubuntu16.04.1 
...
...
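
The ' '.join(... .split()) step in the code above may look odd at first, so here is a small standalone sketch of what it does. The helper name normalize_whitespace and the raw_cell string are made up for illustration (the trailing "(extra text)" is a stand-in for whatever else a cell might contain); only the version string is taken from the output above.

```python
def normalize_whitespace(text):
    """Collapse every run of spaces, tabs, and newlines into a single space."""
    # str.split() with no arguments splits on any whitespace run and
    # drops leading/trailing whitespace, so joining with " " tidies the text.
    return " ".join(text.split())


# Hypothetical cell text with line breaks and indentation left over from the HTML.
raw_cell = (
    "\n    0.1.4+0+588+107+201703310338+daily~ubuntu17.04.1\n"
    "      (extra text)\n  "
)

if __name__ == "__main__":
    print(repr(raw_cell.strip()))          # inner newlines and spaces remain
    print(normalize_whitespace(raw_cell))  # one clean line
```

Calling strip() alone only trims the ends of the string, which is why the answer adds the split-and-join on top of it.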