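BeautifulSoup noob here. Just for practice, I'm trying to extract the package and version columns from this page. I tried using table = soup.find('table', attrs={'class': 'listing sortable'})
to get the table contents, but I'm not really getting any useful data. I'm pretty lost.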
Answer 0 (score: 2)
import requests
import bs4
url = 'https://launchpad.net/~openshot.developers/+archive/ubuntu/ppa'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")
tbody = soup.find_all(id='packages_list')[0].tbody
for tr in tbody.find_all('tr'):
    package = tr.find_all('td')[0].contents[2].strip()
    version = tr.find_all('td')[1].contents[0].strip()
    print('{0} - {1}'.format(package, version))
Answer 1 (score: 1)
table = soup.find("table", id="packages_list")
row_data = []
for row in table.find_all("tr"):
    cols = row.find_all("td")
    cols = [ele.text.strip() for ele in cols]
    row_data.append(cols)
I'm not sure what result you're getting right now, but give this a try!
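As a rough sketch of how the collected cells could be turned into (package, version) pairs, the example below runs the same lookup against a small inline table instead of the live page. The sample markup, the package names and versions in it, and the helper name extract_rows are all made up for illustration; the real Launchpad HTML may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page; the names and
# versions are placeholders, and the actual Launchpad HTML may differ.
SAMPLE_HTML = """
<table id="packages_list" class="listing sortable">
  <thead>
    <tr><th>Package</th><th>Version</th></tr>
  </thead>
  <tbody>
    <tr><td> example-lib </td><td> 1.0 </td></tr>
    <tr><td>example-app</td><td>2.0</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return a list of (package, version) tuples from the packages table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="packages_list")
    rows = []
    for tr in table.find_all("tr"):
        # get_text(strip=True) returns the cell text without surrounding whitespace
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # The header row holds <th> cells only, so it yields no <td> and is skipped
        if len(cells) >= 2:
            rows.append((cells[0], cells[1]))
    return rows


rows = extract_rows(SAMPLE_HTML)
for package, version in rows:
    print("{0} - {1}".format(package, version))
```

Checking the number of cells before indexing avoids an IndexError on rows that contain no td elements, such as a header row.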
Answer 2 (score: 1)
You can iterate over the tr tags and extract the package and version:
table = soup.find('table', attrs={'class': 'listing sortable'})
package = '' ; version = ''
for i in table.select('tr'):
    data = i.select('td')
    if data:
        package = data[0].text.strip()
        version = ' '.join(data[1].text.strip().split())
        print('{} : {} '.format(package,version))
#output
libopenshot : 0.1.4+0+588+107+201703310338+daily~ubuntu17.04.1
libopenshot : 0.1.4+0+588+107+201703310338+daily~ubuntu15.04.1
libopenshot : 0.1.4+0+588+107+201703310337+daily~ubuntu16.10.1
libopenshot : 0.1.4+0+588+107+201703310337+daily~ubuntu16.04.1
...
...
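The version line in the snippet above uses ' '.join(text.split()) to tidy the cell text. A minimal sketch of what that idiom does, using a made-up input string (the raw value and the helper name normalize_whitespace are assumptions for illustration, not taken from the page):

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace and drops
    # leading and trailing whitespace, so no separate strip() is needed.
    return " ".join(text.split())


# Hypothetical cell text with a line break and padding inside it
raw = "  0.1.4+0+588\n      daily~ubuntu17.04.1  "
cleaned = normalize_whitespace(raw)
print(repr(cleaned))
```

This matters when a table cell contains nested tags or line breaks, where .text alone would keep the embedded newlines and indentation.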