Beautiful Soup: scraping table data

Time: 2018-09-01 15:33:49

Tags: python python-3.x web-scraping beautifulsoup python-requests

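I am extracting table data from the URL below. Specifically, I want the data in the first column. When I run the following code, each value in the first column is printed several times. How can I get each value only once, as it appears in the table?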

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page3.html').read()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table',{'id':'giftList'})

rows = table.find_all('tr')

for row in rows:
    data = row.find_all('td')
    for cell in data:
        print(data[0].text)

2 answers:

Answer 0 (score: 1)

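Try printing the first cell once per row instead of inside the inner loop. The inner loop runs once per cell but always prints data[0].text, so each first-column value is repeated once for every cell in its row.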
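
A minimal sketch of one possible fix: read the first cell once per row rather than printing inside the inner loop. It assumes the table keeps the id giftList and that the header row contains no td cells. The first_column helper and the inline SAMPLE_HTML are hypothetical, added only so the example runs without a network request; they are not taken from the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table, so the sketch runs offline.
SAMPLE_HTML = """
<table id="giftList">
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Vegetable Basket</td><td>Fresh produce</td><td>$15.00</td></tr>
  <tr><td>Russian Nesting Dolls</td><td>Hand-painted</td><td>$10,000.52</td></tr>
</table>
"""


def first_column(html):
    """Return the text of the first <td> in each row of table#giftList."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": "giftList"})
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # skip rows without <td> cells, such as the header row
            values.append(cells[0].get_text(strip=True))
    return values


for value in first_column(SAMPLE_HTML):
    print(value)
```

To run it against the real page, the HTML returned by urlopen(...).read() in the question could be passed to first_column in place of SAMPLE_HTML.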

Answer 1 (score: 1)

You can also try the following, which uses the requests module together with CSS selectors:

import requests
from bs4 import BeautifulSoup

link = 'http://www.pythonscraping.com/pages/page3.html'

soup = BeautifulSoup(requests.get(link).text, 'lxml')
# [1:] skips the header row, which has no <td> cells
for row in soup.select('table#giftList tr')[1:]:
    # select_one returns the first matching <td>, i.e. the first column
    cell = row.select_one('td').get_text(strip=True)
    print(cell)

Output:

Vegetable Basket
Russian Nesting Dolls
Fish Painting
Dead Parrot
Mystery Box
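
As a variation on the selector approach, the row slicing can probably be folded into the selector itself with td:first-child, which matches a td only when it is the first child of its row and therefore skips a header row made of th cells. This is a sketch under that assumption; first_cells and SAMPLE_HTML are hypothetical names, and the inline HTML stands in for the real page so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table, so the sketch runs offline.
SAMPLE_HTML = """
<table id="giftList">
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Vegetable Basket</td><td>Fresh produce</td><td>$15.00</td></tr>
  <tr><td>Russian Nesting Dolls</td><td>Hand-painted</td><td>$10,000.52</td></tr>
</table>
"""


def first_cells(html):
    """Return first-column text using a single CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # td:first-child matches only a <td> that is the first child of its <tr>
    cells = soup.select("table#giftList tr > td:first-child")
    return [cell.get_text(strip=True) for cell in cells]


for value in first_cells(SAMPLE_HTML):
    print(value)
```

The trade-off is that this relies on the first column being a td in every data row; if a row started with a th, that row would be skipped.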