Can't scrape a table using BeautifulSoup

Time: 2018-02-05 12:33:52

Tags: python web-scraping beautifulsoup scrape

With the code below, I only manage to get one row of data:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://investmentmoats.com/DividendScreener/DividendScreener.php'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))[0]

Can anyone help?

2 answers:

Answer 0 (score: 0)

Try the following:

import requests
from bs4 import BeautifulSoup

url = 'http://investmentmoats.com/DividendScreener/DividendScreener.php'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
for row in soup.find('table').find_all('tr'):
    print(' '.join([x.text for x in row.find_all('td')]))
    # Or collect [x.text for x in row.find_all('td')] into a list for your data frame.

Partial output:

Biz Trust - Port - Hutchison (SGD) 0.500 0.000 0.0% 0.045 9.0% 7.9% 12.1% 0.4 24.2% 13.1 24% 75%
Conglom - Sembcorp Industries 3.550 -0.020 -0.6% 0.170 4.8% 4.4% 12.8% 0.9 -2.6% 4.1 35% 37%
REIT - COM - CapitaCommercial 1.780 0.000 0.0% 0.083 4.7% 4.4% 4.1% 1.0 27.8% 32.0 2% 115%
REIT - COM - Frasers Comm 1.380 -0.050 -3.5% 0.096 7.0% 6.9% 6.8% 0.9 32.9% 23.3 5% 102%
REIT - COM - IREIT GLOBAL 0.790 -0.015 -1.9% 0.058 7.3% 7.3% 6.6% 1.2 38.9% 21.8 5% 111%
REIT - COM - Keppel 1.200 -0.030 -2.4% 0.057 4.8% 4.1% 3.0% 0.8 29.5% 38.4 7% 159%
REIT - COM - OUE 0.750 -0.005 -0.7% 0.046 6.1% 6.9% 5.5% 0.5 33.6% 29.1 3% 112%
REIT - DAT - Keppel DC 1.360 -0.060 -4.2% 0.070 5.1% 3.9% 4.7% 1.4 22.7% 31.6 7% 109%
REIT - HEA - First 1.380 -0.020 -1.4% 0.086 6.2% 4.9% 5.3% 1.2 32.4% 29.0 1% 117%

Edit: As mentioned in the comment in the program above, to store the data in a data frame, just build the row lists and pass them to a df.

table = soup.find('table', class_='securityTable')

# Get all the column titles.
titles = [x.text for x in table.find('thead').find_all('th')]

# Get all rows and store each one as a list of cell texts.
rows = [[x.text for x in row.find_all('td')]
        for row in table.find('tbody').find_all('tr')]

# Create the dataframe.
df = pd.DataFrame(rows, columns=titles)
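The header/row extraction pattern above can be sketched against a small inline table, so it runs without hitting the live site. The markup below is a hypothetical stand-in that mirrors the page's assumed `securityTable` structure (`thead` for titles, `tbody` for data rows):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Minimal stand-in for the live page's table (structure assumed).
html = """
<table class="securityTable">
  <thead><tr><th>Stock</th><th>Price</th><th>Yield</th></tr></thead>
  <tbody>
    <tr><td>REIT - COM - Keppel</td><td>1.200</td><td>4.8%</td></tr>
    <tr><td>REIT - DAT - Keppel DC</td><td>1.360</td><td>5.1%</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', class_='securityTable')

# Column titles come from the <th> cells in <thead>.
titles = [th.text for th in table.find('thead').find_all('th')]

# Each <tr> in <tbody> becomes one list of cell texts.
rows = [[td.text for td in tr.find_all('td')]
        for tr in table.find('tbody').find_all('tr')]

df = pd.DataFrame(rows, columns=titles)
print(df.shape)  # (2, 3)
```

Iterating `table.find('tbody').find_all('tr')` (rather than `table.find_all('tbody')`) is what yields one list per row; a table typically has a single `tbody`, so looping over `tbody` elements collapses everything into one row.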

Answer 1 (score: 0)

You can try the following approach:

>>> from urllib.request import Request, urlopen
>>> from bs4 import BeautifulSoup
>>> url = 'http://investmentmoats.com/DividendScreener/DividendScreener.php'
>>> req = Request(url,headers={'User-Agent': 'Mozilla/5.0'})
>>> webpage = urlopen(req).read()
>>> soup = BeautifulSoup(webpage, "html.parser")
>>> required = soup.find_all("table", {"class":"securityTable"})
>>> x = []
>>> for i in required:
...     x.append(i.get_text())
>>> for i in x:
...     print(i)
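Note that calling `get_text()` on a whole table flattens every cell into one string. To get readable per-row output instead, the same idea can be sketched on an inline snippet (hypothetical markup reusing the page's assumed `securityTable` class):

```python
from bs4 import BeautifulSoup

# Stand-in markup with the same class as the live page (assumed).
html = ('<table class="securityTable">'
        '<tr><td>REIT - COM - OUE</td><td>0.750</td></tr>'
        '<tr><td>REIT - HEA - First</td><td>1.380</td></tr>'
        '</table>')

soup = BeautifulSoup(html, 'html.parser')
required = soup.find_all('table', {'class': 'securityTable'})

# get_text(separator=' ', strip=True) keeps cell values separated
# instead of producing one run-together blob per table.
for table in required:
    for tr in table.find_all('tr'):
        print(tr.get_text(separator=' ', strip=True))
```

Iterating the `tr` elements gives one line per security, which is easier to post-process than the single concatenated string the original loop prints.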