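Printing data on separate rows with the Beautiful Soup module

Time: 2019-04-11 17:01:33

Tags: python beautifulsoup

I am scraping data from the Fortune 500 companies site (http://fortune.com/fortune500/list/). I am trying to display the rows the way they appear on the web page.

I tried iterating over the "ul" elements by class, but everything is printed on a single line instead of on separate rows.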

import urllib.request
from bs4 import BeautifulSoup

# Download the raw HTML of the list page
sauce = urllib.request.urlopen("http://fortune.com/fortune500/list/").read()
soup = BeautifulSoup(sauce, 'html.parser')

# Print the text of every <ul class="company-list"> element
for company in soup.find_all("ul", {"class": "company-list"}):
    print(company.text)

Expected result:

Rank    Company                   revenues($M)
 1        Walmart                    $500,343
 2         Exxon                     $244,363
 etc.

1 answer:

Answer 0 (score: 0)

Taking company as the variable from your code:

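import pandas as pd

# Join the strings inside the <ul> with "|" and split them into a flat list
data = company.get_text("|").split("|")
# Every third item belongs to the same column; [1:] skips the first entry
rank = data[0::3][1:-1]  # also drop the last entry
names = data[1::3][1:]   # separate name, so the company tag is not overwritten
revenue = data[2::3][1:]
df = pd.DataFrame({"rank": rank, "company": names, "revenue": revenue})

print(df.head())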

ix  rank    company revenue
0   1   Walmart $500,343
1   2   Exxon Mobil $244,363
2   3   Berkshire Hathaway  $242,137
3   4   Apple   $229,234
4   5   UnitedHealth Group  $201,159
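
If pandas is not needed, the same stride slicing can regroup the flat list into rows that print one per line. The following is only a minimal sketch: the tokens list is a hypothetical stand-in for what company.get_text("|").split("|") might return (the real page may yield extra or differently ordered items), and group_rows and format_row are illustrative helper names, not part of any library.

```python
# Hypothetical flat token list, standing in for
# company.get_text("|").split("|"); the real page content may differ.
tokens = [
    "Rank", "Company", "Revenues ($M)",
    "1", "Walmart", "$500,343",
    "2", "Exxon Mobil", "$244,363",
]


def group_rows(flat, width=3):
    """Regroup a flat list into tuples of `width` consecutive items."""
    # flat[i::width] picks every width-th item starting at offset i,
    # i.e. one column; zip stitches the columns back into rows.
    columns = [flat[i::width] for i in range(width)]
    return list(zip(*columns))


def format_row(row):
    """Left-align rank and company name, right-align revenue."""
    rank, name, revenue = row
    return f"{rank:<6}{name:<22}{revenue:>14}"


rows = group_rows(tokens)
for row in rows:
    print(format_row(row))
```

Note that zip stops at the shortest column, so any leftover items at the end of the list (when its length is not a multiple of three) are silently dropped. The column widths in format_row are arbitrary and would need adjusting for longer company names.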