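I am scraping data from the Fortune 500 website (http://fortune.com/fortune500/list/). I want to display the rows the same way they appear on the web page.
I tried iterating over the "ul" elements with the class "company-list", but everything is printed on a single line instead of on separate lines.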
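import urllib.request
from bs4 import BeautifulSoup

# Download the page and parse it with Python's built-in HTML parser
sauce = urllib.request.urlopen("http://fortune.com/fortune500/list/").read()
soup = BeautifulSoup(sauce, 'html.parser')

# Print the text of every <ul class="company-list"> element
for company in soup.findAll("ul", {"class": "company-list"}):
    print(company.text)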
Expected result:
Rank Company Revenues ($M)
1 Walmart $500,343
2 Exxon $244,363
etc.
Answer 0 (score: 0)
Using the company variable from your code:
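import pandas as pd

# Join the text nodes with "|" and split them into a flat list of tokens
data = company.get_text("|").split("|")

# Every third token belongs to the same column; [1:] skips the header token
rank = data[0::3][1:-1]  # also drops the last entry (not taking the last line)
company = data[1::3][1:]
revenue = data[2::3][1:]

df = pd.DataFrame({"rank": rank, "company": company, "revenue": revenue})
print(df.head())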
ix  rank  company             revenue
0   1     Walmart             $500,343
1   2     Exxon Mobil         $244,363
2   3     Berkshire Hathaway  $242,137
3   4     Apple               $229,234
4   5     UnitedHealth Group  $201,159
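
To see why the stride slices separate the columns, here is a minimal sketch that runs without the network. The text string is a hypothetical stand-in for what company.get_text("|") might return (three tokens per row, header first); the real page may yield extra or empty tokens, which is why the answer trims the last entry from the rank column.

```python
# Hypothetical flat text, shaped like the "|"-joined output of get_text("|").
# The real page's tokens may differ; this only illustrates the slicing.
text = "Rank|Company|Revenues ($M)|1|Walmart|$500,343|2|Exxon Mobil|$244,363"
data = text.split("|")

# Tokens at offsets 0, 1 and 2 (step 3) form the three columns.
# The trailing [1:] drops the header token of each column.
rank = data[0::3][1:]
company = data[1::3][1:]
revenue = data[2::3][1:]

# Zip the columns back into rows and print one row per line.
rows = list(zip(rank, company, revenue))
for r, c, v in rows:
    print(f"{r}\t{c}\t{v}")
```

If pandas is not needed, looping over the zipped rows like this is enough to print each company on its own line.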