我使用此代码从纳斯达克获取了最新的交易公司列表,但是我希望将结果显示在数据框中,而不是仅列出我可能不需要的所有其他信息。
任何想法如何实现?谢谢
from bs4 import BeautifulSoup
import requests
r=requests.get('https://www.nasdaq.com/screening/companies-by
industry.aspx
exchange=NASDAQ&sortname=marketcap&sorttype=1&pagesize=4000')
data = r.text
soup = BeautifulSoup(data, "html.parser")
table = soup.find( "table", {"id":"CompanylistResults"} )
for row in table.findAll("tr"):
for cell in row("td"):
print (cell.get_text().strip())
答案 0 :(得分:2)
看起来您正在寻找恰当命名的read_html,尽管您需要四处寻找直到获得所需的东西。就您而言:
>>> import pandas as pd
>>> df=pd.read_html(table.prettify(),flavor='bs4')[0]
>>> df.columns = [c.strip() for c in df.columns]
请参见下面的输出。
第一行是完成工作的内容,第二行只是去除标题中所有讨厌的空格和新行。似乎有一个隐藏的ADR TSO
似乎没有用,因此如果您不知道它是什么,可以将其删除。丢弃所有偶数行也可能很有意义,因为它们只是奇数行的延续,而据我所知,它们是无用的链接。在一行中:
>>> df = df.drop(['ADR TSO'], axis=1) #Drop useless column
>>> df1= df[::2] #To get rid of even rows
>>> df2= df[~df['Name'].str.contains('Stock Quote')].head() #By string filtration if we are not sure about the odd/even thing
原始头的输出仅用于显示:
>>> df.head()
Name Symbol Market Cap \
0 Amazon.com, Inc. AMZN $802.18B
1 AMZN Stock Quote AMZN Ratings AMZN Stock Report NaN NaN
2 Microsoft Corporation MSFT $789.12B
3 MSFT Stock Quote MSFT Ratings MSFT Stock Report NaN NaN
4 Alphabet Inc. GOOGL $740.3B
ADR TSO Country IPO Year \
0 NaN United States 1997
1 NaN NaN NaN
2 NaN United States 1986
3 NaN NaN NaN
4 NaN United States n/a
Subsector
0 Catalog/Specialty Distribution
1 NaN
2 Computer Software: Prepackaged Software
3 NaN
4 Computer Software: Programming, Data Processing
已清除的df.head()
的输出:
Name Symbol Market Cap Country IPO Year \
0 Amazon.com, Inc. AMZN $802.18B United States 1997
2 Microsoft Corporation MSFT $789.12B United States 1986
4 Alphabet Inc. GOOGL $740.3B United States n/a
6 Alphabet Inc. GOOG $735.24B United States 2004
8 Apple Inc. AAPL $720.3B United States 1980
Subsector
0 Catalog/Specialty Distribution
2 Computer Software: Prepackaged Software
4 Computer Software: Programming, Data Processing
6 Computer Software: Programming, Data Processing
8 Computer Manufacturing