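I'm scraping several financial metrics from Finviz using a for loop that iterates over a list of stock tickers. I've run into a problem with the null values ('-') on Finviz: they break subsetting of the data because they are recognized as strings rather than floats, unlike the values I'm trying to subset on. I'd like to nullify these values and have been trying to use the replace function from the Pandas module, without any luck. Ideally, the nullification would happen inside the second for loop, so that values are cleaned as the loop runs instead of requiring a pass over the whole list afterwards. The code is below: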
# Import Libraries
import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
import numpy as np
# For custom list of stocks, edit this list below, otherwise leave commented out
stock_list = ['NVAX']
# Header required to scrape from Finviz
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
           'Upgrade-Insecure-Requests': '1', 'Cookie': 'v2=1495343816.182.19.234.142', 'Accept-Encoding': 'gzip, deflate, sdch',
           'Referer': "http://finviz.com/quote.ashx?t="}
# This function is what is used to find the metric of interest and return it
def fundamental_metric(soup, metric):
    return soup.find(text=metric).find_next(class_='snapshot-td2').text
# This function iterates through the index of the data frame (stock_list) and uses the fundamental_metric function to find the metric on Finviz for that stock
# Any stock in the list that cannot be scraped will return an error before moving on to the next stock
def get_fundamental_data(df):
    for symbol in df.index:
        try:
            #url = ("http://finviz.com/quote.ashx?t=" + symbol.lower())
            r = requests.get("http://finviz.com/quote.ashx?t="+ symbol.lower(),headers=headers)
            soup = bs(r.content,'html.parser')
            for m in df.columns:
                output = fundamental_metric(soup,m)
                df.loc[symbol,m] = output
                df.replace(['-'], np.NaN)
        except Exception as e:
            print (symbol, 'Not Found')
            print(e)
    return df
# List of metrics to scrape
# Before adding any metrics, ensure the metric being added is available on Finviz and the name is matched identically
metric = [
'Price'
, 'Change'
, 'Index'
]
df = pd.DataFrame(index = stock_list, columns = metric)
df = get_fundamental_data(df)
print(df)
Answer 0 (score: 1)
df.replace() is not an in-place operation. You need df = df.replace()
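To illustrate, here is a minimal sketch that needs no network access. The sample rows and the clean_value helper are hypothetical stand-ins, not part of the original code; the sketch assumes '-' is the only placeholder Finviz uses for missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for scraped values; '-' marks missing data.
df = pd.DataFrame(
    {
        "Price": ["8.15", "-"],
        "Change": ["-1.20%", "-"],
        "Index": ["-", "S&P 500"],
    },
    index=["AAA", "BBB"],
)

# replace() returns a new DataFrame, so the result must be assigned back.
# Only cells that are exactly '-' are replaced; '-1.20%' is left alone.
df = df.replace("-", np.nan)

# Convert the numeric column so that comparisons operate on floats.
df["Price"] = pd.to_numeric(df["Price"])


def clean_value(raw):
    """Hypothetical helper: map the '-' placeholder to NaN, else return the stripped text."""
    text = raw.strip()
    return np.nan if text == "-" else text


print(df)
```

To nullify values while the inner loop runs, as the question asks, one option would be to assign df.loc[symbol, m] = clean_value(output) instead of calling replace on the whole frame at every iteration.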