Question

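I'm scraping several financial metrics from Finviz using a for loop that iterates over a list of stock tickers. I'm running into a problem with empty values ('-') on Finviz: they cause trouble when I subset the data, because they are recognized as strings rather than floats like the values I'm trying to subset on. I'd like to null out these values and have been trying to use the replace function from Pandas, without any luck. Ideally the values would be nulled inside the second for loop, so that it happens as the loop runs rather than requiring a pass over the whole list afterwards. The code is as follows: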

# Import Libraries
import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
import numpy as np

# For custom list of stocks, edit this list below, otherwise leave commented out
stock_list = ['NVAX']

# Header required to scrape from Finviz
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
          'Upgrade-Insecure-Requests': '1', 'Cookie': 'v2=1495343816.182.19.234.142', 'Accept-Encoding': 'gzip, deflate, sdch',
           'Referer': "http://finviz.com/quote.ashx?t="}

# This function is what is used to find the metric of interest and return it
def fundamental_metric(soup, metric):
    return soup.find(text=metric).find_next(class_='snapshot-td2').text

# This function iterates through the index of the data frame (stock_list) and uses the fundamental_metric function to find the metric on Finviz for that stock
# Any stock in the list that cannot be scraped will return an error before moving on to the next stock
def get_fundamental_data(df):
    for symbol in df.index:
        try:
            #url = ("http://finviz.com/quote.ashx?t=" + symbol.lower())
            r = requests.get("http://finviz.com/quote.ashx?t="+ symbol.lower(),headers=headers)
            soup = bs(r.content,'html.parser')
            for m in df.columns:
                output = fundamental_metric(soup,m)
                df.loc[symbol,m] = output
                df.replace(['-'], np.NaN)
        except Exception as e:
            print (symbol, 'Not Found')
            print(e)
    return df

# List of metrics to scrape
# Before adding any metrics, ensure the metric being added is available on Finviz and the name is matched identically
metric = [
    'Price'
    , 'Change'
    , 'Index'
]

df = pd.DataFrame(index = stock_list, columns = metric)
df = get_fundamental_data(df)

print(df)

Answer 1

df.replace() is not an in-place operation: it returns a new DataFrame and leaves the original untouched. You need df = df.replace(...)
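To illustrate the point, here is a minimal sketch that skips the scraping entirely. The sample values and the 'XYZ' ticker are made up for illustration and are not real Finviz output. It shows that calling replace() without assigning the result leaves the '-' placeholders in place, while assigning it back removes them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for scraped Finviz values
df = pd.DataFrame(
    {"Price": ["12.34", "-"], "Change": ["-1.5%", "-"], "Index": ["-", "S&P 500"]},
    index=["NVAX", "XYZ"],
)

# replace() returns a new DataFrame; the result is discarded here,
# so df still contains the '-' placeholders
df.replace("-", np.nan)
unchanged = int((df == "-").to_numpy().sum())

# Assigning the result back keeps the change
df = df.replace("-", np.nan)
remaining = int((df == "-").to_numpy().sum())

print(unchanged, remaining)
print(df)
```

Only cells that are exactly '-' are replaced, so a negative value such as '-1.5%' is left alone. In the question's loop, the same fix would be writing df = df.replace(['-'], np.nan) in place of the bare df.replace(...) call.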
