Create a pandas DataFrame from two DataFrames

Time: 2020-07-02 13:59:00

Tags: python pandas dataframe

I want to combine two DataFrames to get the S&P 500 constituents for each day.

I got the data from Wikipedia:

import pandas as pd
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')

The first DataFrame contains today's S&P constituents:

# Get the current S&P table and set the column names
sp500 = data[0].iloc[1:, [0, 1, 6, 7]].copy()  # copy so later assignments do not hit a view
columns = ['added_ticker', 'name', 'date', 'cik']
sp500.columns = columns
# Rows without a date get a default date
sp500.loc[sp500['date'].isnull(), 'date'] = '1957-01-01'

# One date is in the wrong format. Correcting it.
sp500.loc[~sp500['date'].str.match(r'\d{4}-\d{2}-\d{2}'), 'date'] = '1985-01-01'
# Parse the leading YYYY-MM-DD part and store the result back in the column
sp500['date'] = pd.to_datetime(sp500['date'].str[:10], format='%Y-%m-%d')
sp500 = pd.melt(sp500, id_vars=['date', 'name', 'cik'], value_vars=['added_ticker'])
sp500.head()

The second DataFrame contains the historical changes to the S&P index:

# Get the S&P 500 adjustments table and set the column names
import numpy as np

sp500_adjustments = data[1]
sp500_adjustments = sp500_adjustments[2:].copy()
columns = ['date', 'added_ticker', 'added_name', 'removed_ticker', 'removed_name', 'reason']
sp500_adjustments.columns = columns

# Rows whose 'date' cell has no comma are missing their date, so their values
# sit one column too far to the left. Shift those rows one column to the right.
no_date = ~sp500_adjustments['date'].str.contains(',')
updates = sp500_adjustments[no_date].shift(1, axis=1)
sp500_adjustments.loc[no_date, 'date'] = np.nan
sp500_adjustments[sp500_adjustments['added_ticker'].isnull()]  # inspection only
sp500_adjustments.update(updates)

# Carry the last known date down to the rows that have none
sp500_adjustments['date'] = sp500_adjustments['date'].ffill()

# Additions: one row per ticker added to the index
sp500_additions = sp500_adjustments[~sp500_adjustments['added_ticker'].isnull()]
sp500_additions = sp500_additions[['date', 'added_ticker', 'added_name']]
sp500_additions = sp500_additions.rename(columns={'added_name': 'name'})
sp500_additions = pd.melt(sp500_additions, id_vars=['date', 'name'], value_vars=['added_ticker'])

# Deletions: one row per ticker removed from the index
sp500_deletions = sp500_adjustments[~sp500_adjustments['removed_ticker'].isnull()]
sp500_deletions = sp500_deletions[['date', 'removed_ticker', 'removed_name']]
sp500_deletions = sp500_deletions.rename(columns={'removed_name': 'name'})
sp500_deletions = pd.melt(sp500_deletions, id_vars=['date', 'name'], value_vars=['removed_ticker'])

# Combine both, parse dates such as "June 22, 2020", and index by date
sp500_history = pd.concat([sp500_deletions, sp500_additions])
sp500_history['date'] = pd.to_datetime(sp500_history['date'], format='%B %d, %Y')
sp500_history = sp500_history.sort_values(by='date', ascending=True)
sp500_history = sp500_history.set_index(keys='date')
sp500_history.head()
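
The repair step in the code above (rows that lack a date are moved one column to the right, then the last known date is carried down) may be easier to follow on a toy table. This is only a sketch: the tickers and names below are made up, and the `na=False` argument is an assumption added so that missing cells are treated as "no date".

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row lost its date cell, so its values
# sit one column too far to the left.
toy = pd.DataFrame(
    [
        ['June 22, 2020', 'AAA', 'Alpha', 'BBB', 'Beta'],
        ['CCC', 'Gamma', 'DDD', 'Delta', np.nan],
    ],
    columns=['date', 'added_ticker', 'added_name', 'removed_ticker', 'removed_name'],
)

# Rows whose 'date' cell does not look like "Month day, Year"
misaligned = ~toy['date'].str.contains(',', na=False)

# Shift those rows one column to the right; their 'date' becomes NaN
toy.loc[misaligned] = toy.loc[misaligned].shift(1, axis=1)

# Carry the last known date down into the rows that now have none
toy['date'] = toy['date'].ffill()
print(toy)
```

After this, the second row holds the same date as the first, and its tickers and names are back under the correct column headers.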

I want to get a dictionary or DataFrame that shows the historical constituents of the S&P index for each day. For example:

2020-07-02 RAD, MSFT, AAPL, NVR, TWTR....
2020-07-01 AT, MSFT, AAPL, NVR, TWTR....
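
One possible way to get this, sketched here on toy data rather than the real tables: start from today's set of tickers and walk backwards one day at a time, undoing each change (drop tickers that were added that day, put back tickers that were removed). The function name `constituents_by_day` and the toy frames are hypothetical. The sketch assumes the change history has a `DatetimeIndex` and the `variable` / `value` columns that `pd.melt` produces by default, and that the current list is valid as of the last day in the range.

```python
import pandas as pd


def constituents_by_day(current, changes, start, end):
    """Return {day: sorted list of tickers} for every calendar day in [start, end].

    current: tickers in the index on `end` (assumption: the list is valid that day).
    changes: DataFrame indexed by date, with 'variable' equal to 'added_ticker'
             or 'removed_ticker' and 'value' holding the ticker.
    """
    members = set(current)
    result = {}
    # Walk from the most recent day back to the oldest one
    for day in pd.date_range(start, end)[::-1]:
        result[day] = sorted(members)
        # Undo the changes that took effect on `day` to get the previous day's set
        todays = changes.loc[changes.index == day]
        added = todays.loc[todays['variable'] == 'added_ticker', 'value']
        removed = todays.loc[todays['variable'] == 'removed_ticker', 'value']
        members.difference_update(added)
        members.update(removed)
    return result


# Toy data (hypothetical): RAD replaces AT on 2020-07-02
toy_changes = pd.DataFrame(
    {'variable': ['removed_ticker', 'added_ticker'], 'value': ['AT', 'RAD']},
    index=pd.to_datetime(['2020-07-02', '2020-07-02']),
)
toy_current = ['RAD', 'MSFT', 'AAPL']

history = constituents_by_day(toy_current, toy_changes, '2020-07-01', '2020-07-02')
for day, tickers in history.items():
    print(day.date(), ', '.join(tickers))
```

With the frames from the question, the call would presumably look like `constituents_by_day(sp500['value'], sp500_history, start, end)`, with `end` set to the day the Wikipedia table was downloaded. The result is only as accurate as the change table itself, and ticker renames are not handled.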

0 answers:

No answers