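I have two datasets, each with 2 columns (Date, Close). I want to compare the dates in the first dataset with the dates in the second one: if a date also exists in the second dataset, take the second dataset's Close value for that date; if it does not, take the value for the previous date.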
Here are the datasets: https://www.euronext.com/fr/products/equities/FR0000120644-XPAR https://fr.finance.yahoo.com/quote/%5EFCHI/history?period1=852105600&period2=1528873200&interval=1d&filter=history&frequency=1d
Here is my code:
import numpy as np
from datetime import datetime , timedelta
import pandas as pd
#import cac 40 stock index (dataset1)
df = pd.read_csv('cac 40.csv')
df = pd.DataFrame(df)
#import Danone index(dataset2)
df1 = pd.read_excel('Price_Data_Danone.xlsx',header=3)
df1 = pd.DataFrame(df1)
#check the number of observations of both datasets and get the minimum number
if len(df1)>len(df):
    size=len(df)
elif len(df1)<len(df):
    size=len(df1)
else:
    size=len(df)
#get new close values of dataset2 relative to the date in dataset1
close1=np.zeros((size))
for i in range(0,size,1):
    # find the date of dataset1 in dataset2
    if (df['Date'][i]in df1['Date']):
        #get the index of the date and the corresponding value of close and store it in close1
        close1[i]=df['close'][df1.loc['Date'][i], df['Date']]
    else:
        #if the date doesn't exist in dataset2
        #take value of close of previous date of dataset1
        close1[i]=df['close'][df1.loc['Date'][i-1], df['Date']]
Here is my traceback. I get this error: KeyError: 'the label [Date] is not in the [index]'. Example:
We look up the value df['Date'][1] = '5/06/2009' in the df1['Date'] column.
We get its index in df1['Date'], then close1 = df1['close'][index].
If df['Date'][1] = '5/06/2009' is not in df1['Date'], we get the index of the previous date, df['Date'][0] = '4/06/2009', and then close1 = df1['close'][previous index].
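The lookup described above can be sketched with a dictionary that maps each dataset-2 date to its close. This is a minimal sketch, not the poster's code: the sample dates and prices are hypothetical, and it assumes both Date columns have already been parsed to datetimes so that equal dates compare equal.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the two files.
# df: dates to look up (dataset 1); df1: dates with their closes (dataset 2).
df = pd.DataFrame({"Date": pd.to_datetime(["2009-06-04", "2009-06-05", "2009-06-08"])})
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2009-06-04", "2009-06-08"]),
    "close": [30.0, 31.5],
})

# Map each dataset-2 date to its close for fast membership tests.
lookup = dict(zip(df1["Date"], df1["close"]))

close1 = np.full(len(df), np.nan)
for i, date in enumerate(df["Date"]):
    if date in lookup:
        # The date exists in dataset 2: take its close directly.
        close1[i] = lookup[date]
    elif i > 0:
        # Missing date: reuse the value already chosen for the previous row.
        close1[i] = close1[i - 1]
    # For i == 0 with no match there is no previous value, so NaN stays.

print(close1)
```

Because a missing date reuses the previous row's result, a run of several missing dates keeps carrying the last matched close forward.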
Answer 0 (score: 1)
Your error occurs on the line:
close1[i]=df['close'][df1.loc['Date'][i], df['Date']]
If your goal is to get the close value from df at index i, you should write:
close1[i] = df['close'][i]
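To see why the original line fails, here is a minimal sketch on a hypothetical two-row frame: `.loc['Date']` asks for a row whose index label is 'Date', which does not exist, while `df1['Date']` selects the column.

```python
import pandas as pd

# Hypothetical two-row frame using the question's column names.
df1 = pd.DataFrame({"Date": ["4/06/2009", "5/06/2009"], "close": [30.0, 30.2]})

# .loc with a single label selects a ROW by index label. The index here is
# 0, 1, so there is no row labelled 'Date' and pandas raises KeyError.
raised = False
try:
    df1.loc["Date"]
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Plain [] with a column name selects the column instead.
dates = df1["Date"]

# Value of 'close' in row i by position; .iloc does not depend on the
# index being 0..n-1.
i = 1
value = df1["close"].iloc[i]
print(value)
```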
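See if that helps. Unfortunately I don't fully understand what you are trying to accomplish; for example, why do you set size to the length of the shorter table?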
Also, assuming I downloaded the right files, your condition df['Date'][i] in df1['Date'] probably won't work, because one date format uses - and the other uses /.
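A small sketch of why that condition fails, with hypothetical dates: a plain `in` test on a pandas Series checks the index labels, not the values, and date strings in different formats never compare equal anyway. Parsing both columns to datetime and using `Series.isin` avoids both problems.

```python
import pandas as pd

# Hypothetical dates in the two formats mentioned above.
cac_dates = pd.Series(["2009-06-05", "2009-06-08"])   # year-month-day style
danone_dates = pd.Series(["5/06/2009", "9/06/2009"])  # day/month/year style

# `x in series` tests the INDEX labels (here 0 and 1), not the values.
value_in = "2009-06-05" in cac_dates   # False: no index label with that name
label_in = 0 in cac_dates              # True: 0 is an index label

# Parse both columns to datetime so the string formats no longer matter ...
cac = pd.to_datetime(cac_dates, format="%Y-%m-%d")
danone = pd.to_datetime(danone_dates, format="%d/%m/%Y")

# ... then test value membership with .isin.
found = cac.isin(danone)
print(value_in, label_in, found.tolist())
```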
Solution
import pandas as pd
pd.set_option('display.expand_frame_repr', False)
# load both files
df = pd.read_csv('CAC.csv')
df1 = pd.read_csv('DANONE.csv', header=3)
# make sure both Date columns are real datetimes, so the formats match
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
# as far as I understand, you only need the Date and Close columns
keep_columns = ['Date', 'Close']
# so keep only those columns
df = df[keep_columns]
df1 = df1[keep_columns]
# merge the two tables on Date; with how='left', every row of df gets the
# matching row from df1 if there is one, otherwise a NaN value.
# For readability the suffixes mark df as CAC and df1 as DANONE
merged = pd.merge(df,
                  df1,
                  on='Date',
                  how='left',
                  suffixes=('CAC', 'DANONE'))
# fill every missing CloseDANONE value (a Date present in df but not in df1)
# with the LAST available value; this assumes df is sorted by Date ascending
merged['CloseDANONE'] = merged['CloseDANONE'].ffill()
# take the CloseDANONE values that are not null
close1 = merged.loc[merged['CloseDANONE'].notnull(), 'CloseDANONE'].values
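A possibly simpler alternative, which is a different technique from the merge above, is `pd.merge_asof` with `direction="backward"`: for each CAC date it takes the DANONE row with the latest date on or before it. The sketch below uses a few rows from the tables shown in this answer and assumes both frames are sorted by Date, which `merge_asof` requires.

```python
import pandas as pd

# Sample rows taken from the CAC and DANONE tables in this answer.
cac = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-11", "2018-06-12", "2018-06-13"]),
    "Close": [5473.910156, 5453.370117, 5468.240234],
})
danone = pd.DataFrame({
    # Listed newest-first on purpose, to show the sort step is needed.
    "Date": pd.to_datetime(["2018-06-12", "2018-06-11"]),
    "Close": [65.43, 64.91],
})

# merge_asof requires both frames to be sorted on the key column.
cac = cac.sort_values("Date")
danone = danone.sort_values("Date")

# For each CAC date, take the DANONE row with the latest Date <= that date.
merged = pd.merge_asof(cac, danone, on="Date",
                       direction="backward",
                       suffixes=("CAC", "DANONE"))
print(merged)
```

This matches the left merge plus forward fill whenever every DANONE date also appears in the CAC table; if DANONE has dates that CAC lacks, `merge_asof` may pick a more recent DANONE close than the forward-fill approach would.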
Below you can see the last 6 values from df (CAC):
Date Close
5522 2018-06-06 5457.560059
5523 2018-06-07 5448.359863
5524 2018-06-08 5450.220215
5525 2018-06-11 5473.910156
5526 2018-06-12 5453.370117
5527 2018-06-13 5468.240234
The most recent values from df1 (DANONE):
Date Close
0 2018-06-06 63.86
1 2018-06-07 63.71
2 2018-06-08 64.31
3 2018-06-11 64.91
4 2018-06-12 65.43
The last 6 rows after the merge:
Date CloseCAC CloseDANONE
5522 2018-06-06 5457.560059 63.86
5523 2018-06-07 5448.359863 63.71
5524 2018-06-08 5450.220215 64.31
5525 2018-06-11 5473.910156 64.91
5526 2018-06-12 5453.370117 65.43
5527 2018-06-13 5468.240234 65.43
For every date that appears in df we take the value from df1, but 2018-06-13 does not exist in df1, so I filled it with the last available value, 65.43, from 2018-06-12.
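The fill step can be checked on the rows shown above. This sketch copies three CAC rows and the matching DANONE rows from the tables, runs the same left merge, and confirms that 2018-06-13 is the only unmatched date and that forward fill gives it 65.43.

```python
import pandas as pd

# Rows copied from the CAC and DANONE tables above (2018-06-11 .. 2018-06-13).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-11", "2018-06-12", "2018-06-13"]),
    "Close": [5473.910156, 5453.370117, 5468.240234],
})
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-11", "2018-06-12"]),
    "Close": [64.91, 65.43],
})

merged = pd.merge(df, df1, on="Date", how="left", suffixes=("CAC", "DANONE"))
# Before filling, 2018-06-13 has no DANONE match.
missing = merged.loc[merged["CloseDANONE"].isna(), "Date"].tolist()
# Forward fill copies the 2018-06-12 close onto 2018-06-13.
merged["CloseDANONE"] = merged["CloseDANONE"].ffill()
print(merged)
```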