Date comparison in Python

Time: 2018-06-13 10:07:03

Tags: python pandas

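I have two datasets, each with 2 columns (date, close). I want to compare the dates of the first dataset with the dates of the second dataset. If a date is present in both, I take the close of the second dataset for that date; otherwise I take the close value of the previous date.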

Here are the datasets: https://www.euronext.com/fr/products/equities/FR0000120644-XPAR https://fr.finance.yahoo.com/quote/%5EFCHI/history?period1=852105600&period2=1528873200&interval=1d&filter=history&frequency=1d

Here is my code:

import numpy as np
from datetime import datetime , timedelta
import pandas as pd
#import cac 40 stock index (dataset1)
df = pd.read_csv('cac 40.csv')
df = pd.DataFrame(df) 
#import Danone index(dataset2)
df1 = pd.read_excel('Price_Data_Danone.xlsx',header=3)
df1 = pd.DataFrame(df1) 
#check the number of observation of both datasets and get the minimum number
if len(df1)>len(df):
    size=len(df)
elif len(df1)<len(df):
     size=len(df1)
else:
     size=len(df)
#get new close values of dataset2 relative to the date in datset1
close1=np.zeros((size))
for i in range(0,size,1):
    # find the date of dataset1 in dataset 2
    if (df['Date'][i]in df1['Date']):
    #get the index of the date and the corresponding value of close and store it in close1
        close1[i]=df['close'][df1.loc['Date'][i], df['Date']]
    else:
        #if the date doesen't exist in datset2
    #take value of close of previous date of datatset1
        close1[i]=df['close'][df1.loc['Date'][i-1], df['Date']]

This is my traceback; I get this error: KeyError: 'the label [Date] is not in the [index]'

Example:

We look for the value df['Date'][1] = '5/06/2009' in the df1['Date'] column.
We get its index in df1['Date'].
Then close1 = df1['close'][index].
If df['Date'][1] = '5/06/2009' is not in df1['Date'], we get the index of the previous date df['Date'][0] = '4/06/2009', and close1 = df1['close'][previous index].

1 answer:

Answer 0: (score: 1)

Your error occurs on this line:

close1[i]=df['close'][df1.loc['Date'][i], df['Date']]

If your goal is to get the close value from df for a given index i, you should write:

close1[i] = df['close'][i]

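See if that helps. Unfortunately I don't fully understand what you are trying to accomplish; for example, why do you set size to the length of the shorter table? Also, provided I downloaded the correct files, your condition df['Date'][i]in df1['Date'] probably won't work, because one date format uses - and the other uses /.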
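A further reason the condition is unlikely to behave as intended: on a pandas Series the `in` operator tests the index labels, not the stored values. The sketch below illustrates this with made-up sample frames (`cac` and `danone` are hypothetical names, and the dates are illustrative rather than taken from the downloaded files). It parses both columns with `pd.to_datetime(..., dayfirst=True)` so the two formats compare equal, then tests membership with `Series.isin`.

```python
import pandas as pd

# Hypothetical sample data: one file writes dates with "/", the other with "-"
cac = pd.DataFrame({"Date": ["04/06/2009", "05/06/2009", "08/06/2009"]})
danone = pd.DataFrame({"Date": ["04-06-2009", "08-06-2009"]})

# `in` on a Series looks at the index labels (0, 1, ...), not the values
label_hit = 1 in danone["Date"]               # 1 is an index label
value_hit = "04-06-2009" in danone["Date"]    # stored values are not searched

# Parse both columns as day-first dates so the two formats become comparable
cac["Date"] = pd.to_datetime(cac["Date"], dayfirst=True)
danone["Date"] = pd.to_datetime(danone["Date"], dayfirst=True)

# Element-wise membership test on the values
present = cac["Date"].isin(danone["Date"])
print(label_hit, value_hit, present.tolist())
```

Here `present` is a boolean Series aligned with `cac`, so it can be used directly as a mask instead of looping over rows.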

Solution

import pandas as pd


pd.set_option('expand_frame_repr', False)

# load both files
df = pd.read_csv('CAC.csv')
df1 = pd.read_csv('DANONE.csv', header=3)

# ensure date format is the same between two
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df1.Date = pd.to_datetime(df1.Date, dayfirst=True)

# you need only Date and Close columns as far as I understand
keep_columns = ['Date', 'Close']

# let's keep only these columns then
df = df[keep_columns]
df1 = df1[keep_columns]

# merge two tables on Date, method is left so that for every row in df we 
# 'append' row from df1 if possible, if not there will be NaN value, 
# for readability I added suffixes df - CAC and df1 - DANONE
merged = pd.merge(df,
                  df1,
                  on='Date',
                  how='left',
                  suffixes=['CAC', 'DANONE'])

# for dates present in df but missing from df1, CloseDANONE is NaN after the
# merge; forward-fill it with the LAST available value (assign the result
# back rather than filling in place on the attribute)
merged['CloseDANONE'] = merged['CloseDANONE'].ffill()

# we get values from CloseDANONE column as long as it's not null
close1 = merged.loc[merged.CloseDANONE.notnull(), 'CloseDANONE'].values

Below you can see the last 6 values from df - CAC:

           Date        Close
5522 2018-06-06  5457.560059
5523 2018-06-07  5448.359863
5524 2018-06-08  5450.220215
5525 2018-06-11  5473.910156
5526 2018-06-12  5453.370117
5527 2018-06-13  5468.240234

The last 6 values from df1 - DANONE:

        Date  Close
0 2018-06-06  63.86
1 2018-06-07  63.71
2 2018-06-08  64.31
3 2018-06-11  64.91
4 2018-06-12  65.43

The last 6 rows after merging:

           Date     CloseCAC  CloseDANONE
5522 2018-06-06  5457.560059        63.86
5523 2018-06-07  5448.359863        63.71
5524 2018-06-08  5450.220215        64.31
5525 2018-06-11  5473.910156        64.91
5526 2018-06-12  5453.370117        65.43
5527 2018-06-13  5468.240234        65.43

For every date that appears in df we get the value from df1, but 2018-06-13 does not exist in df1, so I fill it with the last available value, which is 65.43 from 2018-06-12.
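The merge-and-fill step can be tried without the CSV files. The following is a minimal sketch that uses the last three dates from the tables above, typed in by hand (CAC values rounded to two decimals for brevity). The frame names `cac` and `danone` are illustrative, and the forward fill is written as `Series.ffill()` assigned back to the column, which is one possible spelling and not necessarily the only one.

```python
import pandas as pd

# Last rows of each table from the output above, entered by hand
cac = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-11", "2018-06-12", "2018-06-13"]),
    "Close": [5473.91, 5453.37, 5468.24],
})
danone = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-11", "2018-06-12"]),
    "Close": [64.91, 65.43],
})

# Left merge keeps every CAC date; dates missing from danone get NaN
merged = pd.merge(cac, danone, on="Date", how="left",
                  suffixes=("CAC", "DANONE"))

# Forward fill: a missing DANONE close takes the previous row's value
# (this relies on the rows being sorted by Date)
merged["CloseDANONE"] = merged["CloseDANONE"].ffill()

close1 = merged["CloseDANONE"].to_numpy()
print(merged)
```

Because the fill copies the value from the row above, the frame should be sorted by Date before filling; if the first date has no match there is nothing to copy and it stays NaN.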