Joining dataframes by date in Python pandas

Time: 2018-07-03 14:10:14

Tags: python pandas date dataframe join

I'm a beginner with pandas and I'm having trouble joining two dataframes by date.

This is the first dataframe. It contains electricity consumption by building ID:

            date  id_bat       conso
0     2014-01-01  P530B1  513.141600
1     2014-01-01  P530B3  218.871687
2     2014-01-01  P530B4  189.265570
3     2014-01-01  P530B5  156.801087
4     2014-01-01  P530B6  394.935380
5     2014-01-01  P530B7  445.643058
6     2014-01-01  P530B8  223.211640
7     2014-01-01  P530B9  366.053029
8     2014-01-01  P531B1  268.609563
9     2014-01-01  P531B2  256.978193
10    2014-01-01  P531B3  366.242837
11    2014-01-01  P531B5  186.152617
....
6794  2014-05-31  P534B9  237.089335

This is the second one. It contains meteorological data by month:

      dju    month
0   325.3  2014-01
1   283.2  2014-02
2   227.1  2014-03
3   142.3  2014-04
4   112.5  2014-05
5    37.3  2014-06
6    17.6  2014-07
7    36.5  2014-08
8    34.6  2014-09
9   101.7  2014-10
10  223.9  2014-11
11  368.9  2014-12

You can find the files here: https://www.4shared.com/file/9_8U4vktda/df_conso.html https://www.4shared.com/file/U-Y7yNnRee/df_dju.html The desired result looks like this: (image: the result)

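After splitting the identifier into a building ID and a site ID, I can't join the dataframes. I tried two approaches but still can't get the result I want. Could anyone check the join on the date, as shown below?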

Here is my code:

import pandas as pd
import numpy as np

#import csv
#from datetime import datetime
liste_site=[]
liste_bat=[]
with open(r"*votre chemin*\df_dju.csv") as csvfile:
    df2=pd.read_csv(csvfile)
with open(r"*votre chemin*\df_conso.csv") as csvfile:
    df=pd.read_csv(csvfile)
for i in range(0,len(df)):
   idbat=df.iloc[i]['id_bat'] 
   site,batiment=idbat[:4], idbat[4:]
   liste_site.append(site)
   liste_bat.append(batiment)
   df=df.assign(id_site=pd.Series(liste_site))
   df=df.assign(id_batiment=pd.Series(liste_bat))
df['date']=pd.to_datetime(df['date'],format='%Y-%m-%d')
df2['month']=pd.to_datetime(df2['month'],format='%Y-%m')

# join with dju only
resultat=df.join(df2['dju'], 
(pd.DatetimeIndex(df['date'])
.year.isin(pd.DatetimeIndex(df2['month']).year))& 
(pd.DatetimeIndex(df['date'])
.month.isin(pd.DatetimeIndex(df2['month']).month)))

# full join, but dju comes out as NaN
res=pd.pivot_table(df,index= 
[df['id_site'],df['date']],columns=df['id_batiment'])
res.loc[:, 'dju'] = resultat['dju']

print(res)
print(resultat)
res.to_excel('*votre chemin */resultat_test.xlsx',sheet_name='Sheet1')

1 answer:

Answer 0 (score: 0)

df = pd.read_csv('df_conso.csv')
df1 = pd.read_csv('df_dju.csv')

# split id_bat into a site part (id_site) and a building part (id_bat)
df['id_bat'] = df['id_bat'].astype(str)  # assign the result; astype does not modify in place
df['id_site'] = df['id_bat'].str.slice(0, 4)
df['id_bat'] = df['id_bat'].str.slice(4, 6)

# pivot the df and reset index
df = df.pivot_table(index=['id_site','date'] , columns='id_bat', values='conso')
df = df.reset_index()

# truncate the date to 'YYYY-MM' and rename the month column so both frames share a 'date' key
df['date'] = df['date'].str.slice(0,7)
df1.columns = ['dju', 'date']

# merge the two df
result = df.merge(df1, on='date')
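
The steps above can also be sketched as a self-contained example that does not need the CSV files. This is only an illustration under a few assumptions: the two small frames are built by hand from rows shown in the question, the names `conso`, `dju` and `wide` are made up for the example, and the daily date is kept while a separate `month` key is derived for the merge (a variation on the approach above, which overwrites `date` with the month).

```python
import pandas as pd

# Illustrative frames built from a few rows shown in the question
conso = pd.DataFrame({
    "date": ["2014-01-01", "2014-01-01", "2014-01-01", "2014-05-31"],
    "id_bat": ["P530B1", "P530B3", "P531B1", "P534B9"],
    "conso": [513.141600, 218.871687, 268.609563, 237.089335],
})
dju = pd.DataFrame({
    "dju": [325.3, 283.2, 227.1, 142.3, 112.5],
    "month": ["2014-01", "2014-02", "2014-03", "2014-04", "2014-05"],
})

# Split the identifier into site and building parts without a Python loop
conso["id_site"] = conso["id_bat"].str[:4]
conso["id_batiment"] = conso["id_bat"].str[4:]

# Derive a 'YYYY-MM' key from the daily date so it matches dju['month']
conso["month"] = pd.to_datetime(conso["date"]).dt.strftime("%Y-%m")

# One row per (site, date), one column per building
wide = conso.pivot_table(
    index=["id_site", "date", "month"],
    columns="id_batiment",
    values="conso",
).reset_index()
wide.columns.name = None  # drop the leftover 'id_batiment' axis label

# Left merge keeps every consumption row; validate raises if a month
# appears more than once in dju
result = wide.merge(dju, on="month", how="left", validate="many_to_one")
print(result)
```

With `how="left"`, a date whose month is missing from the monthly table would keep its consumption values and get `NaN` for `dju`, instead of being dropped as it would be with the default inner merge.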