I am a pandas beginner and I am having trouble joining two DataFrames by date.
Here is the first DataFrame; it contains electricity consumption by building ID:
date id_bat conso
0 2014-01-01 P530B1 513.141600
1 2014-01-01 P530B3 218.871687
2 2014-01-01 P530B4 189.265570
3 2014-01-01 P530B5 156.801087
4 2014-01-01 P530B6 394.935380
5 2014-01-01 P530B7 445.643058
6 2014-01-01 P530B8 223.211640
7 2014-01-01 P530B9 366.053029
8 2014-01-01 P531B1 268.609563
9 2014-01-01 P531B2 256.978193
10 2014-01-01 P531B3 366.242837
11 2014-01-01 P531B5 186.152617
....
6794 2014-05-31 P534B9 237.089335
Here is the second one; it contains meteorological data by month:
dju month
0 325.3 2014-01
1 283.2 2014-02
2 227.1 2014-03
3 142.3 2014-04
4 112.5 2014-05
5 37.3 2014-06
6 17.6 2014-07
7 36.5 2014-08
8 34.6 2014-09
9 101.7 2014-10
10 223.9 2014-11
11 368.9 2014-12
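You can find the files here: https://www.4shared.com/file/9_8U4vktda/df_conso.html https://www.4shared.com/file/U-Y7yNnRee/df_dju.html
The result should look like this: [image: the result]
After splitting the building ID from the site ID, I cannot join the DataFrames. I tried two approaches, but neither produces the expected result. Could anyone check the join on the dates?
Here is my code: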
import pandas as pd
import numpy as np
#import csv
#from datetime import datetime
liste_site=[]
liste_bat=[]
with open(r"*votre chemin*\df_dju.csv") as csvfile:
    df2=pd.read_csv(csvfile)
with open(r"*votre chemin*\df_conso.csv") as csvfile:
    df=pd.read_csv(csvfile)
for i in range(0,len(df)):
    idbat=df.iloc[i]['id_bat']
    site,batiment=idbat[:4], idbat[4:]
    liste_site.append(site)
    liste_bat.append(batiment)
df=df.assign(id_site=pd.Series(liste_site))
df=df.assign(id_batiment=pd.Series(liste_bat))
df['date']=pd.to_datetime(df['date'],format='%Y-%m-%d')
df2['month']=pd.to_datetime(df2['month'],format='%Y-%m')
# join with dju only
resultat=df.join(df2['dju'],
                 (pd.DatetimeIndex(df['date'])
                  .year.isin(pd.DatetimeIndex(df2['month']).year))&
                 (pd.DatetimeIndex(df['date'])
                  .month.isin(pd.DatetimeIndex(df2['month']).month)))
# full join, but dju comes out as NaN
res=pd.pivot_table(df,index=
                   [df['id_site'],df['date']],columns=df['id_batiment'])
res.loc[:, 'dju'] = resultat['dju']
print(res)
print(resultat)
res.to_excel('*votre chemin */resultat_test.xlsx',sheet_name='Sheet1')
Answer 0 (score: 0)
import pandas as pd

df = pd.read_csv('df_conso.csv')
df1 = pd.read_csv('df_dju.csv')
# split id_bat into two parts: the site (first 4 characters) and the building (the rest)
df['id_bat'] = df['id_bat'].astype(str)
df['id_site'] = df['id_bat'].str.slice(0, 4)
df['id_bat'] = df['id_bat'].str.slice(4, 6)
# pivot the df and reset index
df = df.pivot_table(index=['id_site','date'] , columns='id_bat', values='conso')
df = df.reset_index()
# keep only the 'YYYY-MM' part of the date string so it matches the month values in df1
df['date'] = df['date'].str.slice(0,7)
df1.columns = ['dju', 'date']
# merge the two df
result = df.merge(df1, on='date')
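
To make the steps above easier to check without downloading the CSV files, here is a minimal sketch of the same idea on a few rows copied from the question. It is only an illustration under some assumptions: the frames are built inline instead of being read from disk, the names `conso`, `dju`, `wide` and the helper column `month` are my own, and `how="left"` is a choice I made so that consumption rows survive even when a month has no dju value.

```python
import pandas as pd

# A few rows copied from the tables in the question (built inline for illustration).
conso = pd.DataFrame({
    "date": ["2014-01-01", "2014-01-01", "2014-01-01", "2014-05-31"],
    "id_bat": ["P530B1", "P530B3", "P531B1", "P534B9"],
    "conso": [513.141600, 218.871687, 268.609563, 237.089335],
})
dju = pd.DataFrame({
    "dju": [325.3, 283.2, 227.1, 142.3, 112.5],
    "month": ["2014-01", "2014-02", "2014-03", "2014-04", "2014-05"],
})

# Split the identifier: first four characters are the site, the rest is the building.
conso["id_site"] = conso["id_bat"].str[:4]
conso["id_batiment"] = conso["id_bat"].str[4:]

# Parse the date, then derive a "YYYY-MM" text key in the same form as dju["month"].
conso["date"] = pd.to_datetime(conso["date"], format="%Y-%m-%d")
conso["month"] = conso["date"].dt.strftime("%Y-%m")

# One row per (site, date), one column per building.
wide = conso.pivot_table(
    index=["id_site", "date", "month"], columns="id_batiment", values="conso"
).reset_index()
wide.columns.name = None  # drop the leftover "id_batiment" label on the column axis

# Attach the monthly dju value to every daily row of that month.
result = wide.merge(dju, on="month", how="left")
print(result)
```

The point of the helper `month` column is that both frames end up with a key of the same type and format, so an ordinary `merge` is enough and the full daily `date` is kept in the result.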