I am trying to merge a datetime series with repository data while grouping by name and summing the values.
File1.csv
Timeseries,Name,count
07/03/2015 06:00:00,Paris,100
07/03/2015 06:00:00,Paris,600
07/03/2015 06:00:00,Paris,700
07/03/2015 06:00:00,London,200
07/03/2015 06:00:00,London,100
07/03/2015 06:00:00,London,500
07/03/2015 06:00:00,Dublin,300
07/03/2015 06:00:00,Dublin,400
07/03/2015 06:00:00,Dublin,400
Expected output
Master_file.csv (append mode)
Name,Timeseries(n-1),Timeseries(n) #put the datetime series as header and put
Paris,300,1400 #Sum of all the values with same Name
London,200,800
Dublin,400,1100
Program
import pandas as pd
import numpy as np
df = pd.read_csv('/home/lat_lon1.csv')
df1 = pd.read_csv('/home/lat_lon_master.csv')
gp = df.groupby('Name')['date timeseries'].sum().reset_index()
df1.merge(gp, on='Name')
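I am unable to turn the date time column into a header and place the correct values under it. Names that are not found can be given NaN and replaced in the next iteration.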
Answer 0 (score: 1)
Please check the Python pandas DataFrame documentation. Here is the code you are looking for.
Output
Timeseries Name count
07/03/2015 06:00:00 Dublin 1100
07/03/2015 06:00:00 London 800
07/03/2015 06:00:00 Paris 1400
#!/bin/python
import pandas as pd
import numpy as np

# Please enter the file location
df = pd.read_csv('/home/saiharsh/Documents/Crowd Street/Transition_Data/Telecom_7.csv')

# Sum only the numeric 'count' column for each Name
gp = df.groupby('Name')['count'].sum().reset_index()

# Collect the last Timeseries value of each Name, in the same order as gp
flag = 0
for i in gp['Name']:
    if flag == 1:
        time = df['Timeseries'][df['Name'] == i]
        time = time.tail(1)
        frames = [time1, time]
        time1 = pd.concat(frames)
    else:
        time1 = df['Timeseries'][df['Name'] == i]
        time1 = time1.tail(1)
        flag = 1

# Align the timestamps with the summed rows and print the result as CSV
time1 = time1.reset_index(drop=True)
result = pd.concat([time1, gp], axis=1, join='inner')
result = result.to_csv(index=False)
print(result)
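The loop above takes the last Timeseries value for each Name and attaches it to the summed counts. A more compact sketch of the same idea groups on both columns at once. It assumes each Name has a single timestamp per input file, as in the sample File1.csv; the inline sample data and the variable names `csv_text` and `summed` are illustrative only.

```python
import io
import pandas as pd

# Inline copy of the sample File1.csv from the question (illustrative only)
csv_text = """Timeseries,Name,count
07/03/2015 06:00:00,Paris,100
07/03/2015 06:00:00,Paris,600
07/03/2015 06:00:00,Paris,700
07/03/2015 06:00:00,London,200
07/03/2015 06:00:00,London,100
07/03/2015 06:00:00,London,500
07/03/2015 06:00:00,Dublin,300
07/03/2015 06:00:00,Dublin,400
07/03/2015 06:00:00,Dublin,400
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group on both columns so the timestamp is kept next to each Name,
# and sum only the 'count' column
summed = df.groupby(['Timeseries', 'Name'], as_index=False)['count'].sum()
print(summed.to_csv(index=False))
```

If a Name can appear under several timestamps in one file, this yields one row per (Timeseries, Name) pair, whereas the loop keeps only the last timestamp per Name.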
Please feel free to reply if you run into any problems.
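To get the layout asked for in the question (one row per Name, one column per timestamp), one possible approach is to pivot the summed counts and outer-merge them into the master table. This is only a sketch under assumptions: the master contents below are hypothetical, its column header `07/03/2015 05:00:00` is an invented stand-in for Timeseries(n-1), and its values are taken from the expected output in the question.

```python
import io
import pandas as pd

# Inline copy of the sample File1.csv from the question (illustrative only)
new_csv = """Timeseries,Name,count
07/03/2015 06:00:00,Paris,100
07/03/2015 06:00:00,Paris,600
07/03/2015 06:00:00,Paris,700
07/03/2015 06:00:00,London,200
07/03/2015 06:00:00,London,100
07/03/2015 06:00:00,London,500
07/03/2015 06:00:00,Dublin,300
07/03/2015 06:00:00,Dublin,400
07/03/2015 06:00:00,Dublin,400
"""
# Hypothetical master file: the timestamp header is an assumption
master_csv = """Name,07/03/2015 05:00:00
Paris,300
London,200
Dublin,400
"""
df = pd.read_csv(io.StringIO(new_csv))
master = pd.read_csv(io.StringIO(master_csv))

# Sum 'count' per Name and turn each distinct timestamp into its own column
wide = df.pivot_table(index='Name', columns='Timeseries',
                      values='count', aggfunc='sum').reset_index()
wide.columns.name = None  # drop the leftover 'Timeseries' axis label

# The outer merge keeps every Name; a Name missing on either side gets NaN
merged = master.merge(wide, on='Name', how='outer')
print(merged.to_csv(index=False))
```

Because each run adds a column rather than rows, the master file would need to be rewritten with `merged.to_csv(...)` instead of being opened in append mode.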