Groupby to DataFrame with timedelta64[ns]

Time: 2016-11-08 20:50:31

Tags: python dataframe

DF1

|Project |Days
|A       |20 days
|B       |10 days
|A       |10 days
|C       |5 days
|C       |7 days
|B       |8 days

R = df1['Days'].groupby(df1['Project'])

[R

|20 days
|10 days
|Name: Days, dtype: timedelta64[ns],('A', 30   15 days)
|10 days
|8 days
|Name: Days, dtype: timedelta64[ns],('B', 18   9 days)
|5 days
|7 days
|Name: Days, dtype: timedelta64[ns],('C', 12   6 days)

DF2

|Project  |Date     |**New Date**
|A        |1/10/16  |1/25/16 
|A        |1/8/16   |1/23/16
|C        |1/2/16   |1/8/16
|B        |1/9/16   |1/18/16    

What I want to do is create df2['New Date'] by taking the mean number of days for each Project in df1 and adding that mean to df2['Date']. Any ideas?

One more thing to add: the "Days" column was generated as the difference between two dates loaded from an Excel spreadsheet.

****EDIT****

df1.head().to_dict('list')

 {'Project': ['210001', '210001', '210001', '210001', '210001'], 'Days':
 [Timedelta('8 days 00:00:00'), Timedelta('8 days 00:00:00'), Timedelta('12 days
 00:00:00'), Timedelta('12 days 00:00:00'), Timedelta('14 days 00:00:00')]}

df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1161 entries, 0 to 1278
Data columns (total 2 columns):
Project    1161 non-null object
Days      1161 non-null timedelta64[ns]
dtypes: object(1), timedelta64[ns](1)
memory usage: 22.7+ KB
None

EDIT #2: This is the error I am getting: OverflowError: int too big to convert

df2['New Date'] = df2['Date'] + pd.to_timedelta(df2['Days'], unit='D')

Days
20.569231
15.795455
20.569231

1 Answer:

Answer 0 (score: 0)

import pandas as pd

# Rebuild the sample frames from the question
df1 = pd.DataFrame(
    {'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
     'Project': ['A', 'B', 'A', 'C', 'C', 'B']})

df2 = pd.DataFrame(
    {'Date': ['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
     'Project': ['A', 'A', 'C', 'B']})

# Convert the strings to timedelta64[ns] and datetime64[ns]
df1['Days'] = pd.to_timedelta(df1['Days'])
df2['Date'] = pd.to_datetime(df2['Date'])

# Mean days per project, computed as sum / count
result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
# Attach each project's mean to df2, then shift the dates by it
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
df2['New Date'] = df2['Date'] + df2['Days']
print(df2)

yields

        Date Project    Days   New Date
0 2016-01-10       A 15 days 2016-01-25
1 2016-01-08       A 15 days 2016-01-23
2 2016-01-02       C  6 days 2016-01-08
3 2016-01-09       B  9 days 2016-01-18

First, compute the groupby/mean:

result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
#             sum  count    Days
# Project                       
# A       30 days      2 15 days
# B       18 days      2  9 days
# C       12 days      2  6 days
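If aggregating the timedelta column directly is a problem, one possible workaround is to convert the durations to a float number of days, take an ordinary numeric mean, and convert back with pd.to_timedelta(..., unit='D'). This is only a sketch on the sample data from the question; the names days_as_float, mean_days and mean_td are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with Days already as timedelta64[ns]
df1 = pd.DataFrame({
    'Project': ['A', 'B', 'A', 'C', 'C', 'B'],
    'Days': pd.to_timedelta(['20 days', '10 days', '10 days',
                             '5 days', '7 days', '8 days']),
})

# Express each duration as a float number of days
days_as_float = df1['Days'] / pd.Timedelta(days=1)

# Plain numeric mean per project
mean_days = days_as_float.groupby(df1['Project']).mean()

# Convert the float day counts back to timedelta64[ns]
mean_td = pd.to_timedelta(mean_days, unit='D')
print(mean_td)
```

The resulting Series is indexed by Project, so it can be used in place of result['Days'] in the steps that follow.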

Then merge this result with df2 (joining on Project):

df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
#         Date Project    Days
# 0 2016-01-10       A 15 days
# 1 2016-01-08       A 15 days
# 2 2016-01-02       C  6 days
# 3 2016-01-09       B  9 days
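Note that pd.merge defaults to an inner join, so rows of df2 whose Project has no average would be dropped. If those rows should be kept, passing how='left' is one option, in which case the missing average shows up as NaT. The sketch below assumes a made-up project 'D' with no entry in the averages; avg and merged are illustrative names.

```python
import pandas as pd

# Per-project averages, indexed by project (same values as result['Days'])
avg = pd.Series(pd.to_timedelta(['15 days', '9 days', '6 days']),
                index=['A', 'B', 'C'], name='Days')

# Project 'D' is hypothetical: it has no average to look up
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['1/10/16', '1/2/16', '1/5/16'], format='%m/%d/%y'),
    'Project': ['A', 'C', 'D'],
})

# how='left' keeps every row of df2, matched or not
merged = pd.merge(df2, avg.to_frame(), left_on='Project',
                  right_index=True, how='left')

# Adding NaT to a date gives NaT, so unmatched rows are easy to spot
merged['New Date'] = merged['Date'] + merged['Days']
print(merged)
```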

Finally, add Days to Date:

df2['New Date'] = df2['Date'] + df2['Days']
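As a variation on the steps above, the merge can be replaced by Series.map, which looks up each project's average without adding a Days column to df2. This is a minimal sketch on the sample data, not the answer's own method; grouped and avg_days are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Days': pd.to_timedelta(['20 days', '10 days', '10 days',
                             '5 days', '7 days', '8 days']),
    'Project': ['A', 'B', 'A', 'C', 'C', 'B'],
})
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
                           format='%m/%d/%y'),
    'Project': ['A', 'A', 'C', 'B'],
})

# Mean days per project as sum / count, as in the answer
grouped = df1.groupby('Project')['Days']
avg_days = grouped.sum() / grouped.count()

# Look up each row's project average and add it to the date
df2['New Date'] = df2['Date'] + df2['Project'].map(avg_days)
print(df2)
```

Projects missing from avg_days would map to NaT, so their New Date would also be NaT.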