DF1
|Project |Days
|A |20 days
|B |10 days
|A |10 days
|C |5 days
|C |7 days
|B |8 days
R = df1['Days'].groupby(df1['Project'])
[R
|20 days
|10 days
|Name: Days, dtype: timedelta64[ns],('A', 30 15 days)
|10 days
|8 days
|Name: Days, dtype: timedelta64[ns],('B', 18 9 days)
|5 days
|7 days
|Name: Days, dtype: timedelta64[ns],('C', 12 6 days)
DF2
|Project |Date |**New Date**
|A |1/10/16 |1/25/16
|A |1/8/16 |1/23/16
|C |1/2/16 |1/8/16
|B |1/9/16 |1/18/16
What I want to do is create df2['New Date'] by taking the mean number of days for each Project in df1 and adding that mean to df2['Date']. Any ideas?
One more thing to add: the 'Days' column was generated from the difference between two dates loaded from an Excel spreadsheet.
****EDIT****
df1.head().to_dict('list')
{'Project': ['210001', '210001', '210001', '210001', '210001'], 'Days':
[Timedelta('8 days 00:00:00'), Timedelta('8 days 00:00:00'), Timedelta('12 days
00:00:00'), Timedelta('12 days 00:00:00'), Timedelta('14 days 00:00:00')]}
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1161 entries, 0 to 1278
Data columns (total 2 columns):
Project 1161 non-null object
Days 1161 non-null timedelta64[ns]
dtypes: object(1), timedelta64[ns](1)
memory usage: 22.7+ KB
None
EDIT #2: This is the error I am getting: OverflowError: int too big to convert
df2['New Date'] = df2['Date'] + pd.to_timedelta(df2['Days'], unit='D')
Days
20.569231
15.795455
20.569231
Answer 0 (score: 0)
import pandas as pd
df1 = pd.DataFrame(
{'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
'Project': ['A', 'B', 'A', 'C', 'C', 'B']})
df2 = pd.DataFrame(
{'Date': ['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
'Project': ['A', 'A', 'C', 'B']})
df1['Days'] = pd.to_timedelta(df1['Days'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
df2['New Date'] = df2['Date'] + df2['Days']
print(df2)
yields
        Date Project    Days   New Date
0 2016-01-10       A 15 days 2016-01-25
1 2016-01-08       A 15 days 2016-01-23
2 2016-01-02       C  6 days 2016-01-08
3 2016-01-09       B  9 days 2016-01-18
Compute the groupby/mean:
result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
# sum count Days
# Project
# A 30 days 2 15 days
# B 18 days 2 9 days
# C 12 days 2 6 days
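As a side note, the sum/count division above computes the mean by hand. On recent pandas versions the same per-project average can probably be obtained by calling `.mean()` on the grouped timedelta column directly. The sketch below assumes a pandas version where that is supported for `timedelta64[ns]` data; the name `mean_days` is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df1 = pd.DataFrame(
    {'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
     'Project': ['A', 'B', 'A', 'C', 'C', 'B']})
df1['Days'] = pd.to_timedelta(df1['Days'])

# Assumption: groupby(...).mean() works on timedelta64[ns] in this pandas version.
mean_days = df1.groupby('Project')['Days'].mean()

# Cross-check against the explicit sum/count approach from the answer.
totals = df1.groupby('Project')['Days'].agg(['sum', 'count'])
manual_mean = totals['sum'] / totals['count']
print(mean_days)
print(manual_mean)
```

If `.mean()` raises on an older installation, the sum/count version remains the safer choice.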
Then merge this result with df2 (joining on Project):
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
# Date Project Days
# 0 2016-01-10 A 15 days
# 1 2016-01-08 A 15 days
# 2 2016-01-02 C 6 days
# 3 2016-01-09 B 9 days
Finally, add Days to Date:
df2['New Date'] = df2['Date'] + df2['Days']
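If the intermediate Days column is not wanted in df2, the merge step could be replaced by a lookup with `Series.map`. This is only a sketch of an alternative, not the approach the answer itself uses; `mean_days` is a hypothetical name, and the explicit `format='%m/%d/%y'` is an assumption about how the sample dates should be parsed.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Project': ['A', 'B', 'A', 'C', 'C', 'B'],
    'Days': pd.to_timedelta(
        ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days']),
})
df2 = pd.DataFrame({
    'Project': ['A', 'A', 'C', 'B'],
    # Assumption: the dates are month/day/two-digit-year.
    'Date': pd.to_datetime(
        ['1/10/16', '1/8/16', '1/2/16', '1/9/16'], format='%m/%d/%y'),
})

# Per-project mean, computed as sum/count like in the answer.
totals = df1.groupby('Project')['Days'].agg(['sum', 'count'])
mean_days = totals['sum'] / totals['count']  # Series indexed by Project

# Look up each row's project mean and add it to the date.
df2['New Date'] = df2['Date'] + df2['Project'].map(mean_days)
print(df2)
```

One thing to keep in mind with this variant: a project that appears in df2 but not in df1 maps to NaT, so its New Date would be NaT as well, whereas the default inner merge would drop such rows.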