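I want to loop over a DataFrame and subtract datetimes only when the ID of a row matches the ID of the previous row. However, as the example below shows, my for loop does not work. All the NaN values are still there, and I get the warning shown below. I have tried many variations without understanding the problem.
I have also included code for the desired output DataFrame below. Once this is fixed, I want to branch on the Days_since column and assign Chg_avg_value as follows: if the Days_since entry is NaN, Chg_avg_value is NaN; otherwise, it is the difference between the current and previous Avg. Value entries.
Thank you very much. I am really struggling with pandas indexing.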
A value is trying to be set on a copy of a slice from a DataFrame
Initial DataFrame:
import numpy as np
import pandas as pd

df_so_dict = {
    'Index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'DateOf': ['2017-08-01', '2017-08-03', '2017-08-04', '2017-08-07', '2017-08-09', '2017-08-11', '2017-08-12', '2017-08-02', '2017-08-04', '2017-08-07'],
    'ID': ['553', '553', '553', '559', '559', '559', '559', '914', '914', '914'],
    'Count': [4, 1, 3, 4, 9, 11, 4, 2, 10, 5],
    'Avg. Value': [4.4, 3, 5.3, 6.4, 5, 4.2, 3.5, 2, 3.3, 2.2],
}
df_so_ex2 = pd.DataFrame(df_so_dict)
df_so_ex2.set_index('Index', inplace=True)
df_so_ex2['DateOf'] = pd.to_datetime(df_so_ex2['DateOf'])
df_so_ex2.dtypes  # ID is an object (string) column
The loop:
new = 1
prev = 0
df_so_ex2['Days_since'] = np.nan
if df_so_ex2.iloc[new]['ID'] == df_so_ex2.iloc[prev]['ID']:
    # This is the line that raises the warning
    df_so_ex2.iloc[new]['Days_since'] = df_so_ex2.iloc[new]['DateOf'] - df_so_ex2.iloc[prev]['DateOf']
    new += 1
    prev += 1
else:
    new += 1
    prev += 1
print(new)
print(prev)
Desired DataFrame:
df_so_dict_ans = {
    'Index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'DateOf': ['2017-08-01', '2017-08-03', '2017-08-04', '2017-08-07', '2017-08-09', '2017-08-11', '2017-08-12', '2017-08-02', '2017-08-04', '2017-08-07'],
    'ID': ['553', '553', '553', '559', '559', '559', '559', '914', '914', '914'],
    'Count': [4, 1, 3, 4, 9, 11, 4, 2, 10, 5],
    'Avg. Value': [4.4, 3, 5.3, 6.4, 5, 4.2, 3.5, 2, 3.3, 2.2],
    # np.nan (not the string 'nan') keeps these columns numeric
    'Days_since': [np.nan, 2, 1, np.nan, 2, 2, 1, np.nan, 2, 3],
    'Chg_avg_value': [np.nan, -1.4, 2.3, np.nan, -1.4, -0.8, -0.7, np.nan, 1.3, -1.1],
}
df_so_ex_ans = pd.DataFrame(df_so_dict_ans)
df_so_ex_ans.set_index('Index', inplace=True)
Answer 0 (score: 3)
Use groupby with diff:
# GroupBy.diff() takes differences within each ID group and keeps the original index.
g = df_so_ex2.groupby('ID')
df_so_ex2['Chg_avg_value'] = g['Avg. Value'].diff()
df_so_ex2['Days_since'] = g['DateOf'].diff().dt.days
print(df_so_ex2)
       Avg. Value  Count     DateOf   ID  Chg_avg_value  Days_since
Index
0             4.4      4 2017-08-01  553            NaN         NaN
1             3.0      1 2017-08-03  553           -1.4         2.0
2             5.3      3 2017-08-04  553            2.3         1.0
3             6.4      4 2017-08-07  559            NaN         NaN
4             5.0      9 2017-08-09  559           -1.4         2.0
5             4.2     11 2017-08-11  559           -0.8         2.0
6             3.5      4 2017-08-12  559           -0.7         1.0
7             2.0      2 2017-08-02  914            NaN         NaN
8             3.3     10 2017-08-04  914            1.3         2.0
9             2.2      5 2017-08-07  914           -1.1         3.0
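For anyone who wants to check this in isolation, here is a minimal self-contained sketch of the same idea. It assumes a reasonably recent pandas; the name `df`, the trimmed set of columns, and the `round(1)` call (added only to hide floating-point noise such as -1.4000000000000004) are my own choices, not part of the answer above.

```python
import pandas as pd

# Data from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    'DateOf': ['2017-08-01', '2017-08-03', '2017-08-04', '2017-08-07', '2017-08-09',
               '2017-08-11', '2017-08-12', '2017-08-02', '2017-08-04', '2017-08-07'],
    'ID': ['553', '553', '553', '559', '559', '559', '559', '914', '914', '914'],
    'Avg. Value': [4.4, 3, 5.3, 6.4, 5, 4.2, 3.5, 2, 3.3, 2.2],
})
df['DateOf'] = pd.to_datetime(df['DateOf'])

g = df.groupby('ID')
# diff() runs within each ID group; the first row of a group has no
# predecessor, so it comes out as NaT/NaN.
df['Days_since'] = g['DateOf'].diff().dt.days
df['Chg_avg_value'] = g['Avg. Value'].diff().round(1)

print(df)
```

Because both columns come from the same grouping, Chg_avg_value is NaN on exactly the rows where Days_since is NaN, which is the condition described in the question, so no separate if/else step should be needed.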
Answer 1 (score: 2)
You can compute the diff per group and then concat the result:
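# group_keys=False keeps the original index on the result so that concat lines up row by row.
s = df_so_ex2.groupby('ID', group_keys=False).apply(
    lambda x: pd.DataFrame({'Days_since': x['DateOf'].diff().dt.days, 'Chg_avg_value': x['Avg. Value'].diff()}))
pd.concat([df_so_ex2, s], axis=1)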
Out[460]:
       Avg. Value  Count     DateOf   ID  Chg_avg_value  Days_since
Index
0             4.4      4 2017-08-01  553            NaN         NaN
1             3.0      1 2017-08-03  553           -1.4         2.0
2             5.3      3 2017-08-04  553            2.3         1.0
3             6.4      4 2017-08-07  559            NaN         NaN
4             5.0      9 2017-08-09  559           -1.4         2.0
5             4.2     11 2017-08-11  559           -0.8         2.0
6             3.5      4 2017-08-12  559           -0.7         1.0
7             2.0      2 2017-08-02  914            NaN         NaN
8             3.3     10 2017-08-04  914            1.3         2.0
9             2.2      5 2017-08-07  914           -1.1         3.0
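On the warning itself: an assignment of the form `df.iloc[new]['Days_since'] = ...` first extracts a row as a temporary object and then assigns into that object, so the original frame is left unchanged. If an explicit loop is still wanted, one possible sketch is to write through a single `.iloc[row, column]` call. This assumes the rows are already sorted by ID and date, as in the example; the names `df` and `col` and the shortened data are illustrative only.

```python
import numpy as np
import pandas as pd

# Shortened version of the question's data, enough to show two ID groups.
df = pd.DataFrame({
    'DateOf': ['2017-08-01', '2017-08-03', '2017-08-07', '2017-08-09'],
    'ID': ['553', '553', '559', '559'],
})
df['DateOf'] = pd.to_datetime(df['DateOf'])
df['Days_since'] = np.nan

# Integer position of the target column, for use with .iloc
col = df.columns.get_loc('Days_since')

for new in range(1, len(df)):
    prev = new - 1
    if df['ID'].iloc[new] == df['ID'].iloc[prev]:
        delta = df['DateOf'].iloc[new] - df['DateOf'].iloc[prev]
        # One indexing step: this assigns into df itself, not a temporary row.
        df.iloc[new, col] = delta.days

print(df)
```

The groupby approaches above are generally preferable on larger frames, since they avoid a Python-level loop over rows.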