Pandas: subtract datetimes conditionally

Asked: 2018-06-05 17:16:11

Tags: python pandas datetime dataframe

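I want to loop through a DataFrame and subtract datetimes only when the ID matches between consecutive rows. However, as the example below shows, my for loop does not work: all the NaN values are still there, and I get the warning below. I have tried many variations without understanding the problem.

I have also included code for the desired output DataFrame below. Once this is fixed, I want to use the Days_since column to assign Chg_avg_value as follows: if the Days_since entry is NaN, Chg_avg_value is NaN; otherwise, it is the current row's Avg. Value minus the previous row's.

Thank you very much. I am really struggling with pandas indexing.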

A value is trying to be set on a copy of a slice from a DataFrame

Initial DataFrame:

import numpy as np
import pandas as pd

df_so_dict={'Index':[0,1,2,3,4,5,6,7,8,9],'DateOf': ['2017-08-01','2017-08-03','2017-08-04', '2017-08-07','2017-08-09','2017-08-11','2017-08-12','2017-08-02','2017-08-04','2017-08-07'],
    'ID': ['553','553','553','559','559','559','559','914','914','914'], 'Count': [4,1,3,4,9,11,4,2,10,5],
    'Avg. Value': [4.4,3,5.3,6.4,5,4.2,3.5,2,3.3,2.2]
    }
df_so_ex2=pd.DataFrame(df_so_dict)
df_so_ex2.set_index('Index',inplace=True)
df_so_ex2['DateOf'] = pd.to_datetime(df_so_ex2['DateOf'])
df_so_ex2.dtypes #ID is an object

Loop:

new=1
prev=0
df_so_ex['Days_since']=np.nan

    if df_so_ex.iloc[new]['ID'] == df_so_ex.iloc[prev]['ID']:
        df_so_ex.iloc[new]['Days_since']=df_so_ex.iloc[new]['DateOf'] - df_so_ex.iloc[prev]['DateOf']
        new+=1
        prev+=1
    else:
        new+=1
        prev+=1
    print(new)
    print(prev)
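
The warning most likely comes from the chained indexing in the assignment: `df.iloc[new]` builds a new Series for that row, and the following `['Days_since'] = ...` writes into that temporary object rather than into the DataFrame. A minimal sketch of a loop that assigns in a single indexing step is below; the three-row frame and the names `df` and `col` are illustrative assumptions, not the asker's data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, same column names as the question)
df = pd.DataFrame({
    "DateOf": pd.to_datetime(["2017-08-01", "2017-08-03", "2017-08-07"]),
    "ID": ["553", "553", "559"],
})
df["Days_since"] = np.nan

# Integer position of the target column, needed for positional assignment
col = df.columns.get_loc("Days_since")

for new in range(1, len(df)):
    prev = new - 1
    if df["ID"].iloc[new] == df["ID"].iloc[prev]:
        delta = df["DateOf"].iloc[new] - df["DateOf"].iloc[prev]
        # A single (row, column) indexer writes into df itself, not a copy
        df.iloc[new, col] = delta.days
```

This keeps the row-by-row logic of the question. For larger frames a vectorised groupby is usually preferable to an explicit loop.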

Desired DataFrame:

df_so_dict_ans={'Index':[0,1,2,3,4,5,6,7,8,9],'DateOf': ['2017-08-01','2017-08-03','2017-08-04', '2017-08-07','2017-08-09','2017-08-11','2017-08-12','2017-08-02','2017-08-04','2017-08-07'],
    'ID': ['553','553','553','559','559','559','559','914','914','914'], 'Count': [4,1,3,4,9,11,4,2,10,5],
    'Avg. Value': [4.4,3,5.3,6.4,5,4.2,3.5,2,3.3,2.2],
    'Days_since':['nan',2,1,'nan',2,2,1,'nan',2,3],
    'Chg_avg_value':['nan',-1.4,2.3,'nan',-1.4,-0.8,-0.7,'nan',1.3,-1.1]
    }

df_so_ex_ans=pd.DataFrame(df_so_dict_ans)
df_so_ex_ans.set_index('Index',inplace=True)

2 answers:

Answer 0 (score: 3)

Use groupby + pd.Series.diff

g = df_so_ex2.groupby('ID')

# diff() on the grouped column differences within each ID and keeps the original index
df_so_ex2['Chg_avg_value'] = g['Avg. Value'].diff()
df_so_ex2['Days_since'] = g['DateOf'].diff().dt.days

print(df_so_ex2)

       Avg. Value  Count     DateOf   ID  Chg_avg_value  Days_since
Index                                                              
0             4.4      4 2017-08-01  553            NaN         NaN
1             3.0      1 2017-08-03  553           -1.4         2.0
2             5.3      3 2017-08-04  553            2.3         1.0
3             6.4      4 2017-08-07  559            NaN         NaN
4             5.0      9 2017-08-09  559           -1.4         2.0
5             4.2     11 2017-08-11  559           -0.8         2.0
6             3.5      4 2017-08-12  559           -0.7         1.0
7             2.0      2 2017-08-02  914            NaN         NaN
8             3.3     10 2017-08-04  914            1.3         2.0
9             2.2      5 2017-08-07  914           -1.1         3.0
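
No explicit ID comparison is needed here because `diff` runs inside each group: the first row of every ID has no predecessor, so it comes out as NaN in both new columns, which matches the condition stated in the question. The following is a self-contained sketch that rebuilds the sample data and applies the same idea. The variable name `df` and the `round(1)` call (added only to hide floating-point noise such as -1.4000000000000004) are assumptions for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "DateOf": pd.to_datetime([
        "2017-08-01", "2017-08-03", "2017-08-04", "2017-08-07", "2017-08-09",
        "2017-08-11", "2017-08-12", "2017-08-02", "2017-08-04", "2017-08-07",
    ]),
    "ID": ["553", "553", "553", "559", "559", "559", "559", "914", "914", "914"],
    "Count": [4, 1, 3, 4, 9, 11, 4, 2, 10, 5],
    "Avg. Value": [4.4, 3, 5.3, 6.4, 5, 4.2, 3.5, 2, 3.3, 2.2],
})

g = df.groupby("ID")

# Difference to the previous row within the same ID; first row per ID is NaN
df["Days_since"] = g["DateOf"].diff().dt.days
df["Chg_avg_value"] = g["Avg. Value"].diff().round(1)

print(df)
```

Note that this assumes the rows are already sorted by date within each ID, as they are in the sample data.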

Answer 1 (score: 2)

You can use diff and then concat the result:

# group_keys=False keeps the original index so that concat aligns row by row
s=df_so_ex2.groupby('ID', group_keys=False).apply(lambda x : pd.DataFrame({'Days_since':x['DateOf'].diff().dt.days,'Chg_avg_value':x['Avg. Value'].diff()}))
pd.concat([df_so_ex2,s],axis = 1)
Out[460]: 
       Avg. Value  Count     DateOf   ID  Chg_avg_value  Days_since
Index                                                              
0             4.4      4 2017-08-01  553            NaN         NaN
1             3.0      1 2017-08-03  553           -1.4         2.0
2             5.3      3 2017-08-04  553            2.3         1.0
3             6.4      4 2017-08-07  559            NaN         NaN
4             5.0      9 2017-08-09  559           -1.4         2.0
5             4.2     11 2017-08-11  559           -0.8         2.0
6             3.5      4 2017-08-12  559           -0.7         1.0
7             2.0      2 2017-08-02  914            NaN         NaN
8             3.3     10 2017-08-04  914            1.3         2.0
9             2.2      5 2017-08-07  914           -1.1         3.0
