Cannot subset a DataFrame inside df.apply

Time: 2015-04-30 22:11:47

Tags: python pandas dataframe

I have a DataFrame, let's call it trim_df, indexed by user_id, that looks like this:

           d_timestamp_dt                flagged
user_id                                         
1234567890     2015-04-30                  False
0987654321     2015-04-30                  False

I am trying to use df.apply() to create an "accum" variable, like this:

df['new_col'] = df.apply( lambda row: my_func( row, time_period1 ), axis=1 )

Here is how my_func is defined. The comments show what is actually printed when I run apply():

def my_func( row, time_period ):
    print type( row ) # <class 'pandas.core.series.Series'>

    user_id         = row['user_id'] # 123456789
    row_time        = row['d_timestamp_dt'] # 2015-04-16 23:05:00
    user_rows       = trim_df.loc[user_id]
    print type( user_rows ) # <class 'pandas.core.series.Series'> WHY??? shouldn't it be a DataFrame?

    user_rows_of_interest = user_rows[((user_rows['flagged'] == True) &
                                      ((row_time - user_rows['d_timestamp_dt']) > time_period0) &
                                      ((row_time - user_rows['d_timestamp_dt']) < time_period))] 
    print type( user_rows_of_interest ) # <class 'pandas.tslib.Timestamp'> ...expecting this to be a DataFrame 
    return len( user_rows_of_interest ) # breaks, because Timestamp doesn't have len()

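What confuses me is that when I step through the function by hand (without apply), I get the DataFrame I expect, rather than a Series followed by a Timestamp. I would really appreciate any insight into what is going on!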
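One thing that may be relevant here, offered as a sketch rather than a confirmed diagnosis: `.loc` with a single scalar label returns a Series when that label matches exactly one row, and a DataFrame only when the label matches several rows. Passing the label inside a list returns a DataFrame in both cases. The data below is made up for illustration (string user ids, a duplicated id, and the helper name `count_flagged_in_window` are all assumptions, not part of the original code).

```python
import pandas as pd

# Hypothetical data: one user with a single row, one user with two rows.
trim_df = pd.DataFrame(
    {
        "d_timestamp_dt": pd.to_datetime(["2015-04-30", "2015-04-16", "2015-04-30"]),
        "flagged": [False, True, False],
    },
    index=pd.Index(["1234567890", "0987654321", "0987654321"], name="user_id"),
)

single = trim_df.loc["1234567890"]        # scalar label, one match   -> Series
multi = trim_df.loc["0987654321"]         # scalar label, two matches -> DataFrame
always_df = trim_df.loc[["1234567890"]]   # list of labels            -> DataFrame

print(type(single).__name__)     # Series
print(type(multi).__name__)      # DataFrame
print(type(always_df).__name__)  # DataFrame


def count_flagged_in_window(user_id, row_time, min_age, max_age):
    """Hypothetical helper: count this user's flagged rows whose age
    relative to row_time lies strictly between min_age and max_age."""
    user_rows = trim_df.loc[[user_id]]  # list label keeps this a DataFrame
    age = row_time - user_rows["d_timestamp_dt"]
    mask = user_rows["flagged"] & (age > min_age) & (age < max_age)
    return int(mask.sum())


n_multi = count_flagged_in_window(
    "0987654321", pd.Timestamp("2015-04-30"),
    pd.Timedelta(days=1), pd.Timedelta(days=30),
)
n_single = count_flagged_in_window(
    "1234567890", pd.Timestamp("2015-04-30"),
    pd.Timedelta(days=1), pd.Timedelta(days=30),
)
print(n_multi, n_single)  # 1 0
```

If this is what is happening, stepping through by hand with a user_id that has several rows would give a DataFrame, while apply() would also hit user_ids with a single row and get a Series. Summing the boolean mask also avoids calling len() on something that might not be a DataFrame.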

0 answers:

No answers yet