Question

I have a DataFrame, call it trim_df, indexed by user_id, that looks like this:

           d_timestamp_dt                flagged
user_id                                         
1234567890     2015-04-30                  False
0987654321     2015-04-30                  False

I am trying to create an "accum" variable using df.apply(), like so:

df['new_col'] = df.apply( lambda row: my_func( row, time_period1 ), axis=1 )

Here is how my_func is defined. The comments show what is printed when I run apply():

def my_func( row, time_period ):
    print type( row ) # <class 'pandas.core.series.Series'>

    user_id         = row['user_id'] # 123456789
    row_time        = row['d_timestamp_dt'] # 2015-04-16 23:05:00
    user_rows       = trim_df.loc[user_id]
    print type( user_rows ) # <class 'pandas.core.series.Series'> WHY??? shouldn't it be a DataFrame?

    user_rows_of_interest = user_rows[((user_rows['flagged'] == True) &
                                      ((row_time - user_rows['d_timestamp_dt']) > time_period0) &
                                      ((row_time - user_rows['d_timestamp_dt']) < time_period))] 
    print type( user_rows_of_interest ) # <class 'pandas.tslib.Timestamp'> ...expecting this to be a DataFrame 
    return len( user_rows_of_interest ) # breaks, because Timestamp doesn't have len()
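
One likely explanation for the Series, sketched under assumptions: .loc with a single scalar label returns a Series when exactly one row carries that label, and a DataFrame only when the label is duplicated in the index. Passing the label inside a list keeps the result a DataFrame in both cases. The data below is hypothetical (user 111 appears once, user 222 twice) and is not taken from the real trim_df.

```python
import pandas as pd

# Hypothetical stand-in for trim_df: user 111 has one row, user 222 has two.
trim_df = pd.DataFrame(
    {
        "d_timestamp_dt": pd.to_datetime(["2015-04-30", "2015-04-29", "2015-04-30"]),
        "flagged": [False, True, False],
    },
    index=pd.Index([111, 222, 222], name="user_id"),
)

single = trim_df.loc[111]       # scalar label, one matching row
multiple = trim_df.loc[222]     # scalar label, two matching rows
always_df = trim_df.loc[[111]]  # list of labels, one matching row

# Print the type each lookup produced.
print(type(single).__name__)
print(type(multiple).__name__)
print(type(always_df).__name__)
```

If user_rows is a Series, then user_rows['flagged'] == True is a single boolean rather than a boolean mask, so the subsequent indexing selects one element of the row (here apparently the timestamp) instead of filtering rows.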

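What confuses me is that when I step through the function by hand (without apply), I get the DataFrame I expect, rather than a Series and then a Timestamp. I would really appreciate any insight into what is going on!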
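
A possible rewrite, offered as a sketch rather than a confirmed fix: select the user's rows with a list of labels so the result is always a DataFrame, then count the rows that pass the mask. The function name count_flagged_in_window, the parameters min_gap and max_gap (standing in for time_period0 and time_period), and the sample data are all hypothetical. It also assumes df has user_id as a column and that every user_id in df exists in trim_df's index. The difference seen when stepping through by hand may simply come from testing with a user_id that has several rows, but that is a guess.

```python
import pandas as pd


def count_flagged_in_window(row, trim_df, min_gap, max_gap):
    """Count flagged rows for this row's user whose age lies strictly between the two gaps."""
    user_id = row["user_id"]
    row_time = row["d_timestamp_dt"]
    # A list of labels keeps the result a DataFrame, even for a single match.
    user_rows = trim_df.loc[[user_id]]
    age = row_time - user_rows["d_timestamp_dt"]
    mask = user_rows["flagged"] & (age > min_gap) & (age < max_gap)
    return int(mask.sum())


# Hypothetical data: user 111 has two flagged rows, user 222 has one.
trim_df = pd.DataFrame(
    {
        "d_timestamp_dt": pd.to_datetime(["2015-04-10", "2015-04-15", "2015-04-01"]),
        "flagged": [True, True, True],
    },
    index=pd.Index([111, 111, 222], name="user_id"),
)
df = pd.DataFrame(
    {
        "user_id": [111, 222],
        "d_timestamp_dt": pd.to_datetime(["2015-04-16 23:05:00", "2015-04-16 23:05:00"]),
    }
)

min_gap = pd.Timedelta(days=0)  # plays the role of time_period0
max_gap = pd.Timedelta(days=5)  # plays the role of time_period
df["new_col"] = df.apply(
    lambda row: count_flagged_in_window(row, trim_df, min_gap, max_gap), axis=1
)
print(df)
```

Summing the boolean mask avoids calling len() on whatever type the subset happens to be.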

Can't subset a DataFrame inside df.apply

0 answers: