How to use .loc in pandas to set a column to NaN based on values in other columns

Time: 2018-03-15 04:45:49

Tags: python pandas dataframe

I have a pandas DataFrame, and I am calling a function that should set a column to NaN in the rows that do not satisfy a condition.

Here is my code:

def clean_feedback(DF):
    feed_id = DF.id_y.unique()
    for ID in feed_id:
        Min = np.argmin(np.abs(DF[DF.id_y == ID].created_at_x - DF[DF.id_y == ID].created_at_y))
        print(Min)
        DF[DF.id_y == ID].loc[DF[DF.id_y == ID].index != Min, 'comments'] = np.nan
        return DF[DF.id_y == ID]

A sample of the DataFrame:

id_x    user_id merchant_id amount_spent    bill_number created_at_x    checked_in  chain_id    id_y    feedback_setting_id comments    created_at_y    updated_at  feedback_type
1097    268868  975 42  149 None    2016-12-14 12:11:14 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
2150    468876  975 42  278 None    2017-06-04 10:51:47 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
6535    5020    975 42  200 None    2015-03-25 12:37:36 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
9228    476314  975 42  676 None    2017-06-09 14:34:03 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
9601    293308  975 42  438 None    2017-01-22 13:03:18 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
10215   781647  975 42  335 None    2017-08-21 13:36:43 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
20405   5441    975 42  200 None    2015-03-29 14:24:32 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
24117   277853  975 42  220 None    2016-12-25 12:57:53 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
24432   949216  975 42  219 None    2017-10-05 10:22:52 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
24475   289288  975 42  109 None    2017-01-15 08:49:55 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
32318   767980  975 42  293 None    2017-08-16 09:41:30 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1
32820   343502  975 42  387 None    2017-03-22 12:52:48 1   NaN 219 194 Lovely cafe!    2017-03-22 12:55:05 2017-10-05 06:45:49 1

Whenever I call the function, clean_feedback(Transaction[Transaction.id_y == 219]), nothing changes. I'm sure this is a silly mistake, but I am completely stumped.

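EDIT1: I have also tried using the .where function, but it turns the entire DataFrame into NaN. Is there a way to apply it only to the 'comments' column?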

1 Answer:

Answer 0 (score: 1)

Try this instead:

DF.loc[(DF.id_y == ID) & (DF.index != Min), 'comments'] = np.nan 

Explanation

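  • pd.DataFrame.loc accepts boolean indexing as well as label-based indexing.
  • Your two required conditions are that id_y equals ID and that index != Min.
  • The & operator combines the two boolean Series into a single boolean indexer.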
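
To make the difference concrete, here is a minimal sketch of the whole function rewritten around a single .loc call. The small frame below is hypothetical (only four of the question's columns, with made-up rows for a second id_y), and the sketch assumes the index labels are unique. The question's DF[DF.id_y == ID].loc[...] = np.nan assigns into a temporary object produced by the first indexing step, so DF itself is never modified. The sketch also uses Series.idxmin rather than np.argmin, because the mask compares against index labels and current pandas returns a position from argmin, and it moves the return out of the loop so that every id_y is processed.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame; the values are illustrative only.
df = pd.DataFrame(
    {
        "id_y": [219, 219, 219, 300, 300],
        "created_at_x": pd.to_datetime(
            ["2016-12-14 12:11:14", "2017-03-22 12:52:48", "2017-06-04 10:51:47",
             "2017-01-01 09:00:00", "2017-02-01 09:00:00"]
        ),
        "created_at_y": pd.to_datetime(
            ["2017-03-22 12:55:05"] * 3 + ["2017-02-02 10:00:00"] * 2
        ),
        "comments": ["Lovely cafe!"] * 3 + ["Too noisy"] * 2,
    },
    index=[1097, 32820, 2150, 500, 501],
)


def clean_feedback(frame):
    """Keep each feedback comment only on the transaction closest in time to it."""
    out = frame.copy()  # work on a copy so the caller's frame is left untouched
    gap = (out["created_at_x"] - out["created_at_y"]).abs()
    for feedback_id in out["id_y"].unique():
        in_group = out["id_y"] == feedback_id
        closest = gap[in_group].idxmin()  # index label of the smallest time gap
        # A single .loc call with a combined mask writes into `out` itself,
        # unlike out[in_group].loc[...] = ..., which writes into a temporary.
        out.loc[in_group & (out.index != closest), "comments"] = np.nan
    return out


cleaned = clean_feedback(df)
print(cleaned["comments"])
```

Regarding EDIT1, calling where on the single column rather than on the whole frame should limit the change to that column, for example out["comments"] = out["comments"].where(keep_mask), where keep_mask is a hypothetical boolean Series marking the rows whose comment should be kept.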