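I have a DataFrame df with a column named Category, whose values are:
Category
Furniture
Technology
Office Supply
These three values repeat, giving 1000 values in the column in total. I want to create a new column named Category_filter from the Category column that contains only the values Furniture and Technology.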
df['Category_Filter'] = df[df['Category'].isin(['Furniture', 'Technology'])]
I tried the code above to create the new column, but it didn't work.
Category_Filter
Furniture
Technology
This is the desired output.
Answer 0 (score: 0)
I assume you mean you want a DataFrame in which the values in Category are either Furniture or Technology. Here is what you can do:
df[df['Category'].isin(['Furniture', 'Technology'])]
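To see what this snippet returns, here is a small sketch that rebuilds a 1000-row Category column like the one described in the question (the exact ordering of the three values is an assumption). It shows that boolean indexing filters rows: the result is a DataFrame with fewer rows than df, not a column aligned one-to-one with it.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: the three values repeat
# in this order until the column has 1000 entries.
values = ["Furniture", "Technology", "Office Supply"]
df = pd.DataFrame({"Category": [values[i % 3] for i in range(1000)]})

# Boolean Series with one True/False entry per row of df.
mask = df["Category"].isin(["Furniture", "Technology"])

# Indexing with the mask keeps only the matching rows; the result is a
# DataFrame (with every column of df), not a Series of the same length as df.
filtered = df[mask]
print(type(filtered).__name__, filtered.shape)  # DataFrame (667, 1)
```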
If that's not what you meant, perhaps you could clarify.
Edit: in response to your comment below:
df['Category_filter'] = df['Category'].where(df['Category'].isin(['Furniture', 'Technology']))
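A minimal sketch of how the where approach behaves on a reconstructed 1000-row column (the data layout is an assumption, as in the question). Series.where keeps each value where the condition is True and puts NaN elsewhere, so the result has the same length and index as df and can be assigned directly as a new column.

```python
import pandas as pd

# Hypothetical 1000-row column with the three values repeating in order.
values = ["Furniture", "Technology", "Office Supply"]
df = pd.DataFrame({"Category": [values[i % 3] for i in range(1000)]})

# Keep Furniture/Technology, replace everything else with NaN; the result
# is a Series aligned with df, so it fits as a new column.
df["Category_filter"] = df["Category"].where(
    df["Category"].isin(["Furniture", "Technology"])
)

print(len(df["Category_filter"]))          # 1000
print(df["Category_filter"].isna().sum())  # 333 (the Office Supply rows)
print(df["Category_filter"].value_counts())
```

If an empty string is preferred over NaN, Series.where also accepts an `other` argument, e.g. `.where(mask, "")`.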
Answer 1 (score: 0)
If I understand you correctly, you want each matching value of that column repeated, element by element, in the new column.
Example DataFrame:
>>> df
Category
0 Furniture
1 Technology
2 Office Supply
3 Furniture
4 Technology
5 Office Supply
6 Furniture
7 Technology
8 Office Supply
With the updated code this should work; only the values that don't match your list are reported as NaN.
>>> df['Category_Filter'] = df[df['Category'].isin(['Furniture', 'Technology'])]
>>> df
Category Category_Filter
0 Furniture Furniture
1 Technology Technology
2 Office Supply NaN
3 Furniture Furniture
4 Technology Technology
5 Office Supply NaN
6 Furniture Furniture
7 Technology Technology
8 Office Supply NaN
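A likely reason the same line failed in the question: the example frame above has only the Category column, so df[mask] is a one-column DataFrame, which pandas can assign to a single column. A real 1000-row dataset probably has more columns (an assumption; the question doesn't show them), and then pandas raises a ValueError because a multi-column DataFrame can't be stored in one column. The sketch below adds a hypothetical Sales column to show this, then selects Category before assigning so that a Series is assigned instead.

```python
import pandas as pd

# Hypothetical frame: the same Category values plus an extra "Sales" column,
# as a real dataset would usually have more than one column.
df = pd.DataFrame({
    "Category": ["Furniture", "Technology", "Office Supply"] * 3,
    "Sales": [10, 20, 30, 40, 50, 60, 70, 80, 90],
})
mask = df["Category"].isin(["Furniture", "Technology"])

# df[mask] now has two columns, so assigning it to one column fails.
try:
    df["Category_Filter"] = df[mask]
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Selecting the Category column first gives a Series; pandas aligns it on the
# index, so rows that were filtered out become NaN in the new column.
df["Category_Filter"] = df.loc[mask, "Category"]
print(df)
```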
Alternatively, if you want to drop all rows containing NaN values, just try:
>>> df.dropna()
# df.dropna(inplace=True) # makes the change permanent in the DataFrame
Category Category_Filter
0 Furniture Furniture
1 Technology Technology
3 Furniture Furniture
4 Technology Technology
6 Furniture Furniture
7 Technology Technology
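One caveat with the plain dropna() call, sketched under assumptions: if df has other columns with missing values (the hypothetical Sales column below), dropna() removes those rows as well, because by default it drops a row when any column is NaN. Passing `subset=` restricts the check to Category_Filter only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: Category_Filter built with Series.where, plus a "Sales"
# column that happens to be missing a value in a Technology row.
df = pd.DataFrame({
    "Category": ["Furniture", "Technology", "Office Supply"] * 2,
    "Sales": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0],
})
df["Category_Filter"] = df["Category"].where(
    df["Category"].isin(["Furniture", "Technology"])
)

# Plain dropna() drops a row if any column is NaN, so the Technology row
# with missing Sales disappears too.
print(len(df.dropna()))                            # 3
# subset= limits the NaN check to Category_Filter.
print(len(df.dropna(subset=["Category_Filter"])))  # 4
```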