I have a DataFrame that looks like this:
Col1 Col2
0 22 Apple
1 43 Carrot
2 54 Orange
3 74 Spinach
4 14 Cucumber
I need to add a new column holding the category "Fruit", "Vegetable", or "Leaf". I created a list for each category:
Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}
The result should look like this:
Col1 Col2 Category
0 22 Apple Fruit
1 43 Carrot Vegetable
2 54 Orange Fruit
3 74 Spinach Leaf
4 14 Cucumber Vegetable
I tried np.where and contains, but both raised: 'in <string>' requires string as left operand, not set
Answer 0 (score: 2)
That is because you created sets rather than lists, as the error message indicates. You can pass each set as the argument to .isin():
import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1':[22,43,54,74,14],'Col2':['Apple','Carrot','Orange','Spinach','Cucumber']})
Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}
df['Category'] = np.where(df['Col2'].isin(Fru), 'Fruit',
                 np.where(df['Col2'].isin(Veg), 'Vegetable',
                 np.where(df['Col2'].isin(Leaf), 'Leaf',
                          'Other')))  # np.where needs both branches; 'Other' is an assumed fallback label
print(df)
Output:
Col1 Col2 Category
0 22 Apple Fruit
1 43 Carrot Vegetable
2 54 Orange Fruit
3 74 Spinach Leaf
4 14 Cucumber Vegetable
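For context, the TypeError from the question can be reproduced without pandas. The question does not show the failing code, so the sketch below assumes it effectively evaluated a set on the left of `in` with a string on the right; it then shows the membership test in the direction that .isin() performs for each element.

```python
Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}

# Assumed shape of the failing check: set on the left, string on the right.
try:
    Fru in 'Apple'
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The direction that works: is the string a member of the set?
is_fruit = 'Apple' in Fru
print(is_fruit)
```

Read this way, the error concerns which operand sits on which side of `in`; a set is a perfectly valid argument for .isin().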
Answer 1 (score: 1)
Use Series.map with a new dictionary d1:
Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}
d = {'Fruit':Fru, 'Vegetable':Veg,'Leaf':Leaf}
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
df['Category'] = df['Col2'].map(d1)
print (df)
Col1 Col2 Category
0 22 Apple Fruit
1 43 Carrot Vegetable
2 54 Orange Fruit
3 74 Spinach Leaf
4 14 Cucumber Vegetable
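One thing to keep in mind with the map approach: Series.map returns NaN for any value that is missing from the dictionary. A minimal sketch of handling that case follows; the value 'Mango' and the fallback label 'Other' are hypothetical and do not appear in the question.

```python
import pandas as pd

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

# Invert {category: members} into {member: category}.
groups = {'Fruit': Fru, 'Vegetable': Veg, 'Leaf': Leaf}
d1 = {item: cat for cat, items in groups.items() for item in items}

# 'Mango' is a hypothetical value that belongs to none of the sets.
s = pd.Series(['Apple', 'Mango', 'Kale'])

mapped = s.map(d1)               # unmatched values become NaN
filled = mapped.fillna('Other')  # 'Other' is an illustrative fallback label
print(filled.tolist())
```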
Or use numpy.select:
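df['Category'] = np.select([df['Col2'].isin(Fru), df['Col2'].isin(Veg), df['Col2'].isin(Leaf)],
                           ['Fruit', 'Vegetable', 'Leaf'],
                           default='Other')  # assumed fallback label; avoids mixing string choices with the numeric default 0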
print (df)
Col1 Col2 Category
0 22 Apple Fruit
1 43 Carrot Vegetable
2 54 Orange Fruit
3 74 Spinach Leaf
4 14 Cucumber Vegetable
Answer 2 (score: 1)
You can try another approach using a for loop:
import pandas as pd
df = pd.DataFrame({'Col1': [22,43,54,74,14], 'Col2': ['Apple','Carrot','Orange','Spinach','Cucumber']})
Fruit = ['Apple','Orange', 'Grape', 'Blueberry', 'Strawberry']
Vegetable = ['Cucumber','Carrot','Broccoli', 'Onion']
Leaf = ['Lettuce', 'Kale', 'Spinach']
mylist = []
for i in df['Col2']:
    if i in Fruit:
        mylist.append('Fruit')
    elif i in Vegetable:
        mylist.append('Vegetable')
    elif i in Leaf:
        mylist.append('Leaf')
    else:
        # keep mylist the same length as df when nothing matches
        mylist.append(None)
df['Category'] = mylist
print(df)
Col1 Col2 Category
0 22 Apple Fruit
1 43 Carrot Vegetable
2 54 Orange Fruit
3 74 Spinach Leaf
4 14 Cucumber Vegetable
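As a quick sanity check, the approaches from the answers can be run side by side on the sample data. This is only a sketch: the `groups` dictionary and the 'Other' fallback label are assumptions introduced here so that all three variants treat unmatched values the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [22, 43, 54, 74, 14],
                   'Col2': ['Apple', 'Carrot', 'Orange', 'Spinach', 'Cucumber']})

# Hypothetical helper structure: category name -> set of members.
groups = {
    'Fruit': {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'},
    'Vegetable': {'Cucumber', 'Carrot', 'Broccoli', 'Onion'},
    'Leaf': {'Lettuce', 'Kale', 'Spinach'},
}

# 1) isin + np.select, with an explicit string default.
conditions = [df['Col2'].isin(members) for members in groups.values()]
via_select = np.select(conditions, list(groups), default='Other').tolist()

# 2) inverted dictionary + Series.map.
lookup = {item: cat for cat, members in groups.items() for item in members}
via_map = df['Col2'].map(lookup).fillna('Other').tolist()

# 3) plain Python loop over the column.
via_loop = []
for item in df['Col2']:
    for cat, members in groups.items():
        if item in members:
            via_loop.append(cat)
            break
    else:
        via_loop.append('Other')  # no category matched

print(via_select == via_map == via_loop)
```

On a column this small the choice is mostly a matter of taste; the vectorised variants generally scale better than the Python loop as the DataFrame grows.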