Assign a category if a column contains a string from an array of strings

Time: 2020-02-18 13:10:00

Tags: python pandas dataframe

I have a DataFrame that looks like this:

     Col1     Col2    
0     22     Apple
1     43     Carrot 
2     54     Orange
3     74     Spinach
4     14     Cucumber 

I need to add a new column holding the category 'Fruit', 'Vegetable', or 'Leaf'. I created a list for each category:

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

The result should look like this:

    Col1      Col2     Category 
0     22     Apple      Fruit
1     43     Carrot     Vegetable 
2     54     Orange     Fruit
3     74     Spinach    Leaf
4     14     Cucumber   Vegetable

I tried np.where and contains, but both gave the error: 'in <string>' requires string as left operand, not set

3 answers:

Answer 0 (score: 2)

That is because you created sets, not lists, as the error message says. You can pass each set as the argument to .isin():

import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1':[22,43,54,74,14],'Col2':['Apple','Carrot','Orange','Spinach','Cucumber']})

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

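# np.where needs both a "true" and a "false" value at every level;
# 'Unknown' is a fallback label for values found in none of the sets.
df['Category'] = np.where(df['Col2'].isin(Fru),'Fruit',
  np.where(df['Col2'].isin(Veg),'Vegetable',
  np.where(df['Col2'].isin(Leaf),'Leaf','Unknown')))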
print(df)

Output:

   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
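
For reference, the error from the question can be reproduced without pandas. This is a minimal sketch that assumes the original attempt ended up evaluating something equivalent to a set on the left of `in` with a string on the right; the question does not show the exact code, so that part is a guess.

```python
import pandas as pd

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}

# With a string on the right-hand side, `in` is a substring test,
# so it only accepts a string on the left-hand side.
try:
    Fru in 'Apple'
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Series.isin instead checks each element for membership in the collection.
s = pd.Series(['Apple', 'Carrot'])
mask = s.isin(Fru)
print(mask.tolist())
```

The boolean mask returned by `.isin()` is what `np.where` expects as its condition.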

Answer 1 (score: 1)

Use Series.map with a new dictionary d1:

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

d = {'Fruit':Fru, 'Vegetable':Veg,'Leaf':Leaf}

#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)

df['Category'] = df['Col2'].map(d1)
print (df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
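
One thing to keep in mind with `Series.map` is what happens to values that are missing from the dictionary. The sketch below uses a shortened version of `d1` and a made-up value, 'Mango', that is not in any category; the 'Unknown' label is also just an assumed placeholder.

```python
import pandas as pd

# Shortened item-to-category dictionary, same shape as d1 above.
d1 = {'Apple': 'Fruit', 'Carrot': 'Vegetable', 'Spinach': 'Leaf'}

# 'Mango' is a hypothetical value that appears in none of the categories.
s = pd.Series(['Apple', 'Mango', 'Spinach'])

mapped = s.map(d1)                 # values missing from d1 become NaN
filled = mapped.fillna('Unknown')  # 'Unknown' is an assumed placeholder label
print(filled.tolist())
```

Leaving the NaN in place can also be useful, since it makes uncategorized rows easy to find with `isna()`.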

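Or use numpy.select:

# default= covers values found in none of the sets; using a string here
# keeps the default the same type as the choices.
df['Category'] = np.select([df['Col2'].isin(Fru),df['Col2'].isin(Veg),df['Col2'].isin(Leaf)],
                           ['Fruit','Vegetable','Leaf'], default='Unknown')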
print (df)

   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable

Answer 2 (score: 1)

You can also try a different approach, using a for loop:

import pandas as pd

df = pd.DataFrame({'Col1': [22,43,54,74,14], 'Col2': ['Apple','Carrot','Orange','Spinach','Cucumber']})

Fruit = ['Apple','Orange', 'Grape', 'Blueberry', 'Strawberry']
Vegetable = ['Cucumber','Carrot','Broccoli', 'Onion']
Leaf = ['Lettuce', 'Kale', 'Spinach']

mylist = []
for i in df['Col2']:
    if i in Fruit:
        mylist.append('Fruit')
    elif i in Vegetable:
        mylist.append('Vegetable')
    elif i in Leaf:
        mylist.append('Leaf')
    else:
        # Keep mylist the same length as df when a value matches no list
        mylist.append('Unknown')

df['Category'] = mylist

print(df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
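
The same loop can be wrapped in a small helper so the lookup logic lives in one place. This is only a sketch: the names `CATEGORIES` and `categorize` and the 'Unknown' fallback are illustrative choices, not part of the original answer. Sets are used instead of lists because membership tests on a set do not scan every element.

```python
import pandas as pd

# Category name -> set of items belonging to it (hypothetical container name).
CATEGORIES = {
    'Fruit': {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'},
    'Vegetable': {'Cucumber', 'Carrot', 'Broccoli', 'Onion'},
    'Leaf': {'Lettuce', 'Kale', 'Spinach'},
}

def categorize(value, default='Unknown'):
    """Return the first category whose set contains value, else default."""
    for name, members in CATEGORIES.items():
        if value in members:
            return name
    return default

df = pd.DataFrame({'Col1': [22, 43, 54, 74, 14],
                   'Col2': ['Apple', 'Carrot', 'Orange', 'Spinach', 'Cucumber']})

# One category per row, so the list always matches the length of df.
df['Category'] = [categorize(v) for v in df['Col2']]
print(df)
```

For large frames the `isin` or `map` approaches from the other answers are likely to be faster, since they avoid a Python-level loop over the rows.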