Add a new column to a pandas DataFrame based on the values of an existing column

Time: 2020-07-23 02:50:03

Tags: python pandas dataframe

I have a pandas DataFrame like this:

Index                   Resource
2020-07-15 11:59:02     Monkey
2020-07-16 11:59:02     Helicopter
2020-07-17 11:59:02     Forklift
2020-07-18 11:59:02     Airplane
2020-07-19 11:59:02     Dinosaur
2020-07-20 11:59:02     Drone
2020-07-20 11:59:02     Truck
2020-07-20 11:59:02     Airplane
2020-07-22 11:59:02     Truck
2020-07-22 11:59:02     Transport
2020-07-23 11:59:02     Dozer
2020-07-24 11:59:02     Patrol
2020-07-25 11:59:02     Dinosaur

I want to add a new column named "Category", like this:

Index                   Resource      Category
2020-07-15 11:59:02     Monkey        Other
2020-07-16 11:59:02     Helicopter    Aviation
2020-07-17 11:59:02     Forklift      Equipment
2020-07-18 11:59:02     Airplane      Aviation
2020-07-19 11:59:02     Dinosaur      Other
2020-07-20 11:59:02     Drone         Aviation
2020-07-20 11:59:02     Truck         Equipment
2020-07-20 11:59:02     Airplane      Aviation
2020-07-22 11:59:02     Truck         Equipment
2020-07-22 11:59:02     Transport     Crew
2020-07-23 11:59:02     Dozer         Equipment
2020-07-24 11:59:02     Patrol        Crew
2020-07-25 11:59:02     Dinosaur      Other

...presumably based on whether the value of "Resource" is found in one of the following lists:

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']

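So if the value of "Resource" is not found in any of the defined lists, the new "Category" column defaults to "Other". Otherwise, "Category" gets "Aviation", "Equipment", or "Crew" respectively. (Each "Resource" belongs to only one "Category".)

I'm sure there must be an elegant way to do this in pandas. Can anyone offer a suggestion?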

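3 answers:

Answer 0: (score: 2)

Use map to create the category values, and use .fillna to handle anything that is not in the lists. First, we need to build a dictionary that maps each resource to its category: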

d = {resource: category 
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'], [aviation_list, equipment_list, crew_list])
     for resource in lst}

df['Category'] = df['Resource'].map(d).fillna('Other')

                       Resource   Category
Index                                     
2020-07-15 11:59:02      Monkey      Other
2020-07-16 11:59:02  Helicopter   Aviation
2020-07-17 11:59:02    Forklift  Equipment
2020-07-18 11:59:02    Airplane   Aviation
2020-07-19 11:59:02    Dinosaur      Other
2020-07-20 11:59:02       Drone   Aviation
2020-07-20 11:59:02       Truck  Equipment
2020-07-20 11:59:02    Airplane   Aviation
2020-07-22 11:59:02       Truck  Equipment
2020-07-22 11:59:02   Transport       Crew
2020-07-23 11:59:02       Dozer  Equipment
2020-07-24 11:59:02      Patrol       Crew
2020-07-25 11:59:02    Dinosaur      Other
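
For anyone who wants to try the map/fillna approach without the original data, here is a minimal self-contained sketch. The four-row DataFrame is a hand-built sample taken from the question, and the names `categories` and `resource_to_category` are my own, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: four rows copied by hand from the question's data.
df = pd.DataFrame(
    {"Resource": ["Monkey", "Helicopter", "Forklift", "Transport"]},
    index=pd.to_datetime([
        "2020-07-15 11:59:02",
        "2020-07-16 11:59:02",
        "2020-07-17 11:59:02",
        "2020-07-22 11:59:02",
    ]),
)
df.index.name = "Index"

# The three lists from the question, keyed by category name.
categories = {
    "Aviation": ["Airplane", "Helicopter", "Drone", "Parachute"],
    "Equipment": ["Truck", "Dozer", "Forklift", "Excavator"],
    "Crew": ["Transport", "Patrol", "Stationary"],
}

# Invert to a flat resource -> category lookup.
resource_to_category = {
    resource: category
    for category, resources in categories.items()
    for resource in resources
}

# map() leaves NaN for resources missing from the lookup; fillna() sets the default.
df["Category"] = df["Resource"].map(resource_to_category).fillna("Other")
print(df)
```

Keeping the lists in one dictionary means a new category only needs one more entry; the map and fillna lines stay the same.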

Answer 1: (score: 0)

You can create a function that takes a Resource value and returns a Category:

def get_category(resource):
    # Sets give fast membership tests; the names keep the question's *_list spelling.
    aviation_list = {'Airplane', 'Helicopter', 'Drone', 'Parachute'}
    equipment_list = {'Truck', 'Dozer', 'Forklift', 'Excavator'}
    crew_list = {'Transport', 'Patrol', 'Stationary'}
    if resource in aviation_list:
        return 'Aviation'
    elif resource in equipment_list:
        return 'Equipment'
    elif resource in crew_list:
        return 'Crew'
    else:
        return 'Other'

Then you can create the new column like this:

# load your data
import pandas as pd
df = pd.read_clipboard() # copied from above

df['Category'] = [get_category(resource) for resource in df['Resource']]

This produces:

In [9]: df
Out[9]:
               Index    Resource   Category
2020-07-15  11:59:02      Monkey      Other
2020-07-16  11:59:02  Helicopter   Aviation
2020-07-17  11:59:02    Forklift  Equipment
2020-07-18  11:59:02    Airplane   Aviation
2020-07-19  11:59:02    Dinosaur      Other
2020-07-20  11:59:02       Drone   Aviation
2020-07-20  11:59:02       Truck  Equipment
2020-07-20  11:59:02    Airplane   Aviation
2020-07-22  11:59:02       Truck  Equipment
2020-07-22  11:59:02   Transport       Crew
2020-07-23  11:59:02       Dozer  Equipment
2020-07-24  11:59:02      Patrol       Crew
2020-07-25  11:59:02    Dinosaur      Other

A quick note: I'm assuming each Resource can belong to only one category, so I just take the first match found.
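
If you want to verify that one-category assumption rather than rely on it, a small check along these lines might help. This is only a sketch: the `categories` dictionary and the `overlaps` name are mine, built from the lists in the question.

```python
from itertools import combinations

# The three lists from the question, keyed by category name.
categories = {
    "Aviation": ["Airplane", "Helicopter", "Drone", "Parachute"],
    "Equipment": ["Truck", "Dozer", "Forklift", "Excavator"],
    "Crew": ["Transport", "Patrol", "Stationary"],
}

# For every pair of categories, record any resources that appear in both.
overlaps = {}
for first, second in combinations(categories, 2):
    shared = set(categories[first]) & set(categories[second])
    if shared:
        overlaps[(first, second)] = shared

if overlaps:
    print("Resources listed under more than one category:", overlaps)
else:
    print("Every resource belongs to exactly one category.")
```

With the lists as given, nothing overlaps, so the order of the if/elif branches does not affect the result.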

Answer 2: (score: 0)

You can create a dictionary of lists:

d = {}
d['Aviation'] = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
d['Equipment'] = ['Truck', 'Dozer', 'Forklift', 'Excavator']
d['Crew'] = ['Transport', 'Patrol', 'Stationary']

Create a function that takes a value and returns its category:

def final_pop(resource):
    if resource in d['Aviation']:
        return "Aviation"
    elif resource in d['Equipment']:
        return "Equipment"
    elif resource in d['Crew']:
        return "Crew"
    else:
        # The question asks for "Other" as the default label.
        return "Other"

# Apply the function to the Resource value of every row.
df['Category'] = df.apply(lambda row: final_pop(row['Resource']), axis=1)
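
Row-wise apply can get slow on large frames. As a possible alternative (not part of the answer above), the same dictionary of lists can drive a vectorized version with Series.isin and numpy.select. A minimal sketch, assuming a small hand-built sample of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a few Resource values taken from the question.
df = pd.DataFrame({"Resource": ["Monkey", "Drone", "Truck", "Patrol", "Dinosaur"]})

d = {
    "Aviation": ["Airplane", "Helicopter", "Drone", "Parachute"],
    "Equipment": ["Truck", "Dozer", "Forklift", "Excavator"],
    "Crew": ["Transport", "Patrol", "Stationary"],
}

# One boolean mask per category, in the same order as the category names.
conditions = [df["Resource"].isin(resources) for resources in d.values()]
choices = list(d.keys())

# np.select takes the first matching condition per row, else the default.
df["Category"] = np.select(conditions, choices, default="Other")
print(df)
```

Because np.select uses the first condition that matches, it behaves like the if/elif chain if a resource were ever listed twice.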