I have a pandas DataFrame like this:
Index Resource
2020-07-15 11:59:02 Monkey
2020-07-16 11:59:02 Helicopter
2020-07-17 11:59:02 Forklift
2020-07-18 11:59:02 Airplane
2020-07-19 11:59:02 Dinosaur
2020-07-20 11:59:02 Drone
2020-07-20 11:59:02 Truck
2020-07-20 11:59:02 Airplane
2020-07-22 11:59:02 Truck
2020-07-22 11:59:02 Transport
2020-07-23 11:59:02 Dozer
2020-07-24 11:59:02 Patrol
2020-07-25 11:59:02 Dinosaur
I want to add a new column named "Category", like this:
Index Resource Category
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
...presumably based on whether the value of "Resource" is found in one of the following lists:
aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']
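So if the value of "Resource" is not found in any of the defined lists, the new "Category" column defaults to "Other". Otherwise, "Category" gets "Aviation", "Equipment", or "Crew" respectively. (Each "Resource" belongs to only one "Category".)
I'm sure there must be an elegant way to do this in pandas. Can anyone offer a suggestion?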
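Answer 0 (score: 2)
Use map to create the category values, and use .fillna to handle anything that is not in the lists. First, we need to create a dictionary that maps each resource to its category: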
d = {resource: category
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'],
                              [aviation_list, equipment_list, crew_list])
     for resource in lst}
df['Category'] = df['Resource'].map(d).fillna('Other')
Resource Category
Index
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
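To see why the .fillna step is needed, here is a minimal self-contained sketch of the same approach. The four-row DataFrame is a stand-in for the question's data (no datetime index), and the intermediate variable mapped is only there for illustration:

```python
import pandas as pd

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']

# Assumption: a small stand-in for the question's data, not the full table.
df = pd.DataFrame({'Resource': ['Monkey', 'Helicopter', 'Forklift', 'Transport']})

# Flatten the three lists into a single resource -> category lookup.
d = {resource: category
     for category, lst in [('Aviation', aviation_list),
                           ('Equipment', equipment_list),
                           ('Crew', crew_list)]
     for resource in lst}

mapped = df['Resource'].map(d)           # resources missing from d become NaN
df['Category'] = mapped.fillna('Other')  # NaN is replaced by the default

print(mapped.isna().tolist())    # [True, False, False, False]
print(df['Category'].tolist())   # ['Other', 'Aviation', 'Equipment', 'Crew']
```

Only 'Monkey' is absent from the lookup, so it is the only value that map leaves as NaN and that .fillna turns into 'Other'.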
Answer 1 (score: 0)
You can create a function that takes a Resource value and returns a Category:
def get_category(resource):
    # Sets give constant-time membership checks.
    aviation_list = {'Airplane', 'Helicopter', 'Drone', 'Parachute'}
    equipment_list = {'Truck', 'Dozer', 'Forklift', 'Excavator'}
    crew_list = {'Transport', 'Patrol', 'Stationary'}
    if resource in aviation_list:
        return 'Aviation'
    elif resource in equipment_list:
        return 'Equipment'
    elif resource in crew_list:
        return 'Crew'
    else:
        return 'Other'
Then you can create the new column with the following:
# load your data
import pandas as pd
df = pd.read_clipboard() # copied from above
df['Category'] = [get_category(resource) for resource in df['Resource']]
This produces:
In [9]: df
Out[9]:
Index Resource Category
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
A quick note: I am assuming each Resource can belong to only one category, so I just take the first match found.
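If you want to check that assumption rather than rely on it, one possible sketch is to look for resources shared between any two lists. The categories dictionary and the helper find_overlaps are hypothetical names introduced here, not part of the answer above:

```python
from itertools import combinations

# Assumption: the three lists from the question, keyed by category name.
categories = {
    'Aviation': ['Airplane', 'Helicopter', 'Drone', 'Parachute'],
    'Equipment': ['Truck', 'Dozer', 'Forklift', 'Excavator'],
    'Crew': ['Transport', 'Patrol', 'Stationary'],
}

def find_overlaps(categories):
    """Return (category_a, category_b, shared_resources) for each overlapping pair."""
    overlaps = []
    for (name_a, items_a), (name_b, items_b) in combinations(categories.items(), 2):
        shared = set(items_a) & set(items_b)
        if shared:
            overlaps.append((name_a, name_b, sorted(shared)))
    return overlaps

overlaps = find_overlaps(categories)
print(overlaps)  # [] means every resource belongs to exactly one category
```

With the lists from the question the result is empty, so the order of the if/elif checks does not affect the outcome.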
Answer 2 (score: 0)
You can create a dictionary of lists:
d = {}
d['Aviation'] = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
d['Equipment'] = ['Truck', 'Dozer', 'Forklift', 'Excavator']
d['Crew'] = ['Transport', 'Patrol', 'Stationary']
Create a function that takes a value and returns its category:
def final_pop(resource):
    if resource in d['Aviation']:
        return "Aviation"
    elif resource in d['Equipment']:
        return "Equipment"
    elif resource in d['Crew']:
        return "Crew"
    else:
        # The question asks for "Other" as the default label.
        return "Other"

# Apply the function to each value of the Resource column.
df['Category'] = df['Resource'].apply(final_pop)
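As a possible alternative to calling a Python function per row, the same dictionary of lists can drive a vectorized lookup with Series.isin and numpy.select. This is a sketch under the assumption that the lists do not overlap; the five-row DataFrame is a stand-in for the question's data:

```python
import numpy as np
import pandas as pd

d = {
    'Aviation': ['Airplane', 'Helicopter', 'Drone', 'Parachute'],
    'Equipment': ['Truck', 'Dozer', 'Forklift', 'Excavator'],
    'Crew': ['Transport', 'Patrol', 'Stationary'],
}

# Assumption: a small stand-in for the question's data.
df = pd.DataFrame({'Resource': ['Monkey', 'Drone', 'Truck', 'Patrol', 'Dinosaur']})

# One boolean mask per category, in the same order as the category names.
conditions = [df['Resource'].isin(members).to_numpy() for members in d.values()]
choices = list(d.keys())

# The first matching condition wins; rows matching none get the default.
df['Category'] = np.select(conditions, choices, default='Other')

print(df['Category'].tolist())
# ['Other', 'Aviation', 'Equipment', 'Crew', 'Other']
```

Adding a new category then only requires a new dictionary entry, with no extra elif branch.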