I have a CSV file that looks like this:
            Landform  Number      Name  Class
0      Deltaic Plain     912        Lx    NaN
1  Hummock and Swale     912        Lx    NaN
2         Sand Dunes     912        Lx    NaN
3  Hummock and Swale     939  Woodbury    NaN
4         Sand Dunes     939  Woodbury    NaN
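When the Landform values for a given Number include Deltaic Plain, Hummock and Swale, and Sand Dunes, I want to assign the value 1 to Class.
When the Landform values for a Number include Hummock and Swale and Sand Dunes, I want to assign the value 2 to Class.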
My desired output is:
            Landform  Number      Name  Class
0      Deltaic Plain     912        Lx      1
1  Hummock and Swale     912        Lx      1
2         Sand Dunes     912        Lx      1
3  Hummock and Swale     939  Woodbury      2
4         Sand Dunes     939  Woodbury      2
I know how to do this for a single row at a time:
def f(x):
    if x['Landform'] == 'Hummock and Swale': return '1'
    else: return '2'
df['Class'] = df.apply(f, axis=1)
But I'm not sure how to group by Number and then write a conditional function that depends on multiple rows.
Answer 0 (score: 1)
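The idea is to group by your Number column and apply a function that looks at all the landforms in that group and returns the appropriate class. Here is an example: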
def determineClass(landforms):
    # landforms is the Landform Series for a single Number group
    if all(form in landforms.values for form in ('Deltaic Plain', 'Hummock and Swale', 'Sand Dunes')):
        return 1
    elif all(form in landforms.values for form in ('Hummock and Swale', 'Sand Dunes')):
        return 2
    # etc.
    else:
        # return "default" class
        return 0
>>> df.groupby('Number').Landform.apply(determineClass)
Number
912    1
939    2
Name: Landform, dtype: int64
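For reference, here is a self-contained sketch of the same grouping idea. It assumes the CSV has been loaded into a DataFrame named df with the columns shown in the question. The RULES table and the determine_class name are hypothetical, introduced only for illustration, and the set-based subset test is a swapped-in alternative to the all(...) check above rather than the code from this answer.

```python
import pandas as pd

# Sample data mirroring the question (Class is omitted; it is computed below).
df = pd.DataFrame({
    "Landform": ["Deltaic Plain", "Hummock and Swale", "Sand Dunes",
                 "Hummock and Swale", "Sand Dunes"],
    "Number": [912, 912, 912, 939, 939],
    "Name": ["Lx", "Lx", "Lx", "Woodbury", "Woodbury"],
})

# Hypothetical rule table: (required landforms, class), most specific first.
RULES = [
    ({"Deltaic Plain", "Hummock and Swale", "Sand Dunes"}, 1),
    ({"Hummock and Swale", "Sand Dunes"}, 2),
]

def determine_class(landforms):
    """Return the class of the first rule whose landforms all occur in the group."""
    present = set(landforms)
    for required, cls in RULES:
        if required <= present:  # subset test: every required landform is present
            return cls
    return 0  # assumed "default" class when no rule matches

# One class per Number group: 912 maps to 1, 939 maps to 2.
classes = df.groupby("Number")["Landform"].apply(determine_class)
print(classes)
```

Because the rules are checked in order, the three-landform rule has to come before the two-landform rule; otherwise group 912 would match the shorter rule first.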
If you want to assign these values back to the Class column, use map, as described in this question from 20 minutes ago:
>>> classes = df.groupby('Number').Landform.apply(determineClass)
>>> df['Class'] = df.Number.map(classes)
>>> df
            Landform  Number      Name  Class
0      Deltaic Plain     912        Lx      1
1  Hummock and Swale     912        Lx      1
2         Sand Dunes     912        Lx      1
3  Hummock and Swale     939  Woodbury      2
4         Sand Dunes     939  Woodbury      2
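As a possible alternative to the apply-then-map step, groupby(...).transform can broadcast the scalar returned for each group straight back onto that group's rows. The sketch below assumes the same df layout as in the question. The determine_class helper is a hypothetical restatement of the rules above, not part of pandas.

```python
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "Landform": ["Deltaic Plain", "Hummock and Swale", "Sand Dunes",
                 "Hummock and Swale", "Sand Dunes"],
    "Number": [912, 912, 912, 939, 939],
    "Name": ["Lx", "Lx", "Lx", "Woodbury", "Woodbury"],
})

def determine_class(landforms):
    """Hypothetical helper: classify one Number group by the landforms it contains."""
    present = set(landforms)
    if {"Deltaic Plain", "Hummock and Swale", "Sand Dunes"} <= present:
        return 1
    if {"Hummock and Swale", "Sand Dunes"} <= present:
        return 2
    return 0  # assumed default class

# transform returns a result aligned with df's index, so the scalar for each
# group is repeated on every row of that group and can be assigned directly.
df["Class"] = df.groupby("Number")["Landform"].transform(determine_class)
print(df)
```

The map approach keeps the per-group result around as a separate Series, which is handy if it is needed elsewhere; transform is shorter when the only goal is to fill the column.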