I have the following dataframe (df):
loc pop_1 source_1 pop_2 source_2
a 99 group_a 77 group_b
b 93 group_a 90 group_b
c 58 group_a 59 group_b
d 47 group_a 62 group_b
I created an additional column, 'upper_limit':
df['upper_limit'] = df[['pop_1','pop_2']].max(axis=1)
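A minimal reproducible version of the frame above with the `upper_limit` line applied. This assumes the default integer index (0 to 3), since the question shows `loc` as an ordinary column rather than the index:

```python
import pandas as pd

# Rebuild the question's example frame (index assumed to be the default 0..3)
df = pd.DataFrame({
    'loc': ['a', 'b', 'c', 'd'],
    'pop_1': [99, 93, 58, 47],
    'source_1': ['group_a'] * 4,
    'pop_2': [77, 90, 59, 62],
    'source_2': ['group_b'] * 4,
})

# max(axis=1) takes the maximum across the listed columns, one value per row
df['upper_limit'] = df[['pop_1', 'pop_2']].max(axis=1)
print(df['upper_limit'].tolist())  # [99, 93, 59, 62]
```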
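Now I want to add another column that looks at the values in 'upper_limit', compares them with pop_1 and pop_2, and, where they match, takes the text from source_1 or source_2. That is: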
loc pop_1 source_1 pop_2 source_2 upper_limit source
a 99 group_a 77 group_b 99 group_a
b 93 group_a 90 group_b 93 group_a
c 58 group_a 59 group_b 59 group_b
d 47 group_a 62 group_b 62 group_b
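An alternative to comparing against `upper_limit`, not the approach used in the question: ask NumPy which position holds each row's maximum and use that position to index the matching source column. This sketch assumes the pop and source columns are listed in the same order, so position k pairs `pop_k` with `source_k`; on ties, `argmax` returns the first position, so `source_1` wins.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'loc': ['a', 'b', 'c', 'd'],
    'pop_1': [99, 93, 58, 47],
    'source_1': ['group_a'] * 4,
    'pop_2': [77, 90, 59, 62],
    'source_2': ['group_b'] * 4,
})

# Matching column order: position k pairs pop_k with source_k
pops = df[['pop_1', 'pop_2']].to_numpy()
sources = df[['source_1', 'source_2']].to_numpy()

# Position (0 or 1) of the largest pop in each row; ties resolve to the first column
winner = pops.argmax(axis=1)

# Fancy indexing: for row i, take sources[i, winner[i]]
df['source'] = sources[np.arange(len(df)), winner]
print(df['source'].tolist())  # ['group_a', 'group_a', 'group_b', 'group_b']
```

Because it works on positions rather than names, the same three lines handle any number of pop/source pairs as long as both column lists grow together.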
I tried creating a dict from pop_1 and source_1:
table_dict = df[['pop_1','source_1']]
z = table_dict.to_dict
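For reference, `DataFrame.to_dict` is a method, so it has to be called as `to_dict()`; the bare attribute is only a bound method object. Even when called, the default orientation returns `{column: {index: value}}`, so the top-level keys are the column names `'pop_1'` and `'source_1'`, not a pop-to-source lookup. A minimal sketch of the difference, with `dict(zip(...))` shown as one way (an assumption about what was intended) to build a flat pop_1 to source_1 lookup:

```python
import pandas as pd

df = pd.DataFrame({
    'pop_1': [99, 93, 58, 47],
    'source_1': ['group_a'] * 4,
})
table_dict = df[['pop_1', 'source_1']]

z_attr = table_dict.to_dict   # no parentheses: a bound method, not a dict
z = table_dict.to_dict()      # called: {column name: {index label: value}}

# A flat pop_1 -> source_1 lookup pairs the two columns element by element
lookup = dict(zip(df['pop_1'], df['source_1']))

print(callable(z_attr))       # True
print(sorted(z))              # ['pop_1', 'source_1']
print(lookup[58])             # group_a
```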
Then I mapped it with:
df['source'] = 'n/a'
df['source'].replace(z,inplace=True)
This returns the dataframe, but the 'source' column only shows n/a.
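The n/a result is to be expected: `replace` can only change values that are already in the column, and a column containing nothing but `'n/a'` has nothing for a pop-to-source dictionary to match. If you want to keep the dictionary idea, one option is to `map` the `upper_limit` values through a lookup built from each pop/source pair and fill the gaps from the second one. The names `pop1_to_src` and `pop2_to_src` are hypothetical, and this is a sketch rather than the answer's method. It is also fragile: it matches values across rows, so it breaks if the same pop number appears in different rows with different sources.

```python
import pandas as pd

df = pd.DataFrame({
    'loc': ['a', 'b', 'c', 'd'],
    'pop_1': [99, 93, 58, 47],
    'source_1': ['group_a'] * 4,
    'pop_2': [77, 90, 59, 62],
    'source_2': ['group_b'] * 4,
})
df['upper_limit'] = df[['pop_1', 'pop_2']].max(axis=1)

# Hypothetical lookups: pop value -> source, one per pop/source pair
pop1_to_src = dict(zip(df['pop_1'], df['source_1']))
pop2_to_src = dict(zip(df['pop_2'], df['source_2']))

# map() looks each upper_limit up in the first dict (NaN where missing),
# then fillna() fills those gaps, aligned by index, from the second dict
df['source'] = (
    df['upper_limit'].map(pop1_to_src)
    .fillna(df['upper_limit'].map(pop2_to_src))
)
print(df['source'].tolist())  # ['group_a', 'group_a', 'group_b', 'group_b']
```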
Answer (score: 1)
"Now I want to add another column that looks at the values in 'upper_limit', compares them with pop_1 and pop_2, and, where they match, takes the text from source_1 or source_2."
You can do this simply with np.where:
In [19]: import numpy as np
In [20]: df['source'] = np.where(df.upper_limit == df.pop_1, df.source_1, df.source_2)
In [21]: df
Out[21]:
  loc  pop_1  pop_2 source_1 source_2  upper_limit   source
0   a     99     77  group_a  group_b           99  group_a
1   b     93     90  group_a  group_b           93  group_a
2   c     58     59  group_a  group_b           59  group_b
3   d     47     62  group_a  group_b           62  group_b
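With more than two pop/source pairs, nested `np.where` calls get awkward. `np.select` takes a list of conditions and a matching list of choices and, for each row, picks the choice belonging to the first condition that is true. Here it reproduces the `np.where` result (ties again go to `source_1`) and extends by adding column names to both lists. This is a sketch assuming the question's column names; the `'n/a'` default is an arbitrary placeholder for rows that match no pop column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'loc': ['a', 'b', 'c', 'd'],
    'pop_1': [99, 93, 58, 47],
    'source_1': ['group_a'] * 4,
    'pop_2': [77, 90, 59, 62],
    'source_2': ['group_b'] * 4,
})
df['upper_limit'] = df[['pop_1', 'pop_2']].max(axis=1)

# Extend these two lists in step to handle more pop/source pairs
pop_cols = ['pop_1', 'pop_2']
src_cols = ['source_1', 'source_2']

conditions = [df['upper_limit'] == df[p] for p in pop_cols]
choices = [df[s] for s in src_cols]

# The first true condition wins; rows matching none get the default
df['source'] = np.select(conditions, choices, default='n/a')
print(df['source'].tolist())  # ['group_a', 'group_a', 'group_b', 'group_b']
```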