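I want to add a new column to test_df that holds the value from column a or column b, depending on change_col, but only when change is True. The for loop below works, but it is too slow. How can I add the new column using apply or a similar method?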
test_df = pd.DataFrame({"a": [1, 1, 2, 3],
                        "b": ["ant", "ber", "cas", "dor"],
                        "change_col": ["a", "b", "b", "a"],
                        "change": [True, True, True, False]})
   a    b change_col  change
0  1  ant          a    True
1  1  ber          b    True
2  2  cas          b    True
3  3  dor          a   False
Desired df:
   a    b change_col  change new_value
0  1  ant          a    True         1
1  1  ber          b    True       ber
2  2  cas          b    True       cas
3  3  dor          a   False       NaN
My loop:
new_value = []
for _, row in test_df.iterrows():
    if row["change"]:
        # Take the value from the column named in change_col for this row
        new_value.append(row[row["change_col"]])
    else:
        new_value.append(np.nan)
test_df["new_value"] = new_value
I am using pandas 0.24.2 on Python 3.7.
Answer 0 (score: 4)
You can use DataFrame.lookup:
test_df['new_val'] = test_df.lookup(test_df.index, test_df['change_col'])
   a    b change_col  change new_val
0  1  ant          a    True       1
1  1  ber          b    True     ber
2  2  cas          b    True     cas
3  3  dor          a   False       3
Edit: to take the change column into account, wrap the lookup in a condition:
test_df['new_val'] = np.where(test_df['change'], test_df.lookup(test_df.index, test_df['change_col']), np.nan)
   a    b change_col  change new_val
0  1  ant          a    True       1
1  1  ber          b    True     ber
2  2  cas          b    True     cas
3  3  dor          a   False     NaN
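DataFrame.lookup exists in pandas 0.24.2, which the question uses, but it was deprecated in pandas 1.2 and removed in 2.0. On newer versions, the same per-row pick can be sketched with positional NumPy indexing. This is only one possible substitute, not the approach from this answer; the names col_pos, row_pos and picked are made up for illustration.

```python
import numpy as np
import pandas as pd

test_df = pd.DataFrame({"a": [1, 1, 2, 3],
                        "b": ["ant", "ber", "cas", "dor"],
                        "change_col": ["a", "b", "b", "a"],
                        "change": [True, True, True, False]})

# Translate each label in change_col into the integer position of that column.
# Note: get_indexer returns -1 for labels that are not columns, which would
# silently select the last column, so this assumes every label is valid.
col_pos = test_df.columns.get_indexer(test_df["change_col"])
row_pos = np.arange(len(test_df))

# Select one cell per row from an object array of the frame's values.
picked = test_df.to_numpy(dtype=object)[row_pos, col_pos]

# Keep the picked value only where change is True, otherwise NaN.
test_df["new_val"] = np.where(test_df["change"], picked, np.nan)
print(test_df)
```

Converting the whole frame to an object array is simple but copies every column, so for wide frames it may be worth selecting only the candidate columns first.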
Answer 1 (score: 3)
Since you have multiple conditions, we can use np.select here.
Define the conditions, then choose the values based on them:
conditions = [
    test_df['change_col'].eq('a') & test_df['change'].eq(True),
    test_df['change_col'].eq('b') & test_df['change'].eq(True)
]
# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
test_df['new_value'] = np.select(conditions, choicelist=[test_df['a'], test_df['b']], default=np.nan)
Output
   a    b change_col  change new_value
0  1  ant          a    True         1
1  1  ber          b    True       ber
2  2  cas          b    True       cas
3  3  dor          a   False       NaN
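If change_col can name more than two columns, writing each condition by hand gets tedious. A minimal sketch of building the same condition and choice lists in a loop, assuming the set of target columns is known in advance (candidate_cols is a hypothetical name, not part of the original answer):

```python
import numpy as np
import pandas as pd

test_df = pd.DataFrame({"a": [1, 1, 2, 3],
                        "b": ["ant", "ber", "cas", "dor"],
                        "change_col": ["a", "b", "b", "a"],
                        "change": [True, True, True, False]})

# Hypothetical list of the columns that change_col is allowed to point at.
candidate_cols = ["a", "b"]

# One boolean mask per candidate column: change is True and change_col names it.
conditions = [test_df["change"] & test_df["change_col"].eq(col)
              for col in candidate_cols]
# The matching value source for each mask, in the same order.
choices = [test_df[col] for col in candidate_cols]

# Rows that match no mask (here: change is False) fall back to NaN.
test_df["new_value"] = np.select(conditions, choices, default=np.nan)
print(test_df)
```

Because the choices mix integers and strings, the resulting column has object dtype.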
Answer 2 (score: 0)
Here is a solution using np.select:
import pandas as pd
import numpy as np
test_df = pd.DataFrame({"a": [1, 1, 2, 3],
                        "b": ["ant", "ber", "cas", "dor"],
                        "change_col": ["a", "b", "b", "a"],
                        "change": [True, True, True, False]})
change_a = ((test_df['change']) & (test_df['change_col'] == 'a'))
change_b = ((test_df['change']) & (test_df['change_col'] == 'b'))
dont_change = ~test_df['change']
conditions = [change_a, change_b, dont_change]
choices = [test_df['a'], test_df['b'], np.nan]
test_df["new_value"] = np.select(conditions, choices)
print(test_df)
Output:
   a    b  change change_col new_value
0  1  ant    True          a         1
1  1  ber    True          b       ber
2  2  cas    True          b       cas
3  3  dor   False          a       NaN