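I am trying to add an extra column to my dataframe. I use if/else conditions based on two other variables, var1 and var2, and try to put the return values into the new column, prediction. With the code below the column gets created, but for some reason I cannot figure out, the new column is EMPTY. There is no error, so I assume something is wrong with the return statements? Thanks for your help!

Here is a subset of my dataframe, followed by my code: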

var1  var2  choice  prediction
-1.7     0    TRUE
 3.5     0    TRUE
 1.2     0   FALSE     #empty#
 6.7     0   FALSE
-0.6     1    TRUE
-2.8     1   FALSE
 2.1     1    TRUE

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":  # "TRUE" and "FALSE" are bool.
            return 'miss'  # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'
    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'
    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss'

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)

Answer 0 (score: 1)
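The problem seems to be that your data is boolean, not the strings "TRUE" and "FALSE", so none of the inner conditions ever evaluates to True and the function returns None for every row:
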
import numpy as np
import pandas as pd

np.random.seed(123)
N = 10
df = pd.DataFrame({'var1': np.random.randint(10, size=N),
                   'var2': np.random.randint(10, size=N),
                   'choice': np.random.choice([True, False], size=N)})
print(df)

   choice  var1  var2
0   False     2     9
1   False     2     0
2    True     6     0
3    True     1     9
4   False     3     3
5    True     9     4
6    True     6     0
7   False     1     0
8    True     0     4
9   False     1     1

print(df.dtypes)

choice     bool
var1      int32
var2      int32
dtype: object

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":
            return 'miss'  # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'
    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'
    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss'

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print(df)

   choice  var1  var2  prediction
0   False     2     9        None
1   False     2     0        None
2    True     6     0        None
3    True     1     9        None
4   False     3     3        None
5    True     9     4        None
6    True     6     0        None
7   False     1     0        None
8    True     0     4        None
9   False     1     1        None
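
The None values come from two ordinary Python behaviors: a bool never compares equal to a string, and a function that ends without reaching a return statement returns None. The sketch below illustrates both with a plain dict standing in for a DataFrame row. The names silent_prediction and strict_prediction are hypothetical helpers written for this illustration, not part of the original code; the strict variant shows one possible way to turn the silent failure into an error.

```python
# A plain dict stands in for one DataFrame row (an assumption for illustration).
row = {'var1': 6, 'var2': 0, 'choice': True}

# A bool never equals a string, so this comparison is False.
bool_equals_string = (row['choice'] == "TRUE")


def silent_prediction(row):
    # Mirrors only the first branch of the function above.
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":
            return 'miss'
        elif row['choice'] == "FALSE":
            return 'match'
    # No return statement is reached, so Python returns None implicitly.


def strict_prediction(row):
    # Hypothetical variant: refuse string labels instead of returning None.
    if isinstance(row['choice'], str):
        raise TypeError("'choice' is a string, expected a boolean")
    if row['var1'] > row['var2']:
        return 'miss' if row['choice'] else 'match'
    # var1 < var2 and var1 == var2 map to the same labels in the original.
    return 'match' if row['choice'] else 'miss'


silent_result = silent_prediction(row)   # None
strict_result = strict_prediction(row)   # 'miss'
```

Raising on unexpected input is only one option; logging the offending value or returning a sentinel label would also make the problem visible.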

So you need to compare against the boolean values instead:

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == True:
            return 'miss'  # return values and add into the new column
        elif row['choice'] == False:
            return 'match'
    elif row['var1'] < row['var2']:
        if row['choice'] == False:
            return 'miss'
        elif row['choice'] == True:
            return 'match'
    else:
        if row['choice'] == True:
            return 'match'
        elif row['choice'] == False:
            return 'miss'

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print(df)

   choice  var1  var2  prediction
0   False     2     9        miss
1   False     2     0       match
2    True     6     0        miss
3    True     1     9       match
4   False     3     3        miss
5    True     9     4        miss
6    True     6     0        miss
7   False     1     0       match
8    True     0     4       match
9   False     1     1        miss
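
As a possible alternative to the row-wise apply, the same labels can be computed column-wise with np.where. This is a sketch under the assumption that choice is a real boolean column with no missing values; it relies on the observation that the var1 < var2 and var1 == var2 branches assign identical labels, so only var1 > var2 flips the result. The small hand-built frame reuses the first five rows of the example above.

```python
import numpy as np
import pandas as pd

# First five rows of the example data; 'choice' is assumed to be bool.
df = pd.DataFrame({
    'var1': [2, 2, 6, 1, 3],
    'var2': [9, 0, 0, 9, 3],
    'choice': [False, False, True, True, False],
})

# When var1 > var2 a False choice counts as a match;
# in every other case a True choice counts as a match.
is_match = np.where(df['var1'] > df['var2'], ~df['choice'], df['choice'])

# Translate the boolean mask into the two labels.
df['prediction'] = np.where(is_match, 'match', 'miss')
print(df)
```

On large frames this usually runs much faster than apply with axis=1, since the comparisons are done on whole columns at once.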
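
Another possible problem is that the values are strings but do not match exactly: the string 'True' is not equal to 'TRUE', and the same goes for 'False' and 'FALSE':
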
np.random.seed(123)
N = 10
df = pd.DataFrame({'var1': np.random.randint(10, size=N),
                   'var2': np.random.randint(10, size=N),
                   'choice': np.random.choice(['True', 'False'], size=N)})
print(df)

   choice  var1  var2
0   False     2     9
1   False     2     0
2    True     6     0
3    True     1     9
4   False     3     3
5    True     9     4
6    True     6     0
7   False     1     0
8    True     0     4
9   False     1     1

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":
            return 'miss'  # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'
    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'
    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss'

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print(df)

   choice  var1  var2  prediction
0   False     2     9        None
1   False     2     0        None
2    True     6     0        None
3    True     1     9        None
4   False     3     3        None
5    True     9     4        None
6    True     6     0        None
7   False     1     0        None
8    True     0     4        None
9   False     1     1        None
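
If the column really does hold strings with inconsistent capitalization, one option is to normalize it to booleans once, before applying any logic. The following is a minimal sketch; the sample values, the mapping dict, and the choice_bool column name are assumptions made for illustration.

```python
import pandas as pd

# Sample string labels with mixed capitalization (assumed for illustration).
df = pd.DataFrame({'choice': ['True', 'FALSE', 'true', 'False']})

# Upper-case the labels, then map them onto real booleans.
# Labels that are not in the mapping would become NaN.
mapping = {'TRUE': True, 'FALSE': False}
df['choice_bool'] = df['choice'].str.strip().str.upper().map(mapping)
print(df)
print(df.dtypes)
```

Checking df.dtypes before and after the conversion confirms whether the comparison should be against booleans or strings.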