Question

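I'm trying to add an extra column to my DataFrame. I use if/else conditions on two other variables, var1 and var2, and try to write the returned value into a new column, prediction. With the code below the column is created, but for some reason I can't figure out, it is empty. There are no errors, so I assume something is wrong with the return statements? Thanks for your help!

Here is a subset of my DataFrame and my code: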

var1    var2    choice    prediction
-1.7     0       TRUE     
3.5      0       TRUE
1.2      0       FALSE      #empty#   
6.7      0       FALSE
-0.6     1       TRUE
-2.8     1       FALSE
2.1      1       TRUE

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE": # "TRUE" and "FALSE" are bool.
            return 'miss'   # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'

    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)

Answer 1

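The problem seems to be that your data are booleans, not the strings "TRUE" and "FALSE", so the inner conditions never evaluate to True:

import numpy as np
import pandas as pd

np.random.seed(123)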
N = 10
df = pd.DataFrame({'var1':np.random.randint(10, size=N),
                   'var2':np.random.randint(10, size=N), 
                   'choice':np.random.choice([True, False], size=N)})
print (df)
   choice  var1  var2
0   False     2     9
1   False     2     0
2    True     6     0
3    True     1     9
4   False     3     3
5    True     9     4
6    True     6     0
7   False     1     0
8    True     0     4
9   False     1     1

print (df.dtypes)
choice     bool
var1      int32
var2      int32
dtype: object

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":
            return 'miss'   # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'

    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print (df)
   choice  var1  var2 prediction
0   False     2     9       None
1   False     2     0       None
2    True     6     0       None
3    True     1     9       None
4   False     3     3       None
5    True     9     4       None
6    True     6     0       None
7   False     1     0       None
8    True     0     4       None
9   False     1     1       None
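The reason no error is raised can be reproduced without pandas. The sketch below is only an illustration: inner_branch is a hypothetical helper that mirrors the inner if/elif of prediction(), and a plain Python bool stands in for the value of row['choice'].

```python
def inner_branch(choice):
    # Hypothetical helper mirroring the inner if/elif of prediction().
    if choice == "TRUE":
        return 'match'
    elif choice == "FALSE":
        return 'miss'
    # Neither branch matched: Python falls off the end of the
    # function and implicitly returns None, without any error.


# A boolean never compares equal to a string.
print(True == "TRUE")

# So a boolean input matches no branch and the result is None.
result_for_bool = inner_branch(True)
print(result_for_bool)

# Only the exact string matches.
result_for_str = inner_branch("TRUE")
print(result_for_str)
```

Because every row falls through in the same way, apply fills the whole column with None.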

So you need to compare against the booleans True and False instead:

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == True:
            return 'miss'   # return values and add into the new column
        elif row['choice'] == False:
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == False:
            return 'miss'
        elif row['choice'] == True:
            return 'match'

    else:
        if row['choice'] == True:
            return 'match'
        elif row['choice'] == False:
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print (df)
   choice  var1  var2 prediction
0   False     2     9       miss
1   False     2     0      match
2    True     6     0       miss
3    True     1     9      match
4   False     3     3       miss
5    True     9     4       miss
6    True     6     0       miss
7   False     1     0      match
8    True     0     4      match
9   False     1     1       miss
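As an aside, the three branches of the corrected function reduce to one rule: the result is 'match' exactly when choice equals (var1 <= var2). A vectorized sketch of that rule, assuming choice is a bool column and there are no missing values, could use np.where. The small frame and the name expected_choice are made up for illustration; this is an alternative to the apply version, not a replacement the original answer proposes.

```python
import numpy as np
import pandas as pd

# Small hand-written frame (first five rows of the example above).
df = pd.DataFrame({'var1': [2, 2, 6, 1, 3],
                   'var2': [9, 0, 0, 9, 3],
                   'choice': [False, False, True, True, False]})

# choice "should" be True whenever var1 <= var2, per the branches above.
expected_choice = df['var1'] <= df['var2']

# Compare whole columns at once instead of calling a function per row.
df['prediction'] = np.where(df['choice'] == expected_choice, 'match', 'miss')
print(df)
```

On these five rows the result agrees with the apply-based output shown above (miss, match, miss, match, miss).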

Another possible problem is that the values do not match exactly: the string 'True' is not equal to 'TRUE', and likewise for 'False' and 'FALSE':

np.random.seed(123)
N = 10
df = pd.DataFrame({'var1':np.random.randint(10, size=N),
                   'var2':np.random.randint(10, size=N), 
                   'choice':np.random.choice(['True', 'False'], size=N)})
print (df)
  choice  var1  var2
0  False     2     9
1  False     2     0
2   True     6     0
3   True     1     9
4  False     3     3
5   True     9     4
6   True     6     0
7  False     1     0
8   True     0     4
9  False     1     1

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":
            return 'miss'   # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'

    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print (df)
  choice  var1  var2 prediction
0  False     2     9       None
1  False     2     0       None
2   True     6     0       None
3   True     1     9       None
4  False     3     3       None
5   True     9     4       None
6   True     6     0       None
7  False     1     0       None
8   True     0     4       None
9  False     1     1       None
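If the column really does hold strings, one way to sidestep case mismatches is to normalize the text and map it to booleans before comparing. This is a sketch under the assumption that the only values present are case variants of "true" and "false"; the column name choice_bool is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'choice': ['True', 'False', 'TRUE', 'false']})

# Inspect what is actually stored before writing any comparisons.
print(df['choice'].unique())

# Strip whitespace, lower-case, then map the text to real booleans.
# Any value not in the mapping would become NaN.
df['choice_bool'] = (df['choice']
                     .str.strip()
                     .str.lower()
                     .map({'true': True, 'false': False}))
print(df)
```

After this step the boolean version of prediction() can be applied using choice_bool instead of choice.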

Unable to return values into a new DataFrame column
