Unable to return values into a new column of a DataFrame

Date: 2017-06-23 05:48:26

Tags: python pandas dataframe return

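I am trying to add an extra column to my DataFrame. I use if/else conditions based on the other variables var1 and var2, and try to put the returned value into the new column prediction. With the code below the column is created, but for some reason I cannot figure out, the new column is EMPTY. There is no error, so I assume something is wrong with return? Thanks for your help!

Here is a subset of my DataFrame, followed by the code: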

var1    var2    choice    prediction
-1.7     0       TRUE     
3.5      0       TRUE
1.2      0       FALSE      #empty#   
6.7      0       FALSE
-0.6     1       TRUE
-2.8     1       FALSE
2.1      1       TRUE

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE": # "TRUE" and "FALSE" are bool.
            return 'miss'   # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'

    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)

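1 Answer:

Answer 0 (score: 1)

The problem seems to be that your data is boolean, not the strings "TRUE" and "FALSE", so the inner conditions never evaluate to True and no return statement is ever reached: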

import numpy as np
import pandas as pd

np.random.seed(123)
N = 10
df = pd.DataFrame({'var1':np.random.randint(10, size=N),
                   'var2':np.random.randint(10, size=N), 
                   'choice':np.random.choice([True, False], size=N)})
print (df)
   choice  var1  var2
0   False     2     9
1   False     2     0
2    True     6     0
3    True     1     9
4   False     3     3
5    True     9     4
6    True     6     0
7   False     1     0
8    True     0     4
9   False     1     1

print (df.dtypes)
choice     bool
var1      int32
var2      int32
dtype: object

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":
            return 'miss'   # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'

    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print (df)
   choice  var1  var2 prediction
0   False     2     9       None
1   False     2     0       None
2    True     6     0       None
3    True     1     9       None
4   False     3     3       None
5    True     9     4       None
6    True     6     0       None
7   False     1     0       None
8    True     0     4       None
9   False     1     1       None
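
The None values come from Python itself: a boolean never compares equal to a string, and a function that finishes without reaching a return statement returns None. A minimal sketch of both facts, using a hypothetical helper named label (not part of the original code) that mirrors the inner if/elif of prediction:

```python
# A bool never compares equal to a string, whatever the string says.
print(True == "TRUE")    # False
print(False == "FALSE")  # False


def label(choice):
    # Hypothetical helper mirroring the inner if/elif of prediction().
    if choice == "TRUE":
        return 'miss'
    elif choice == "FALSE":
        return 'match'
    # No branch matched: Python returns None implicitly here.


print(label(True))    # None
print(label("TRUE"))  # miss
```

Because no exception is raised when the branches fall through, apply simply fills the column with None.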

So you need to compare against the booleans True and False:

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == True:
            return 'miss'   # return values and add into the new column
        elif row['choice'] == False:
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == False:
            return 'miss'
        elif row['choice'] == True:
            return 'match'

    else:
        if row['choice'] == True:
            return 'match'
        elif row['choice'] == False:
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print (df)
   choice  var1  var2 prediction
0   False     2     9       miss
1   False     2     0      match
2    True     6     0       miss
3    True     1     9      match
4   False     3     3       miss
5    True     9     4       miss
6    True     6     0       miss
7   False     1     0      match
8    True     0     4      match
9   False     1     1       miss
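
As a possible alternative to the row-wise apply, the same rule can be written as one vectorized expression. This is only a sketch, and it assumes choice really has dtype bool. It relies on the observation that the branches above return 'match' exactly when choice equals (var1 <= var2). The sample rows are the first five rows of the output above; the name expected is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# First five rows of the sample data shown above.
df = pd.DataFrame({'var1': [2, 2, 6, 1, 3],
                   'var2': [9, 0, 0, 9, 3],
                   'choice': [False, False, True, True, False]})

# The nested if/else collapses to one comparison:
# 'match' when choice agrees with (var1 <= var2), otherwise 'miss'.
expected = df['var1'] <= df['var2']
df['prediction'] = np.where(df['choice'] == expected, 'match', 'miss')

print(df)
```

For these rows the prediction column should agree with what the apply-based version produced: miss, match, miss, match, miss.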

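Another possible problem is that the values do not match exactly: the string 'True' is not equal to 'TRUE', and the same goes for 'False' and 'FALSE':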

np.random.seed(123)
N = 10
df = pd.DataFrame({'var1':np.random.randint(10, size=N),
                   'var2':np.random.randint(10, size=N), 
                   'choice':np.random.choice(['True', 'False'], size=N)})
print (df)
  choice  var1  var2
0  False     2     9
1  False     2     0
2   True     6     0
3   True     1     9
4  False     3     3
5   True     9     4
6   True     6     0
7  False     1     0
8   True     0     4
9  False     1     1

def prediction(row):
    if row['var1'] > row['var2']:
        if row['choice'] == "TRUE":
            return 'miss'   # return values and add into the new column
        elif row['choice'] == "FALSE":
            return 'match'

    elif row['var1'] < row['var2']:
        if row['choice'] == "FALSE":
            return 'miss'
        elif row['choice'] == "TRUE":
            return 'match'

    else:
        if row['choice'] == "TRUE":
            return 'match'
        elif row['choice'] == "FALSE":
            return 'miss' 

df['prediction'] = df.apply(lambda row: prediction(row), axis=1)
print (df)
  choice  var1  var2 prediction
0  False     2     9       None
1  False     2     0       None
2   True     6     0       None
3   True     1     9       None
4  False     3     3       None
5   True     9     4       None
6   True     6     0       None
7  False     1     0       None
8   True     0     4       None
9  False     1     1       None
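
One way to handle this case, sketched here under the assumption that the column contains only the strings 'True' and 'False', is to convert the labels to real booleans first. After that, the version of prediction that compares against True and False can be used unchanged. The three sample rows are taken from the output above.

```python
import pandas as pd

# A few rows with string labels, as in the example above.
df = pd.DataFrame({'var1': [2, 2, 6],
                   'var2': [9, 0, 0],
                   'choice': ['False', 'False', 'True']})

# Map the text labels to real booleans.
# Any label missing from the dict (e.g. 'TRUE') would become NaN,
# which makes unexpected values easy to spot.
df['choice'] = df['choice'].map({'True': True, 'False': False})

print(df)
```

If the file may contain mixed spellings such as 'TRUE' or 'true', normalizing with df['choice'].str.upper() before mapping is one option.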