Trying to fill a new column in a DataFrame using a for loop

Posted: 2016-11-16 15:57:24

Tags: python pandas

I want to fill in a new column with a for loop, based on the values of another column. Unfortunately, I'm not getting the result I need:

profit = []

# For each row in the column,
for row in df3['Result']:
    # if value is;
    if row == 'H':
        # Append a Profit/Loss
        profit.append(df3['column value H'])
    # else, if value is,
    elif row == 'D':
        # Append a Profit/Loss
        profit.append(df3['column value D'])        
    # otherwise,
    else:
        # Append a Profit/Loss
        profit.append(df3['column value A'])

df3['profit'] = profit

2 answers:

Answer 0 (score: 2)

I think you need a nested (double) numpy.where:

df3['profit'] = np.where(df3['Result'] == 'H', df3['column value H'], 
                np.where(df3['Result'] == 'D', df3['column value D'], df3['column value A']))

Sample:

import numpy as np
import pandas as pd

df3 = pd.DataFrame({'Result':['H','D','E'],
                   'column value H':[4,5,6],
                   'column value D':[7,8,9],
                   'column value A':[1,3,5]})

print (df3)
  Result  column value A  column value D  column value H
0      H               1               7               4
1      D               3               8               5
2      E               5               9               6

df3['profit'] = np.where(df3['Result'] == 'H', df3['column value H'], 
                np.where(df3['Result'] == 'D', df3['column value D'], df3['column value A']))

print (df3)
  Result  column value A  column value D  column value H  profit
0      H               1               7               4       4
1      D               3               8               5       8
2      E               5               9               6       5
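If you prefer to stay within pandas, the same selection can be sketched with Series.mask, which replaces values wherever a condition is True. This is an alternative to the nested np.where, not part of the original answer: it starts from the fallback column ('column value A') and overwrites the 'D' rows and then the 'H' rows.

```python
import pandas as pd

# Same sample data as above
df3 = pd.DataFrame({'Result': ['H', 'D', 'E'],
                    'column value H': [4, 5, 6],
                    'column value D': [7, 8, 9],
                    'column value A': [1, 3, 5]})

# Start from the default column, then overwrite the rows matching each condition
df3['profit'] = (df3['column value A']
                 .mask(df3['Result'] == 'D', df3['column value D'])
                 .mask(df3['Result'] == 'H', df3['column value H']))

print(df3['profit'].tolist())  # [4, 8, 5]
```

Because each mask is applied in turn, a later mask wins if two conditions could overlap; here 'H' and 'D' are mutually exclusive, so the order does not matter.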

Timings (30,000-row DataFrames, built by the test code below):

In [198]: %timeit (jez(df3))
100 loops, best of 3: 7.59 ms per loop

In [199]: %timeit (wwii(df4))
1 loop, best of 3: 1.49 s per loop

In [200]: %timeit (wwii1(df5))
1 loop, best of 3: 4.48 s per loop

Test code:

import numpy as np
import pandas as pd

df3 = pd.DataFrame({'Result':['H','D','E'],
                   'column value H':[4,5,6],
                   'column value D':[7,8,9],
                   'column value A':[1,3,5]})

print (df3)
df3 = pd.concat([df3]*10000).reset_index(drop=True)

df4 = df3.copy()
df5 = df3.copy()

def jez(df3):
    df3['profit'] = np.where(df3['Result'] == 'H', df3['column value H'], 
                    np.where(df3['Result'] == 'D', df3['column value D'], df3['column value A']))

    return (df3)

def foo(series):
    # d maps Result column values to DataFrame/Series column names
    d = {'H':'column value H', 'D':'column value D'}
    try:
        return series[d[series['Result']]]
    except KeyError:
        return series['column value A']

def wwii(df3):
    df3['Profit'] = df3.apply(foo, axis = 1)
    return df3

def wwii1(df3):
    profit = []
    for row in df3.iterrows():
        series = row[1]
        if series.Result == 'H':
            # Append a Profit/Loss
            profit.append(series['column value H'])
        # else, if value is,
        elif series.Result == 'D':
            # Append a Profit/Loss
            profit.append(series['column value D'])
        # otherwise,
        else:
            # Append a Profit/Loss
            profit.append(series['column value A'])

    df3['profit'] = profit        
    return df3            

print (jez(df3))    
print (wwii(df4))    
print (wwii1(df5))    

Answer 1 (score: 0)

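You aren't using any row information in your operations. As you may have noticed, df3['column value H'] returns an entire Series (the whole column), not a single value for the row you are working on.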
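A quick way to see the problem, on a small hypothetical frame with the question's column names: indexing with the column name alone gives back the whole column, so every append in the original loop adds an entire Series instead of one number.

```python
import pandas as pd

# Hypothetical three-row frame using the question's column names
df3 = pd.DataFrame({'Result': ['H', 'D', 'E'],
                    'column value H': [4, 5, 6],
                    'column value D': [7, 8, 9],
                    'column value A': [1, 3, 5]})

# What the original loop appends on every iteration: the whole column
whole_column = df3['column value H']
print(type(whole_column).__name__, len(whole_column))  # Series 3

# What was intended: the single value for one row (here, row label 0)
single_value = df3.loc[0, 'column value H']
print(single_value)  # 4
```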

To fix the for loop, use DataFrame.iterrows(), which yields an (index, Series) tuple for each row. You can then access each column of that row with series['column name'].

profit = []
for row in df3.iterrows():
    # row is an (index, Series) tuple; row[1] holds this row's values
    series = row[1]
    if series.Result == 'H':
        # Append a Profit/Loss
        profit.append(series['column value H'])
    # else, if value is,
    elif series.Result == 'D':
        # Append a Profit/Loss
        profit.append(series['column value D'])
    # otherwise,
    else:
        # Append a Profit/Loss
        profit.append(series['column value A'])

df3['profit'] = profit
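If you really want to keep an explicit loop, a variation not in the original answer is to iterate over just the needed columns with zip. Each loop variable is then a plain scalar, and no Series is built per row (which is a large part of why iterrows() is slow). A sketch on the question's sample data; it typically runs much faster than iterrows() but is still slower than the vectorized np.where.

```python
import pandas as pd

# Sample data with the question's column names
df3 = pd.DataFrame({'Result': ['H', 'D', 'E'],
                    'column value H': [4, 5, 6],
                    'column value D': [7, 8, 9],
                    'column value A': [1, 3, 5]})

# Walk the four columns in parallel; each variable holds one row's value
profit = []
for result, h, d, a in zip(df3['Result'], df3['column value H'],
                           df3['column value D'], df3['column value A']):
    if result == 'H':
        profit.append(h)
    elif result == 'D':
        profit.append(d)
    else:
        profit.append(a)

df3['profit'] = profit
print(df3['profit'].tolist())  # [4, 8, 5]
```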

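Another option is to write a function that takes a Series (one row) as its argument, processes it, and returns the desired value, then apply it to each row with DataFrame.apply(), specifying axis=1.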

def foo(series):
    # d maps Result column values to DataFrame/Series column names
    d = {'H':'column value H', 'D':'column value D'}
    try:
        return series[d[series['Result']]]
    except KeyError:
        return series['column value A']

df3['Profit'] = df3.apply(foo, axis = 1)
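The dictionary in foo() can also drive a vectorized lookup, which avoids calling a Python function per row. A minimal sketch, assuming all value columns are numeric and listed in value_cols (a name introduced here for illustration): each row's Result is translated to a column name, then to a column position, and NumPy integer-array indexing picks one cell per row.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'Result': ['H', 'D', 'E'],
                    'column value H': [4, 5, 6],
                    'column value D': [7, 8, 9],
                    'column value A': [1, 3, 5]})

# Same mapping as foo(); any Result not listed falls back to 'column value A'
d = {'H': 'column value H', 'D': 'column value D'}
col_names = df3['Result'].map(lambda r: d.get(r, 'column value A'))

# Hypothetical list of candidate columns; positions index into this list
value_cols = ['column value H', 'column value D', 'column value A']
col_pos = pd.Index(value_cols).get_indexer(col_names)

# Row i takes the value in column col_pos[i]
values = df3[value_cols].to_numpy()
df3['Profit'] = values[np.arange(len(df3)), col_pos]

print(df3['Profit'].tolist())  # [4, 8, 5]
```

Adding a new case then only means adding a dictionary entry and listing its column in value_cols.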

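The double where proposed by @jezrael is probably the best option if the DataFrame is no more complex than assumed (no example was given), but it can get messy with more possible columns or more conditions.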
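For the "more conditions" case, numpy.select is one way to avoid nesting np.where calls: conditions and their values go in two parallel lists, and the first matching condition wins. This is a swapped-in alternative not proposed in either answer; passing a Series as default relies on NumPy broadcasting it row by row (the documentation describes default as a scalar), so treat this as a sketch on the sample data.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'Result': ['H', 'D', 'E'],
                    'column value H': [4, 5, 6],
                    'column value D': [7, 8, 9],
                    'column value A': [1, 3, 5]})

# One boolean condition per case, paired with the column to use when it holds
conditions = [df3['Result'] == 'H', df3['Result'] == 'D']
choices = [df3['column value H'], df3['column value D']]

# Rows matching no condition take the value from the default column
df3['profit'] = np.select(conditions, choices, default=df3['column value A'])

print(df3['profit'].tolist())  # [4, 8, 5]
```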