Question

I want to add a column to a df. The values of this new column will depend on the values of the other columns. For example:

dc = {'A':[0,9,4,5],'B':[6,0,10,12],'C':[1,3,15,18]}
df = pd.DataFrame(dc)
   A   B   C
0  0   6   1
1  9   0   3
2  4  10  15
3  5  12  18

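Now I want to add another column D whose values depend on the values of A, B and C. So, for example, if I were iterating through the df, I would do something like this: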

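for idx, row in df.iterrows():
    # iterrows() yields (index, row) pairs; write back through df.loc,
    # because assigning to `row` does not modify df.
    if row['A'] != 0 and row['B'] != 0:
        df.loc[idx, 'D'] = (float(row['A']) / float(row['B'])) * row['C']
    elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
        df.loc[idx, 'D'] = 250.0
    else:
        df.loc[idx, 'D'] = 20.0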

Is there a way to do this without the for loop, for example by using the where() or apply() functions?

Thanks

Answer 1

apply should work well for you:

In [20]: def func(row):
            if (row == 0).all():
                return 250.0
            elif (row[['A', 'B']] != 0).all():
                return (float(row['A']) / row['B'] ) * row['C']
            else:
                return 20
       ....:     


In [21]: df['D'] = df.apply(func, axis=1)

In [22]: df
Out[22]: 
   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5

[4 rows x 4 columns]
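Note that func above returns 250.0 when every column in the row is zero, while the loop in the question asks for 250.0 when C == 0, A != 0 and B == 0. If you want the question's conditions exactly, something like the following sketch should do it. The name question_rule is only an illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]})


def question_rule(row):
    # Branch order follows the loop in the question.
    if row['A'] != 0 and row['B'] != 0:
        return row['A'] / row['B'] * row['C']
    elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
        return 250.0
    else:
        return 20.0


# axis=1 passes each row to the function as a Series.
df['D'] = df.apply(question_rule, axis=1)
print(df)
```

On the sample df both versions give the same D column, because no row has C == 0.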

Answer 2

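.where can be much faster than .apply, so if all you're doing is if/elses, I'd aim for .where. Since you're returning scalars in some cases, np.where will be easier to use than pandas' own .where.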

import pandas as pd
import numpy as np
df['D'] = np.where((df.A!=0) & (df.B!=0), ((df.A/df.B)*df.C),
          np.where((df.C==0) & (df.A!=0) & (df.B==0), 250,
          20))

   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5
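If the nested np.where calls get hard to follow, np.select is a different NumPy function that expresses the same if/elif/else chain as two parallel lists. This is an alternative I'm suggesting, not what the code above uses; the names conditions and choices are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    (df.A != 0) & (df.B != 0),
    (df.C == 0) & (df.A != 0) & (df.B == 0),
]
# One choice per condition; a scalar is broadcast to every row.
choices = [
    df.A / df.B * df.C,
    250.0,
]
# Rows matching no condition get the default.
df['D'] = np.select(conditions, choices, default=20.0)
print(df)
```

As with np.where, every choice is computed for all rows first and only then selected, so A / B is still evaluated where B is 0; those values are simply not picked.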

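For a small df like this, you don't need to worry about speed. On a 10000-row df of randn values, however, this is almost 2000 times faster than the .apply solution above: 3ms vs 5850ms. That said, if speed isn't a concern, .apply is often easier to read.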
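If you want to check the timing on your own machine, here is a rough sketch. The 10000-row random frame, the helper names with_apply and with_where, and the use of timeit are my assumptions about how such a comparison could be set up; the numbers you get will depend on your hardware and your pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 10000 rows of standard-normal values.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.standard_normal((10_000, 3)), columns=['A', 'B', 'C'])


def with_apply(frame):
    # Row-by-row version: calls a Python function once per row.
    def func(row):
        if row['A'] != 0 and row['B'] != 0:
            return row['A'] / row['B'] * row['C']
        elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
            return 250.0
        return 20.0
    return frame.apply(func, axis=1)


def with_where(frame):
    # Vectorized version: whole-column operations, no Python-level loop.
    values = np.where(
        (frame.A != 0) & (frame.B != 0),
        frame.A / frame.B * frame.C,
        np.where((frame.C == 0) & (frame.A != 0) & (frame.B == 0), 250.0, 20.0),
    )
    return pd.Series(values, index=frame.index)


t_apply = timeit.timeit(lambda: with_apply(big), number=1)
t_where = timeit.timeit(lambda: with_where(big), number=1)
print(f"apply: {t_apply * 1e3:.1f} ms, np.where: {t_where * 1e3:.1f} ms")
```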

Answer 3

Here's a start:

df['D'] = np.nan
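# Assign through df.loc with a boolean mask (avoids chained assignment);
# np.float has been removed from NumPy, so use the builtin float.
df.loc[(df.A != 0) & (df.B != 0), 'D'] = df.A / df.B.astype(float) * df.C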

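Edit: you should probably just go ahead and cast the whole thing to float, unless you really care about integers for some reason:

df = df.astype(float)

Then you don't have to keep converting in the calls themselves.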
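To finish the idea, the remaining branches from the question can be filled in with the same kind of masked assignment. This is only a sketch of one way to do it; the names ratio_rows, special_rows and ratio are mine.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]})
df = df.astype(float)

# Boolean masks for the two non-default branches of the question's loop.
ratio_rows = (df.A != 0) & (df.B != 0)
special_rows = (df.C == 0) & (df.A != 0) & (df.B == 0)

# Start from the else-branch value, then overwrite the rows that match.
df['D'] = 20.0
df.loc[special_rows, 'D'] = 250.0

# Compute the ratio for all rows, then copy only the rows we want.
ratio = df.A / df.B * df.C
df.loc[ratio_rows, 'D'] = ratio[ratio_rows]
print(df)
```

The ratio branch is assigned last so it takes priority, which matches the order of the if/elif in the question (the two masks cannot both be true for the same row anyway).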
