Question

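I am trying to apply a function to every cell of a DataFrame, where the function's arguments come from the DataFrame itself. Is there a concise way to do this?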

df: 
    | a  | b  | c  | d |
A   | 20 | 15 | 33 | 5 |
B   | 5  | 6  | 10 | 8 |
C   | 10 | 15 | 5  | 10 |

Function to apply to each cell:

# c = sum of the current column
# r = sum of the current row 
# t = sum of all values
def calcIndex(x, c, r, t):
    return (x/c)*(t/r)*100

Result:

    | a   | b   | c   | d   |
A   | 111 | 81  | 134 | 42  |
B   | 70  | 82  | 102 | 170 |
C   | 101 | 148 | 37  | 154 |

I tried df.apply, but I am not sure how to access the row and column totals that belong to the particular x being calculated.

Answer 1

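The problem with DataFrame.apply here is that it iterates over either the columns or the rows (the index), not both at once, so it cannot be used when a single function needs both.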
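A small sketch of what that means in practice, assuming the DataFrame from the question. The helper `record` and the list `seen` are hypothetical names used only for illustration; they log what each call to the applied function actually receives.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [20, 5, 10], "b": [15, 6, 15], "c": [33, 10, 5], "d": [5, 8, 10]},
    index=["A", "B", "C"],
)

seen = []  # (type name, label, length) for every call

def record(values):
    # `values` is a whole column (axis=0) or a whole row (axis=1),
    # never a single cell together with both of its totals
    seen.append((type(values).__name__, values.name, len(values)))
    return values

df.apply(record, axis=0)  # called with columns a, b, c, d (length 3 each)
df.apply(record, axis=1)  # called with rows A, B, C (length 4 each)

for entry in seen:
    print(entry)
```

Each call sees one axis only, which is why the row total and the column total of the same cell are not both available inside the function.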

A better and faster approach is to combine the vectorized methods DataFrame.div, DataFrame.mul and DataFrame.sum, and finish with DataFrame.round and DataFrame.astype to get integer output:

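# r = row sums, c = column sums, t = grand total (same names as in the question)
r = df.sum(axis=1)
c = df.sum()
t = c.sum()
# divide each cell by its row sum (axis=0 aligns on the index), then by its column sum
df1 = df.div(r, axis=0).mul(t).div(c).mul(100).round().astype(int)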
print (df1)
     a    b    c    d
A  111   81  134   42
B   70   82  102  170
C  101  148   37  154
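As a sanity check, the vectorized chain can be compared against the question's calcIndex evaluated one cell at a time. This is only a sketch: the names `col_sums`, `row_sums`, `total`, `reference` and `vectorized` are mine, and the cell-by-cell loop is meant as a slow reference, not as a recommended approach.

```python
import pandas as pd

def calcIndex(x, c, r, t):
    # c = column sum, r = row sum, t = grand total (as defined in the question)
    return (x / c) * (t / r) * 100

df = pd.DataFrame(
    {"a": [20, 5, 10], "b": [15, 6, 15], "c": [33, 10, 5], "d": [5, 8, 10]},
    index=["A", "B", "C"],
)

col_sums = df.sum(axis=0)
row_sums = df.sum(axis=1)
total = col_sums.sum()

# Slow reference: apply the formula to every cell individually
reference = pd.DataFrame(
    {
        col: [
            calcIndex(df.at[row, col], col_sums[col], row_sums[row], total)
            for row in df.index
        ]
        for col in df.columns
    },
    index=df.index,
)

# Vectorized version: the same div/mul chain, without the final rounding
vectorized = df.div(row_sums, axis=0).mul(total).div(col_sums).mul(100)

# Largest absolute difference between the two results
print((reference - vectorized).abs().max().max())
```

The two results should agree up to floating-point noise, since both evaluate the same product in a different order.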

To improve performance further, NumPy can be used:

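import numpy as np
import pandas as pd

# pandas 0.24+
arr = df.to_numpy()
# older pandas versions
# arr = df.values
r = arr.sum(axis=1)  # row sums
c = arr.sum(axis=0)  # column sums
t = c.sum()          # grand total
# r[:, None] turns the row sums into a column vector, so each row is divided by its own sum
out = np.round(arr / r[:, None] * t / c * 100).astype(int)
df = pd.DataFrame(out, index=df.index, columns=df.columns)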
print (df)
     a    b    c    d
A  111   81  134   42
B   70   82  102  170
C  101  148   37  154
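The NumPy version relies on broadcasting. The sketch below spells out the shapes involved and compares the result with an equivalent rearrangement using np.outer. That rearrangement (dividing each cell by row sum × column sum / total) is my own restatement of the formula, not part of the answer, and the variable names are illustrative.

```python
import numpy as np

arr = np.array([[20, 15, 33, 5], [5, 6, 10, 8], [10, 15, 5, 10]])

row_sums = arr.sum(axis=1)  # shape (3,)
col_sums = arr.sum(axis=0)  # shape (4,)
total = arr.sum()

# row_sums[:, None] has shape (3, 1): each row is divided by its own sum.
# col_sums has shape (4,): it lines up with the last axis, so each column
# is divided by its own sum.
broadcast = arr / row_sums[:, None] * total / col_sums * 100

# Equivalent form: one (3, 4) denominator built from all row/column sum pairs
denom = np.outer(row_sums, col_sums) / total
via_outer = arr / denom * 100

print(np.allclose(broadcast, via_outer))
print(np.round(via_outer).astype(int))
```

Both forms compute x * t / (r * c) * 100 for every cell; the outer-product form just makes the per-cell denominator explicit.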

Answer 2

This is a tricky one.

import pandas as pd

data = pd.DataFrame({'a':[20, 5, 10], 'b':[15, 6, 15], 'c':[33, 10, 5], 'd':[5, 8, 10]}, index=['A', 'B', 'C'])

total = data.values.sum() # total sum

data['row_sum'] = data.sum(axis=1) # create a new column 'row_sum' containing sum of elements in that row
col_sum = data.sum(axis=0) # column sum

data = data.loc[:,'a':'d'].div(data['row_sum'], axis=0) # divide each cell with its row sum
data.loc['col_sum'] = col_sum # create a new row with corresponding column sum
data = data.loc['A':'C',:].div(data.loc['col_sum'], axis=1) # divide each cell with its column sum

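def update(x):
    return int(round(x*total*100)) # scale by the total and round to the nearest integer

# applymap calls update once per cell (deprecated since pandas 2.1 in favor of DataFrame.map)
data_new = data.applymap(update)
print(data_new)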

Output:

     a    b    c    d
A  111   81  134   42
B   70   82  102  170
C  101  148   37  154
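DataFrame.applymap is deprecated as of pandas 2.1 in favor of DataFrame.map, so the last step may emit a warning on recent versions. A minimal sketch of one way to handle both, assuming the same data as above; `ratios`, `elementwise`, `by_cell` and `vectorized` are illustrative names, and the vectorized line is offered as an alternative rather than as what the answer does.

```python
import pandas as pd

data = pd.DataFrame(
    {"a": [20, 5, 10], "b": [15, 6, 15], "c": [33, 10, 5], "d": [5, 8, 10]},
    index=["A", "B", "C"],
)
total = data.values.sum()

# Each cell divided by its row sum and by its column sum
ratios = data.div(data.sum(axis=1), axis=0).div(data.sum(axis=0), axis=1)

def update(x):
    return int(round(x * total * 100))

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap
elementwise = getattr(pd.DataFrame, "map", None) or pd.DataFrame.applymap
by_cell = elementwise(ratios, update)

# Vectorized alternative that avoids a Python-level call per cell
vectorized = ratios.mul(total).mul(100).round().astype(int)

print((by_cell == vectorized).all().all())
```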

Applying a function with arguments to a DataFrame
