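I have data with 44 rows x 4 columns. I want to sum every 11 rows and divide each value by the sum of its block, but my mistake is that my function computes the sum and the division over all rows at once.
Please suggest the simplest solution, perhaps by iterating over the DataFrame?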
import pandas as pd
data = pd.DataFrame({'A':[1,2,3,1,2,3,1,2,3,2,2,4,5,6,4,5,6,4,5,6,1,1,1,3,5,1,3,5,1,3,5,4,1,7,8,9,7,8,9,7,8,9,4,2],
'B':[4,5,6,4,5,6,4,5,6,1,1,1,3,5,1,3,5,1,3,5,4,1,4,5,6,1,1,1,3,5,1,3,6,3,9,7,8,9,4,2,7,8,9,2],
'C':[7,8,9,7,8,9,7,8,9,4,2,2,3,2,2,4,5,6,4,3,6,3,9,7,8,9,4,2,7,8,9,7,8,9,7,8,9,4,2,2,1,3,5,4],
'D':[1,3,5,1,3,5,1,3,5,4,1,7,8,9,7,8,9,7,8,9,4,2,7,8,9,7,8,9,7,8,9,4,2,2,3,2,2,4,5,6,4,3,6,3]}
)
a = data[['A','B','C','D']].sum()
b = data[['A','B','C','D']] / a
data_div = b.round(4)
Here is an example of what I expect. In the image below, I sum every 4 rows of column A and divide by …
Answer 0: (score: 2)
This looks like what you expect:
import pandas as pd
data = pd.DataFrame({'A':[1,2,3,1,2,3,1,2,3,2,2,4,5,6,4,5,6,4,5,6,1,1,1,3,5,1,3,5,1,3,5,4,1,7,8,9,7,8,9,7,8,9,4,2],
'B':[4,5,6,4,5,6,4,5,6,1,1,1,3,5,1,3,5,1,3,5,4,1,4,5,6,1,1,1,3,5,1,3,6,3,9,7,8,9,4,2,7,8,9,2],
'C':[7,8,9,7,8,9,7,8,9,4,2,2,3,2,2,4,5,6,4,3,6,3,9,7,8,9,4,2,7,8,9,7,8,9,7,8,9,4,2,2,1,3,5,4],
'D':[1,3,5,1,3,5,1,3,5,4,1,7,8,9,7,8,9,7,8,9,4,2,7,8,9,7,8,9,7,8,9,4,2,2,3,2,2,4,5,6,4,3,6,3]}
)
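chunk_len = 11
chunks = []
for i in range(len(data) // chunk_len):
    chunk = data.iloc[i * chunk_len:(i + 1) * chunk_len]
    # Divide each value by its column's sum within this chunk
    chunks.append(chunk / chunk.sum())
# DataFrame.append was removed in pandas 2.0, so the chunks are joined with pd.concat
result = pd.concat(chunks)
print(result)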
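The same chunking can also be done without a Python loop. The following is only a sketch of one alternative (a NumPy reshape, not the answerer's method), and it assumes the number of rows is an exact multiple of `chunk_len`. The small frame is hypothetical stand-in data so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 6 rows, processed in chunks of 3
data = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [2, 2, 4, 1, 1, 2]})
chunk_len = 3

values = data.to_numpy(dtype=float)
n_chunks = len(data) // chunk_len  # assumes len(data) is an exact multiple of chunk_len

# Shape (n_chunks, chunk_len, n_columns): one block per chunk
blocks = values.reshape(n_chunks, chunk_len, -1)
# Per-chunk, per-column sums; keepdims lets them broadcast over each block's rows
sums = blocks.sum(axis=1, keepdims=True)

result = pd.DataFrame((blocks / sums).reshape(values.shape),
                      index=data.index, columns=data.columns).round(4)
print(result)
```

With the question's data, `chunk_len = 11` would give four blocks. If the row count were not a multiple of the chunk length, the reshape would raise an error, so the loop version is the safer choice in that case.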
Answer 1: (score: 0)
Assuming I understand your question correctly, you want to sum the DataFrame over blocks of 11 rows. One way to do it is:
result = data.iloc[0:11].sum().sum()
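The first .sum() returns the per-column sums of the first 11 rows, and the second .sum() adds those sums together to give the grand total. For other slices of the DataFrame, change the row selection to the slice you need (e.g. data.iloc[11:22], and so on).
Exactly the same logic applies to the division.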
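To make the difference between the two sums concrete, here is a minimal sketch on hypothetical stand-in data (4 rows, with the first 2 rows playing the role of one block). It shows dividing a slice either by its per-column sums or by its grand total; which of the two the question needs is an assumption left to the reader.

```python
import pandas as pd

# Hypothetical stand-in data: 4 rows, treated as two blocks of 2 rows
data = pd.DataFrame({"A": [1, 3, 2, 2], "B": [4, 4, 1, 3]})

block = data.iloc[0:2]            # rows 0 and 1 (the end of the slice is exclusive)
col_sums = block.sum()            # one sum per column: A=4, B=8
grand_total = block.sum().sum()   # a single number: 12

by_column = block / col_sums      # each value divided by its own column's block sum
by_total = block / grand_total    # each value divided by the block's grand total
print(by_column)
print(by_total)
```

Dividing by `col_sums` makes every column of the block add up to 1, while dividing by `grand_total` makes the whole block add up to 1.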
Answer 2: (score: 0)
You can try grouping the rows in blocks of N and then applying the sum:
df.index = [i // 11 for i in range(len(df))]  # N = 11 rows per group, as in the question
df['sum_A'] = df["A"].groupby(df.index).sum()
df['div_A'] = df["A"] / df['sum_A']
Full code:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,1,2,3,1,2,3,2,2,4,5,6,4,5,6,4,5,6,1,1,1,3,5,1,3,5,1,3,5,4,1,7,8,9,7,8,9,7,8,9,4,2],
'B':[4,5,6,4,5,6,4,5,6,1,1,1,3,5,1,3,5,1,3,5,4,1,4,5,6,1,1,1,3,5,1,3,6,3,9,7,8,9,4,2,7,8,9,2],
'C':[7,8,9,7,8,9,7,8,9,4,2,2,3,2,2,4,5,6,4,3,6,3,9,7,8,9,4,2,7,8,9,7,8,9,7,8,9,4,2,2,1,3,5,4],
'D':[1,3,5,1,3,5,1,3,5,4,1,7,8,9,7,8,9,7,8,9,4,2,7,8,9,7,8,9,7,8,9,4,2,2,3,2,2,4,5,6,4,3,6,3]}
)
df.index = [i // 11 for i in range(len(df))] # Define new index for groupby
df['sum_A'] = df["A"].groupby(df.index).sum() # Apply sum per group
df['div_A'] = df["A"] / df['sum_A'] # Divide each row by the given sum
print(df)
# A B C D sum_A div_A
# 0 1 4 7 1 22 0.045455
# 0 2 5 8 3 22 0.090909
# 0 3 6 9 5 22 0.136364
# 0 1 4 7 1 22 0.045455
# 0 2 5 8 3 22 0.090909
# 0 3 6 9 5 22 0.136364
# 0 1 4 7 1 22 0.045455
# 0 2 5 8 3 22 0.090909
# 0 3 6 9 5 22 0.136364
# 0 2 1 4 4 22 0.090909
# 0 2 1 2 1 22 0.090909
# 1 4 1 2 7 47 0.085106
# 1 5 3 3 8 47 0.106383
# 1 6 5 2 9 47 0.127660
# 1 4 1 2 7 47 0.085106
# 1 5 3 4 8 47 0.106383
# 1 6 5 5 9 47 0.127660
# 1 4 1 6 7 47 0.085106
# 1 5 3 4 8 47 0.106383
# 1 6 5 3 9 47 0.127660
# 1 1 4 6 4 47 0.021277
# 1 1 1 3 2 47 0.021277
# 2 1 4 9 7 32 0.031250
# 2 3 5 7 8 32 0.093750
# 2 5 6 8 9 32 0.156250
# 2 1 1 9 7 32 0.031250
# 2 3 1 4 8 32 0.093750
# 2 5 1 2 9 32 0.156250
# 2 1 3 7 7 32 0.031250
# 2 3 5 8 8 32 0.093750
# 2 5 1 9 9 32 0.156250
# 2 4 3 7 4 32 0.125000
# 2 1 6 8 2 32 0.031250
# 3 7 3 9 2 78 0.089744
# 3 8 9 7 3 78 0.102564
# 3 9 7 8 2 78 0.115385
# 3 7 8 9 2 78 0.089744
# 3 8 9 4 4 78 0.102564
# 3 9 4 2 5 78 0.115385
# 3 7 2 2 6 78 0.089744
# 3 8 7 1 4 78 0.102564
# 3 9 8 3 3 78 0.115385
# 3 4 9 5 6 78 0.051282
# 3 2 2 4 3 78 0.025641
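The code above handles column A only and overwrites the index. As a hedged sketch of how the same grouping idea might extend to every column at once, the example below uses `groupby(...).transform("sum")` with a separate array of group labels. The small frame and the names `n`, `groups`, `group_sums` and `div` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 6 rows, groups of 3
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [2, 2, 4, 1, 1, 2]})
n = 3

# Group label per row: 0, 0, 0, 1, 1, 1 (kept separate so df.index is untouched)
groups = np.arange(len(df)) // n
# transform("sum") returns the group sum repeated on every row of that group
group_sums = df.groupby(groups).transform("sum")
div = (df / group_sums).round(4)
print(div)
```

For the 44-row data in the question, setting `n = 11` should produce four groups, with every column normalised inside its own group.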
Hope this helps!