Computing per-position sums over intervals in a pandas DataFrame

Asked: 2021-05-31 10:58:48

Tags: python pandas dataframe numpy

I have a pandas DataFrame of intervals (each defined by a start and a stop):

import pandas as pd

df = pd.DataFrame(
    {
        'start': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3],
        'stop': [9, 9, 10, 10, 10, 11, 11, 11, 11, 12, 11, 12, 11, 11],
        'percent': [0.51, 0.29, 0.92, 0.60, 0.10, 0.12, 0.60, 0.30, 0.10, 0.42, 0.51, 0.51, 0.51, 0.10],
        'order': [3, 80, 3, 3, 4, 8, 89, 2, 3, 4, 5, 64, 82, 68],
    }
)

It looks like this:

start  stop  percent  order
    1     9     0.51      3
    1     9     0.29     80
    1    10     0.92      3
    2    10     0.60      3
    2    10     0.10      4
    2    11     0.12      8
    2    11     0.60     89
    3    11     0.30      2
    3    11     0.10      3
    3    12     0.42      4
    3    11     0.51      5
    3    12     0.51     64
    3    11     0.51     82
    3    11     0.10     68

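For each position covered by the intervals (that is, after splitting each interval into its individual positions), I want to compute the number of intervals covering it, the sum of percent, and the sum of order.

Note: the real DataFrame is not sorted by coordinate, unlike this example.

I would like to end up with a DataFrame like this: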

pos  count  sum_percent  sum_order
  1      3         1.72         86
  2      7         3.14        190
  3     14         5.59        418
  4     14         5.59        418
  5     14         5.59        418
  6     14         5.59        418
  7     14         5.59        418
  8     14         5.59        418
  9     14         5.59        418
 10     12         4.79        335
 11      9         3.17        325
 12      2         0.93         68

I managed to get the result I want for the count column:

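import numpy as np

max_pos = df[['start', 'stop']].values.max()
pos_range = np.arange(1, max_pos + 1)
# Broadcasting an (n, 1) column against the (m,) position array gives an (n, m) mask:
# entry [i, j] is True when interval i covers position pos_range[j]
counts = ((df[['start']].values <= pos_range) & (pos_range <= df[['stop']].values)).sum(axis=0)
o = pd.DataFrame({'pos': pos_range, "counts": counts})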

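However, I could not work out how to do the same for the column sums. Any help would be appreciated. Thanks in advance.

1 Answer:

Answer 0 (score: 0)

Reuse the boolean mask you built for the counts as an index into the value columns: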

import numpy as np
import pandas as pd
names=["start","stop","percent","order"]
vals=np.array([
  [1,9,0.51, 3],
  [1,9,0.29,80],
  [1,10,0.92, 3],
  [2,10,0.60, 3],
  [2,10,0.10, 4],
  [2,11,0.12, 8],
  [2,11,0.60,89],
  [3,11,0.30, 2],
  [3,11,0.10, 3],
  [3,12,0.42, 4],
  [3,11,0.51, 5],
  [3,12,0.51,64],
  [3,11,0.51,82],
  [3,11,0.10,68]
  ])

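df = pd.DataFrame(vals, columns=names)
# vals is a float array, so restore integer dtypes for the integer columns
df = df.astype({"start": int, "stop": int, "order": int})

max_pos = df[['start', 'stop']].values.max()
pos_range = np.arange(1, max_pos + 1)

# _ix[i, j] is True when interval i covers position pos_range[j]
_ix = (df[['start']].values <= pos_range) & (pos_range <= df[['stop']].values)
counts = _ix.sum(axis=0)

# Each row of _ix.T is the mask of intervals covering one position;
# use it to select and sum the matching values
sum_percent = []
for i in _ix.T:
    sum_percent.append(df["percent"].values[i].sum())
sum_order = []
for i in _ix.T:
    sum_order.append(df["order"].values[i].sum())

o = pd.DataFrame({'pos': pos_range, "counts": counts, "sum_percent": sum_percent, "sum_order": sum_order})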
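
The two loops can also be written as a matrix product, since summing the values selected by a boolean mask is the same as a dot product with a 0/1 vector. The following is a minimal sketch of that idea, not part of the original answer; the names mask, weights and result are chosen here for illustration, and it assumes the full intervals-by-positions mask fits in memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "start": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3],
    "stop": [9, 9, 10, 10, 10, 11, 11, 11, 11, 12, 11, 12, 11, 11],
    "percent": [0.51, 0.29, 0.92, 0.60, 0.10, 0.12, 0.60,
                0.30, 0.10, 0.42, 0.51, 0.51, 0.51, 0.10],
    "order": [3, 80, 3, 3, 4, 8, 89, 2, 3, 4, 5, 64, 82, 68],
})

pos_range = np.arange(1, df[["start", "stop"]].values.max() + 1)

# mask[i, j] is True when interval i covers position pos_range[j]
mask = (df[["start"]].values <= pos_range) & (pos_range <= df[["stop"]].values)

# weights has shape (n_positions, n_intervals) with 0/1 entries, so
# weights @ values adds up the values of the intervals covering each position
weights = mask.T.astype(int)

result = pd.DataFrame({
    "pos": pos_range,
    "counts": mask.sum(axis=0),
    "sum_percent": weights @ df["percent"].to_numpy(),
    "sum_order": weights @ df["order"].to_numpy(),
})
print(result)
```

Because the DataFrame is built from a dict here, start, stop and order keep their integer dtype, so pos and sum_order come out as integers. The sum_percent values are floating point and may differ from the expected table in the last decimal places until rounded.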