Iterating over a pandas DataFrame, selecting n rows and columns at a time

Time: 2019-06-24 17:20:14

Tags: python pandas

I have a dataset that looks like this:

# Example
     0  1     2   3  4   5
0   18  1   -19 -16 -5  19
1   18  0   -19 -17 -6  19
2   17  -1  -20 -17 -6  19
3   18  1   -19 -16 -5  20
4   18  0   -19 -16 -5  20

Actual data:

[{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
 {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
 {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
 {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
 {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
 {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]

The shape of the above is (20, 6).

What I want to achieve is to apply a custom function to every column, 4 rows at a time.

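Example:

  1. First iteration -> f() is applied to all columns of df.loc[0:3]
  2. Second iteration -> f() is applied to all columns of df.loc[4:7]

And so on...

In other words, I need a rolling window of size 4 that moves with a step of 4.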

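With the data above, the result would be a DataFrame of shape (5, 6). For the sake of argument, you can assume the custom function takes the mean of those 4 rows for each column.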

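What have I tried so far?

  1. I looked into rolling, but it does not do what I need: it rolls a window with a stride of 1.
  2. I implemented it by hand with a loop, but because the data is large I really need to optimize it:

The code: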

import pandas as pd

curr = 0
res = []
while curr < df_to_look_at2.shape[0]:
    # .ix was removed in pandas 1.0; .loc slices by label and includes both ends
    look_at = df_to_look_at2.loc[curr:curr+3]
    curr += 4
    res.append(look_at.mean().values.tolist())
pd.DataFrame(res)

And the result:

       0       1         2       3      4      5
0   17.75   0.25    -19.25  -16.50  -5.50   19.25
1   18.25   0.25    -19.00  -16.00  -5.25   19.50
2   17.75   0.25    -19.25  -16.75  -5.75   19.00
3   17.75   0.25    -19.00  -16.00  -4.75   19.75
4   17.75   0.25    -18.75  -14.75  -3.75   21.00

One more thought: what if I need not only the mean, but also min(), max(), mean() and some other custom functions...

2 answers:

Answer 0: (score: 1)

Rolling would be the right tool if the same rows had to appear in more than one window. Your windows do not overlap, though, so what you are really asking is how to group by a stride, which you can do with np.arange and floor division.

import numpy as np

window_size = 4
# Rows 0-3 get label 0, rows 4-7 get label 1, and so on
grouper = np.arange(df.shape[0]) // window_size

df.groupby(grouper).mean()
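
The same grouper should also cover the follow-up in the question about computing several statistics at once. Here is a minimal sketch, assuming a small made-up frame and a hypothetical custom function `value_range` (neither comes from the question); `agg` accepts a list that mixes built-in names and callables.

```python
import numpy as np
import pandas as pd

# Small made-up frame: 8 rows, 2 columns (illustrative data only)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                   "b": [8, 6, 4, 2, 0, -2, -4, -6]})

window_size = 4
# Rows 0-3 get key 0, rows 4-7 get key 1
grouper = np.arange(len(df)) // window_size


def value_range(col):
    """Hypothetical custom function: spread of one column within one window."""
    return col.max() - col.min()


# One row per window; the columns become a (column, statistic) MultiIndex
stats = df.groupby(grouper).agg(["min", "max", "mean", value_range])
```

A single statistic can then be selected with a tuple key, for example `stats[("a", "mean")]`.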

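Answer 1: (score: 1)

I think running several computations this way really is numpy's turf. You can use reshape to get the underlying array into the layout you need, and then run whatever computations you want on that array.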

inp = [{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
 {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
 {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
 {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
 {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
 {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]

import pandas as pd
df = pd.DataFrame(inp)

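# Reshape to (windows, 4, columns); the row count must be a multiple of 4
temp = df.values.reshape(-1, 4, df.shape[-1])

# Average over the 4 rows inside each window
out = pd.DataFrame(temp.mean(axis=1))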

Output:

       0     1      2      3     4      5
0  17.75  0.25 -19.25 -16.50 -5.50  19.25
1  18.25  0.25 -19.00 -16.00 -5.25  19.50
2  17.75  0.25 -19.25 -16.75 -5.75  19.00
3  17.75  0.25 -19.00 -16.00 -4.75  19.75
4  17.75  0.25 -18.75 -14.75 -3.75  21.00
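
The reshape approach only works when the number of rows is an exact multiple of the window size. The sketch below shows one way to handle a remainder and to compute several statistics from the same reshaped array. It assumes that dropping trailing rows which do not fill a window is acceptable; that policy, the helper name `window_stats`, and the demo data are assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd


def window_stats(df, window_size=4):
    """Hypothetical helper: per-window min, max and mean for every column.

    Trailing rows that do not fill a complete window are dropped.
    """
    n_windows = len(df) // window_size
    n_full = n_windows * window_size
    # Shape: (number of windows, window_size, number of columns)
    blocks = df.to_numpy()[:n_full].reshape(n_windows, window_size, df.shape[1])
    return {
        "min": pd.DataFrame(blocks.min(axis=1), columns=df.columns),
        "max": pd.DataFrame(blocks.max(axis=1), columns=df.columns),
        "mean": pd.DataFrame(blocks.mean(axis=1), columns=df.columns),
    }


# Made-up data: 10 rows, so the last 2 rows are dropped
demo = pd.DataFrame({"x": range(10), "y": range(10, 0, -1)})
result = window_stats(demo)
```

Each entry of `result` is a frame with one row per complete window. If the leftover rows matter, padding with NaN and using `np.nanmean` would be another option.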