Pandas: repeat a value n times every n rows

Date: 2018-12-11 15:55:12

Tags: python pandas

I have a dataframe with 2563199 rows. It looks like this:

           index       dtm       f
0              0  00:00:00  50.065
1              1  00:00:01  50.061
2              2  00:00:02  50.058
3              3  00:00:03  50.049
4              4  00:00:04  50.044
5              5  00:00:05  50.044
6              6  00:00:06  50.042
7              7  00:00:07  50.042
...          ...       ...     ...
2591997  2591997  23:59:57  50.009
2591998  2591998  23:59:58  50.008
2591999  2591999  23:59:59  50.006

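I want to create a new column that takes the value in every n-th row and repeats it across that block of n rows. For example, if I set it to repeat the value in every 4th row, the first 4 rows get 50.049, the next 4 rows get 50.042, and so on. (It doesn't matter if the length of the dataframe isn't an exact multiple of n.) Like this: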

   index       dtm       f
0      0  00:00:00  50.049
1      1  00:00:01  50.049
2      2  00:00:02  50.049
3      3  00:00:03  50.049
4      4  00:00:04  50.042
5      5  00:00:05  50.042
6      6  00:00:06  50.042
7      7  00:00:07  50.042

I tried this, taking the value every 86400 rows:

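import numpy as np

# block number of each row: 0 for rows 0-86399, 1 for the next 86400 rows, ...
arr = np.arange(len(df)) // 86400
for x in arr:
    # note: each pass assigns one scalar to the whole 'value' column
    df['value'] = df['f'].iloc[x + 86400]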

Any ideas? Thanks!
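One loop-free way to build this column, sketched on the eight sample rows with n = 4 standing in for 86400: integer-divide each row position by n to get a block number, then broadcast the last value of each block with `groupby(...).transform('last')`. This is a different technique from both answers below, and it assumes the goal is the value of the last row in each block; for a final partial block it returns that block's last available row rather than NaN.

```python
import numpy as np
import pandas as pd

# The eight sample rows from the question; dtm kept as plain strings for simplicity
df = pd.DataFrame({
    "index": range(8),
    "dtm": [f"00:00:{s:02d}" for s in range(8)],
    "f": [50.065, 50.061, 50.058, 50.049, 50.044, 50.044, 50.042, 50.042],
})

n = 4  # block size; the full data in the question would use 86400

# Integer division of the row position gives a block id: 0,0,0,0,1,1,1,1
block = np.arange(len(df)) // n

# Send the last value of each block back to every row of that block
df["value"] = df["f"].groupby(block).transform("last")
print(df)
```

The grouping key is a plain NumPy array aligned by position, so this works regardless of what the DataFrame's index labels are.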

2 answers:

Answer 0 (score: 3)

Here is a way to avoid looping over the df.

First set n and build a list of the existing index labels, leaving out the rows whose values in f will be repeated (every n-th row):

n=4
ix = [x for i, x in enumerate(df.index.values) if (i + 1) % n != 0]
print(ix)
[0, 1, 2, 4, 5, 6]
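The same labels can be collected without a Python-level loop by applying the modulo test to an `np.arange` of row positions. This is an alternative to the comprehension, not part of the original answer; the sketch uses only the `f` column of the sample rows and assumes the default `RangeIndex`.

```python
import numpy as np
import pandas as pd

# f column of the eight sample rows; default RangeIndex 0..7
df = pd.DataFrame({"f": [50.065, 50.061, 50.058, 50.049, 50.044, 50.044, 50.042, 50.042]})
n = 4

# Comprehension from the answer: labels of every row except each n-th one
ix = [x for i, x in enumerate(df.index.values) if (i + 1) % n != 0]

# Vectorized equivalent: the same modulo test on an array of row positions
mask = (np.arange(len(df)) + 1) % n != 0
ix_vec = df.index[mask].tolist()

print(ix_vec)  # [0, 1, 2, 4, 5, 6]
```

The boolean `mask` can also be passed straight to `df.loc[mask, 'f']`, which skips building the label list.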

Now set those values to np.nan and back-fill with bfill:

df.loc[ix, 'f'] = np.nan
df['f'] = df.f.bfill()

print(df)
    index       dtm       f
0      0  00:00:00  50.049
1      1  00:00:01  50.049
2      2  00:00:02  50.049
3      3  00:00:03  50.049
4      4  00:00:04  50.042
5      5  00:00:05  50.042
6      6  00:00:06  50.042
7      7  00:00:07  50.042
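A variant of this approach that leaves `f` unchanged, sketched on the eight sample rows with n = 4: `Series.where` keeps only every n-th row (all other rows become NaN), and `bfill` fills each block upwards from its last row. The result goes into a new column `fnew`, a name chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "index": range(8),
    "dtm": [f"00:00:{s:02d}" for s in range(8)],
    "f": [50.065, 50.061, 50.058, 50.049, 50.044, 50.044, 50.042, 50.042],
})
n = 4

# True on every n-th row (positions n-1, 2n-1, ...): these rows supply the values
keep = (np.arange(len(df)) + 1) % n == 0

# Blank out all other rows, then back-fill into a new column so f is untouched
df["fnew"] = df["f"].where(keep).bfill()
print(df)
```

As with the original, rows after the last complete block stay NaN, because there is no later value to back-fill from.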

Answer 1 (score: 1)

Using numpy and array slicing:

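import numpy as np

n = 4
# take every n-th value (positions n-1, 2n-1, ...), repeat each one n times,
# then pad the leftover rows that do not fill a whole block with NaN
# (np.NaN was removed in NumPy 2.0; np.nan works in all versions)
df['fnew'] = np.concatenate([np.repeat(df.f.values[n-1::n], n),
                             np.repeat(np.nan, len(df) % n)])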

Output:

n = 3
   index       dtm       f    fnew
0      0  00:00:00  50.065  50.058
1      1  00:00:01  50.061  50.058
2      2  00:00:02  50.058  50.058
3      3  00:00:03  50.049  50.044
4      4  00:00:04  50.044  50.044
5      5  00:00:05  50.044  50.044
6      6  00:00:06  50.042     NaN
7      7  00:00:07  50.042     NaN

n = 4
   index       dtm       f    fnew
0      0  00:00:00  50.065  50.049
1      1  00:00:01  50.061  50.049
2      2  00:00:02  50.058  50.049
3      3  00:00:03  50.049  50.049
4      4  00:00:04  50.044  50.042
5      5  00:00:05  50.044  50.042
6      6  00:00:06  50.042  50.042
7      7  00:00:07  50.042  50.042

n = 5
   index       dtm       f    fnew
0      0  00:00:00  50.065  50.044
1      1  00:00:01  50.061  50.044
2      2  00:00:02  50.058  50.044
3      3  00:00:03  50.049  50.044
4      4  00:00:04  50.044  50.044
5      5  00:00:05  50.044     NaN
6      6  00:00:06  50.042     NaN
7      7  00:00:07  50.042     NaN
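To reproduce the three outputs above, the answer's expression can be wrapped in a small function and run on the sample `f` values. `repeat_every_nth` is a hypothetical name used only here, and `np.full` stands in for `np.repeat(np.nan, ...)` for the NaN padding (both give the same float array).

```python
import numpy as np

def repeat_every_nth(values: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical helper: repeat every n-th value n times, NaN-pad the tail."""
    picked = values[n - 1::n]                # len(values) // n values
    head = np.repeat(picked, n)              # n * (len(values) // n) entries
    tail = np.full(len(values) % n, np.nan)  # rows that do not fill a whole block
    return np.concatenate([head, tail])

# f column of the eight sample rows
f = np.array([50.065, 50.061, 50.058, 50.049, 50.044, 50.044, 50.042, 50.042])
for n in (3, 4, 5):
    print(n, repeat_every_nth(f, n))
```

On a DataFrame, the call would be `df['fnew'] = repeat_every_nth(df['f'].to_numpy(), 4)`; the output always has exactly `len(df)` entries, so the assignment lines up row for row.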