Delete/resample pandas DataFrame rows based on a condition

Time: 2020-10-19 23:24:32

Tags: python pandas

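I have a pandas DataFrame representing some measurements. The first column is a continuous variable that increases in small increments of 0.1 or 0.2. I need to resample this variable (and the whole DataFrame along with it) so that it increases in steps of 1.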

0     494.84284
1     494.86824
2     494.89364
3     494.91904
4     494.94444
5     494.96984
6     494.99524
7     495.02064
8     495.04604
9     495.07144
10    495.09684
11    495.12224
12    495.14764
13    495.17304
14    495.19844
15    495.22384
16    495.24924
17    495.27464
18    495.30004
19    495.32544
20    495.35084
21    495.37624
22    495.40164
23    495.42704
24    495.45244
25    495.47784
26    495.50324
27    495.52864
28    495.55404
29    495.57944

I tried setting this column as the index and ran the code below:

row_init = 0.0
for index, row in df.iterrows():
    if (index - row_init) < 1:
        df.drop(index, inplace=True)
        row_init = index

Example output:
0     494.84284
1     495.02064
2     496.47784
3     497.50324
4     498.52864
5     499.55404
6     500.57944

1 answer:

Answer 0 (score: 0)

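It looks like you just want the first value for each integer, so you can group by the integer part of the values and take the first row of each group.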

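import pandas as pd

df = pd.DataFrame({'data': [494.84284, 494.86824, 494.89364, 494.91904, 494.94444, 494.96984, 494.99524, 495.02064, 495.04604, 495.07144, 495.66072, 496.01247, 497.5000, 497.9777, 500.01354]})

# Group rows by the integer part of 'data', keep the first row of each group, and renumber the index.
df.groupby(df['data'].astype(int)).first().reset_index(drop=True)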

Output

         data
0   494.84284
1   495.02064
2   496.01247
3   497.50000
4   500.01354
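
Since the question asks to thin the whole DataFrame and not just one column, here is a minimal sketch of the same idea applied to a frame with more than one column. The column names `x` and `y` and the helper names `first_per_integer` and `thin_by_min_gap` are hypothetical and only for illustration. The first helper keeps the first row per integer part, as in the answer above. The second is an alternative reading of the question (keep a row only when it is at least 1 above the last kept row), and it assumes the column is sorted in ascending order.

```python
import numpy as np
import pandas as pd


def first_per_integer(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Keep the first row for each integer part of df[col]."""
    # np.floor rounds toward negative infinity, while astype(int) truncates
    # toward zero; the two agree for non-negative values.
    bins = np.floor(df[col])
    # duplicated() marks every repeat of a bin after its first occurrence.
    return df.loc[~bins.duplicated()].reset_index(drop=True)


def thin_by_min_gap(df: pd.DataFrame, col: str, gap: float = 1.0) -> pd.DataFrame:
    """Keep rows whose df[col] is at least `gap` above the last kept value.

    Assumes df[col] is sorted in ascending order.
    """
    keep = []
    last = -np.inf
    for value in df[col].to_numpy():
        if value - last >= gap:
            keep.append(True)
            last = value  # this row becomes the new reference point
        else:
            keep.append(False)
    # Build a boolean mask instead of dropping rows while iterating.
    return df.loc[np.array(keep)].reset_index(drop=True)


# Hypothetical two-column frame: 'x' is the measured variable, 'y' a reading.
df = pd.DataFrame({
    "x": [494.84284, 494.99524, 495.02064, 495.66072,
          496.01247, 497.5, 497.9777, 500.01354],
    "y": [1, 2, 3, 4, 5, 6, 7, 8],
})

by_integer = first_per_integer(df, "x")
by_gap = thin_by_min_gap(df, "x", gap=1.0)
```

The two helpers can give different results: `first_per_integer` keeps 495.02064 even though it is less than 1 above 494.84284, while `thin_by_min_gap` skips it. Which one fits depends on whether "increase by 1" means one row per integer or a minimum spacing of 1 between kept rows.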