Filling in missing values in an index that contains duplicate values

Time: 2019-12-05 02:22:35

Tags: python pandas dataframe

So basically I want to reindex a dataframe while keeping the duplicated index values.

Consider the following dataframe:

Index Block Size Check

6       25        Yes
6       32        No
9       18        Yes
12      17        No
15      23        Yes
15      11        Yes
15      15        Yes

I want to get the following output:

Index Block Size Check

1        0         0
2        0         0
3        0         0
4        0         0
5        0         0
6       25        Yes
6       32        No
7        0         0
8        0         0
9       18        Yes
10       0         0
11       0         0
12      17        No
13       0         0
14       0         0
15      23        Yes
15      11        Yes
15      15        Yes

I tried data_out = data_in.reindex(pd.RangeIndex(data_in.index.max()+1)).fillna(0), but it raises an error.
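The reindex call fails because pandas refuses to reindex an axis that contains duplicate labels; it raises a ValueError. Below is a minimal sketch of one workaround (a left merge, not the reindex the question attempted). It assumes the columns are named "Block Size" and "Check" (the header above could also be read differently), that the labels should run from 1 to the largest existing label, and the names full, out and reindex_failed are purely illustrative. "Index" is kept as an ordinary column, a frame holding every label exactly once is built, and the data is left-merged onto it, so a label with several rows simply expands to several rows.

```python
import pandas as pd

# Sample data from the question; "Index" is kept as a regular column (assumed column names)
df = pd.DataFrame({
    "Index": [6, 6, 9, 12, 15, 15, 15],
    "Block Size": [25, 32, 18, 17, 23, 11, 15],
    "Check": ["Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"],
})

# Reindexing an axis with duplicate labels raises ValueError, as in the question
try:
    df.set_index("Index").reindex(range(1, 16))
    reindex_failed = False
except ValueError:
    reindex_failed = True

# Every label from 1 up to the largest existing label, each exactly once
full = pd.DataFrame({"Index": range(1, df["Index"].max() + 1)})

# Left merge: labels with several matching rows (6 and 15) expand to several rows,
# labels without a match get NaN, which is then replaced by 0
out = full.merge(df, on="Index", how="left").fillna(0)
out["Block Size"] = out["Block Size"].astype(int)  # NaN turned the column into float
out = out.set_index("Index")
print(out)
```

A left merge keeps the order of the left keys, so the result comes out already sorted by label and the rows sharing a label stay in their original order without any extra sort step.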

2 Answers:

Answer 0 (score: 0)

Try this:

import pandas as pd

df = pd.DataFrame({"id": [6, 6, 9, 12, 15, 15, 15],
                   "block": [25, 32, 18, 17, 23, 11, 15],
                   "check": ["yes", "no", "yes", "no", "yes", "yes", "yes"]})
df = df.set_index("id")

# ids that are already present (each duplicated id counted once)
inds = df.index.unique()
# ids between 1 and the largest id that are missing
missing = sorted(set(range(1, inds.max() + 1)) - set(inds))
newdf = pd.DataFrame({"id": missing, "block": 0, "check": 0}).set_index("id")

# a stable sort keeps rows that share an id in their original order
alldf = pd.concat([df, newdf]).sort_index(kind="mergesort")

Output:

    block check
id             
1       0     0
2       0     0
3       0     0
4       0     0
5       0     0
6      25   yes
6      32    no
7       0     0
8       0     0
9      18   yes
10      0     0
11      0     0
12     17    no
13      0     0
14      0     0
15     23   yes
15     11   yes
15     15   yes

Answer 1 (score: 0)

This should do it:

In [1098]: df
Out[1098]: 
   Index  Block Size Check
0      6     25        Yes
1      6     32         No
2      9     18        Yes
3     12     17         No
4     15     23        Yes
5     15     11        Yes
6     15     15        Yes

If "Index" is the dataframe's index rather than a regular column, you must run df = df.reset_index() before the next part:

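# union of the labels 0..14 with the existing Index values
labels = df.reindex(list(range(0, 15))).index.union(df['Index'])
newdf = pd.DataFrame(labels).rename({0: "Index"}, axis=1).set_index('Index').combine_first(df.set_index('Index'))
# drop duplicated rows but keep the gap rows (NaN Block), then fill the gaps with 0
newdf = newdf[(~newdf.duplicated()) | (newdf['Block'].isnull())].fillna(0)
newdf['Block'] = newdf['Block'].astype(int)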

Output:

Out[1094]: 
       Block Size Check
Index                  
0          0          0
1          0          0
2          0          0
3          0          0
4          0          0
5          0          0
6         25        Yes
6         32         No
7          0          0
8          0          0
9         18        Yes
10         0          0
11         0          0
12        17         No
13         0          0
14         0          0
15        23        Yes
15        11        Yes
15        15        Yes
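One caveat with the filter in the second line above: DataFrame.duplicated compares only the column values and ignores the index, so it can also discard a genuine row whose Block and Size Check values repeat those of a row at a different label. A tiny illustration with two hypothetical rows at labels 6 and 9 (the data is made up for the demonstration):

```python
import pandas as pd

# Two different labels that happen to carry identical column values (hypothetical data)
df = pd.DataFrame({"Block": [25, 25], "Size Check": ["Yes", "Yes"]},
                  index=pd.Index([6, 9], name="Index"))

# duplicated() looks only at the columns, so the row at label 9 is flagged
mask = df.duplicated()
print(mask.tolist())  # [False, True]

# Moving the label into a column makes the check label-aware
mask_with_label = df.reset_index().duplicated()
print(mask_with_label.tolist())  # [False, False]
```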