Basically, I want to reindex a DataFrame while keeping its duplicate index values.
Consider the following DataFrame:
Index Block Size Check
6 25 Yes
6 32 No
9 18 Yes
12 17 No
15 23 Yes
15 11 Yes
15 15 Yes
I want to get the following output:
Index Block Size Check
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 25 Yes
6 32 No
7 0 0
8 0 0
9 18 Yes
10 0 0
11 0 0
12 17 No
13 0 0
14 0 0
15 23 Yes
15 11 Yes
15 15 Yes
I tried
data_out = data_in.reindex(pd.RangeIndex(data_in.index.max()+1)).fillna(0)
but it raises an error.
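For reference, a minimal reproduction of the failure. The construction of data_in below is an assumed reconstruction of the table above with Index as the index, not the original script. reindex does not accept an axis with duplicate labels, so pandas raises a ValueError:

```python
import pandas as pd

# Assumed reconstruction of data_in: the table above, with "Index" as the index
data_in = pd.DataFrame(
    {"Block Size": [25, 32, 18, 17, 23, 11, 15],
     "Check": ["Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"]},
    index=pd.Index([6, 6, 9, 12, 15, 15, 15], name="Index"),
)

try:
    data_out = data_in.reindex(pd.RangeIndex(data_in.index.max() + 1)).fillna(0)
    error = None
except ValueError as exc:
    # reindex refuses to work on an axis that contains duplicate labels
    error = exc

print(type(error).__name__, error)
```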
Answer 0 (score: 0)
Try this:
import pandas as pd

df = pd.DataFrame({"id": [6, 6, 9, 12, 15, 15, 15],
                   "block": [25, 32, 18, 17, 23, 11, 15],
                   "check": ["yes", "no", "yes", "no", "yes", "yes", "yes"]})
df = df.set_index("id")
# Existing labels, and every label from 1 up to (but excluding) the largest one
inds = df.index.unique().values
al = list(range(1, max(inds)))
# Filler rows (all zeros) for the labels that are missing from df
newdf = pd.DataFrame({"id": list(set(al) - set(inds)), "block": 0, "check": 0})
newdf = newdf.set_index("id")
alldf = pd.concat([df, newdf]).sort_index()
Output:
block check
id
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 25 yes
6 32 no
7 0 0
8 0 0
9 18 yes
10 0 0
11 0 0
12 17 no
13 0 0
14 0 0
15 15 yes
15 23 yes
15 11 yes
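Note that in the output above the three rows with id 15 are no longer in their original order (23, 11, 15): the default sort used by sort_index is not guaranteed to be stable. A minimal sketch of the same approach with a stable sort, which should keep rows that share a label in input order. The names missing and filler are mine, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({"id": [6, 6, 9, 12, 15, 15, 15],
                   "block": [25, 32, 18, 17, 23, 11, 15],
                   "check": ["yes", "no", "yes", "no", "yes", "yes", "yes"]})
df = df.set_index("id")

# Labels between 1 and the largest existing label that are absent from df
missing = sorted(set(range(1, df.index.max() + 1)) - set(df.index))

# Filler rows (hypothetical name) holding 0 in every column
filler = pd.DataFrame({"block": 0, "check": 0},
                      index=pd.Index(missing, name="id"))

# mergesort is stable, so rows sharing a label keep their relative order
alldf = pd.concat([df, filler]).sort_index(kind="mergesort")
print(alldf)
```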
Answer 1 (score: 0)
This should do it:
In [1098]: df
Out[1098]:
Index Block Size Check
0 6 25 Yes
1 6 32 No
2 9 18 Yes
3 12 17 No
4 15 23 Yes
5 15 11 Yes
6 15 15 Yes
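If Index is the DataFrame's index rather than a column, run df = df.reset_index() before the next part:
# Union of the labels 0..14 with the existing Index values, combined with the data
newdf = pd.DataFrame((df.reindex(list(range(0,15)))).index.union(df.Index)).rename({0:"Index"}, axis=1).set_index('Index').combine_first(df.set_index('Index'))
# Drop repeated rows but keep the unfilled ones, then fill the gaps with 0
# (duplicated() compares column values only, so rows that are identical in the data would be dropped too)
newdf = newdf[(~newdf.duplicated()) | (newdf['Block'].isnull())].fillna(0)
newdf.Block = newdf.Block.astype(int)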
Output:
Out[1094]:
Block Size Check
Index
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 25 Yes
6 32 No
7 0 0
8 0 0
9 18 Yes
10 0 0
11 0 0
12 17 No
13 0 0
14 0 0
15 23 Yes
15 11 Yes
15 15 Yes
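An alternative sketch that avoids the duplicate-filtering step: left-merge a frame of all wanted labels with the data, so labels without a match become NaN rows and duplicated labels are kept once per matching row. This is not the method from the answer above. The names labels and out are assumptions, the column name "Block Size" is taken from the question, and the range starts at 1 to match the output the question asks for:

```python
import pandas as pd

df = pd.DataFrame({
    "Index": [6, 6, 9, 12, 15, 15, 15],
    "Block Size": [25, 32, 18, 17, 23, 11, 15],
    "Check": ["Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"],
})

# One row per wanted label, from 1 up to the largest label in the data
labels = pd.DataFrame({"Index": range(1, df["Index"].max() + 1)})

# Left merge: every label appears at least once, matched labels once per match
out = labels.merge(df, on="Index", how="left").set_index("Index")

# Unmatched labels have NaN in both columns; replace with 0 and restore ints
out["Block Size"] = out["Block Size"].fillna(0).astype(int)
out["Check"] = out["Check"].fillna(0)
print(out)
```

Because the merge never aligns two duplicated indexes against each other, rows with identical values are not at risk of being dropped.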