I have a column called "factor", and I want to insert a blank row whenever the name in that column changes. Is this possible?
for i in range(0, end):
    if df2.at[i + 1, 'factor'] != df2.at[i, 'factor']:
        ...  # a blank row should be inserted here
Answer 0 (score: 2)
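Inserting rows one at a time inside a for loop is inefficient. Instead, you can find the index labels where the value changes, build a new dataframe whose index falls halfway between each boundary row and the next, concatenate it with the original, and sort by index: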
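import pandas as pd

df = pd.DataFrame([[1, 1], [2, 1], [3, 2], [4, 2],
                   [5, 2], [6, 3]], columns=['A', 'B'])
# True on the last row of each run of equal values in B
switches = df['B'].ne(df['B'].shift(-1))
idx = switches[switches].index
# Empty rows labelled halfway between a boundary row and the next row
df_new = pd.DataFrame(index=idx + 0.5)
df = pd.concat([df, df_new]).sort_index()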
print(df)
       A    B
0.0  1.0  1.0
1.0  2.0  1.0
1.5  NaN  NaN
2.0  3.0  2.0
3.0  4.0  2.0
4.0  5.0  2.0
4.5  NaN  NaN
5.0  6.0  3.0
5.5  NaN  NaN
If necessary, you can use reset_index to normalize the index:
print(df.reset_index(drop=True))
     A    B
0  1.0  1.0
1  2.0  1.0
2  NaN  NaN
3  3.0  2.0
4  4.0  2.0
5  5.0  2.0
6  NaN  NaN
7  6.0  3.0
8  NaN  NaN
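The steps above can be wrapped into a small helper and applied to the 'factor' column from the question. This is only a sketch under a few assumptions: the function name insert_blank_rows and the keep_trailing parameter are made up for illustration, the index is reset first so that adding 0.5 always lands between two neighbouring rows, and by default the blank row after the very last row is dropped.

```python
import pandas as pd


def insert_blank_rows(df, column, keep_trailing=False):
    """Return a copy of df with an all-NaN row after each run of equal values.

    Hypothetical helper: the name and the keep_trailing flag are illustrative.
    """
    # Use a clean 0..n-1 index so that "+ 0.5" falls between neighbouring rows.
    df = df.reset_index(drop=True)
    # True on the last row of every run; the final row is always True
    # because shift(-1) leaves NaN there.
    switches = df[column].ne(df[column].shift(-1))
    idx = switches[switches].index
    if not keep_trailing:
        idx = idx[:-1]  # skip the marker that follows the very last row
    blanks = pd.DataFrame(index=idx + 0.5)
    return pd.concat([df, blanks]).sort_index().reset_index(drop=True)


df2 = pd.DataFrame({'factor': list('aaabbccdd')})
result = insert_blank_rows(df2, 'factor')
print(result)
```

Passing keep_trailing=True reproduces the behaviour of the answer above, which leaves one blank row at the end of the frame.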
Answer 1 (score: 1)
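Use reindex with a Float64Index built from the edge indices plus 0.5, unioned with the original index. (In pandas 2.0 and later, Float64Index no longer exists and the same values are shown as a plain Index with dtype='float64'.)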
df2 = pd.DataFrame({'factor':list('aaabbccdd')})
idx = df2.index.union(df2.index[df2['factor'].shift(-1).ne(df2['factor'])] + .5)[:-1]
print (idx)
Float64Index([0.0, 1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 6.0, 6.5, 7.0, 8.0], dtype='float64')
df2 = df2.reindex(idx, fill_value='').reset_index(drop=True)
print (df2)
    factor
0        a
1        a
2        a
3
4        b
5        b
6
7        c
8        c
9
10       d
11       d
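If you want missing values instead, call reindex without fill_value on the original df2 (before the reassignment above):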
df2 = df2.reindex(idx).reset_index(drop=True)
print (df2)
    factor
0        a
1        a
2        a
3      NaN
4        b
5        b
6      NaN
7        c
8        c
9      NaN
10       d
11       d
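The one-line index construction in this answer is dense, so here is the same expression split into named steps. This is a sketch only: the intermediate names is_run_end and half_steps do not appear in the answer and are introduced purely for readability.

```python
import pandas as pd

df2 = pd.DataFrame({'factor': list('aaabbccdd')})

# Step 1: True on the last row of each run of equal values.
# The final row is always True because shift(-1) leaves NaN there.
is_run_end = df2['factor'].shift(-1).ne(df2['factor'])

# Step 2: take the labels of those rows and move them halfway
# towards the next label.
half_steps = df2.index[is_run_end] + 0.5

# Step 3: merge with the original labels; union() returns them sorted.
# [:-1] drops the half step after the last row, so no blank row is
# appended at the end of the frame.
idx = df2.index.union(half_steps)[:-1]

print(list(idx))
```

Reindexing df2 with idx then creates one new row per half step, which is why the blank rows end up exactly at the boundaries between runs.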