I have a dataframe that looks like this:
x frames
0 7729.00
0 7730.00
0 7731.00
1 7735.00
1 7736.00
1 7737.00
1 7738.00
2 7741.00
2 7742.00
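As you can see, the values of frames are consecutive, but frames jumps whenever x changes. I want to fill in frames so that it always increases by 1, and set x to NaN in the rows that get added. Like this: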
x frames
0 7729.00
0 7730.00
0 7731.00
NaN 7732.00
NaN 7733.00
NaN 7734.00
1 7735.00
1 7736.00
1 7737.00
1 7738.00
NaN 7739.00
NaN 7740.00
2 7741.00
2 7742.00
This is the error I get when I use the first solution:
df = df.set_index('frames').reindex(range(s.min(), s.max() + 1)).reset_index()
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 227, in wrapper
return func(*args, **kwargs)
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3856, in reindex
return super().reindex(**kwargs)
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4544, in reindex
axes, level, limit, tolerance, method, fill_value, copy
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3744, in _reindex_axes
index, method, copy, level, fill_value, limit, tolerance
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3766, in _reindex_index
allow_dups=False,
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4613, in _reindex_with_indexers
copy=copy,
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1251, in reindex_indexer
self.axes[axis]._can_reindex(indexer)
File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3099, in _can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Also, if I have another column in the dataframe, like this:
x y frames
0 yes 7729.00
0 yes 7730.00
0 yes 7731.00
1 no 7735.00
1 no 7736.00
1 no 7737.00
1 no 7738.00
2 yes 7741.00
2 yes 7742.00
then the solution converts all the other columns (x and y) to NaN.
Answer 0: (score: 2)
Use DataFrame.reindex with a range built from the minimum and maximum values of frames:
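# frames is stored as float, so cast to int to build an integer range
s = df['frames'].astype(int)
# reindex needs unique values in frames; missing frames become NaN rows
df = df.set_index('frames').reindex(range(s.min(), s.max() + 1)).reset_index()
print(df)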
frames x
0 7729 0.0
1 7730 0.0
2 7731 0.0
3 7732 NaN
4 7733 NaN
5 7734 NaN
6 7735 1.0
7 7736 1.0
8 7737 1.0
9 7738 1.0
10 7739 NaN
11 7740 NaN
12 7741 2.0
13 7742 2.0
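The reindex approach can be sketched as a self-contained script. This is only a sketch: the sample data is copied from the question, the function name fill_frames_reindex is hypothetical, and the explicit uniqueness check is an addition meant to show where "cannot reindex from a duplicate axis" comes from (reindex refuses an index with repeated values).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 1, 1, 2, 2],
    "frames": [7729.0, 7730.0, 7731.0, 7735.0, 7736.0,
               7737.0, 7738.0, 7741.0, 7742.0],
})


def fill_frames_reindex(data):
    """Hypothetical helper: add a row for every missing frame number."""
    out = data.copy()
    out["frames"] = out["frames"].astype(int)
    # reindex only works on a unique index, so fail early with a clear message.
    if not out["frames"].is_unique:
        raise ValueError("duplicate values in 'frames'; reindex cannot be used")
    lo, hi = int(out["frames"].min()), int(out["frames"].max())
    # Name the target index so reset_index() restores the 'frames' column.
    full = pd.RangeIndex(lo, hi + 1, name="frames")
    return out.set_index("frames").reindex(full).reset_index()


filled = fill_frames_reindex(df)
print(filled)
```

If frames does contain repeated values, this helper raises instead of reindexing, and a join-based approach is likely the better fit.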
Or use a right join in DataFrame.merge against a helper DataFrame:
import pandas as pd
s = df['frames'].astype(int)
# right join against a helper DataFrame that holds every frame number
df = df.merge(pd.DataFrame({'frames': range(s.min(), s.max() + 1)}), how='right')
print(df)
x frames
0 0.0 7729.0
1 0.0 7730.0
2 0.0 7731.0
3 NaN 7732.0
4 NaN 7733.0
5 NaN 7734.0
6 1.0 7735.0
7 1.0 7736.0
8 1.0 7737.0
9 1.0 7738.0
10 NaN 7739.0
11 NaN 7740.0
12 2.0 7741.0
13 2.0 7742.0
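For the follow-up cases in the question (an extra column y, and repeated values in frames), a join-based variant may be easier to reason about. The sketch below makes two assumptions: frames is cast to int before joining so that both sides of the join have the same key type, and the full range is used as the left side of a left join, which should be equivalent to the right join above. The name fill_frames_merge is hypothetical.

```python
import pandas as pd

# Sample data copied from the question, including the extra column y.
df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 1, 1, 2, 2],
    "y": ["yes", "yes", "yes", "no", "no", "no", "no", "yes", "yes"],
    "frames": [7729.0, 7730.0, 7731.0, 7735.0, 7736.0,
               7737.0, 7738.0, 7741.0, 7742.0],
})


def fill_frames_merge(data):
    """Hypothetical helper: join the data onto the complete range of frames."""
    out = data.copy()
    out["frames"] = out["frames"].astype(int)
    lo, hi = int(out["frames"].min()), int(out["frames"].max())
    full = pd.DataFrame({"frames": range(lo, hi + 1)})
    # Left join: every frame in the range is kept; unmatched rows get NaN.
    return full.merge(out, on="frames", how="left")


filled = fill_frames_merge(df)
print(filled)

# A repeated frame value does not raise here; both rows are kept.
dup = pd.concat([df, df.iloc[[1]]], ignore_index=True)
filled_dup = fill_frames_merge(dup)
print(len(filled_dup))
```

Unlike reindex, merge does not require the key column to be unique, so duplicated frames simply produce duplicated rows in the result. Existing rows keep their x and y values, and only the newly added frames hold NaN.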
Answer 1: (score: 0)
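You could use the complete function from pyjanitor to expose the missing values explicitly. In this case, we pass in a dictionary that pairs the column with a callable, which generates every value from the minimum to the maximum: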
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import janitor  # importing janitor registers complete on DataFrame
import numpy as np
new_values = {"frames": lambda df: np.arange(df.min(), df.max() + 1)}
df.complete([new_values])
frames x
0 7729.0 0.0
1 7730.0 0.0
2 7731.0 0.0
3 7732.0 NaN
4 7733.0 NaN
5 7734.0 NaN
6 7735.0 1.0
7 7736.0 1.0
8 7737.0 1.0
9 7738.0 1.0
10 7739.0 NaN
11 7740.0 NaN
12 7741.0 2.0
13 7742.0 2.0