Question

I have a data frame that looks like this:

x        frames
0         7729.00  
0         7730.00     
0         7731.00
1         7735.00
1         7736.00
1         7737.00
1         7738.00
2         7741.00
2         7742.00

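As you can see, the frames values are consecutive, but they jump whenever x changes. I want frames to continue so that it always increases by 1, with x set to NaN in the inserted rows. Like this: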

x        frames
0         7729.00
0         7730.00
0         7731.00
NaN       7732.00
NaN       7733.00
NaN       7734.00
1         7735.00
1         7736.00
1         7737.00
1         7738.00
NaN       7739.00
NaN       7740.00
2         7741.00
2         7742.00

Edit

This is the error I get when I use the first solution.

    df = df.set_index('frames').reindex(range(s.min(), s.max() + 1)).reset_index()
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 227, in wrapper
    return func(*args, **kwargs)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3856, in reindex
    return super().reindex(**kwargs)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4544, in reindex
    axes, level, limit, tolerance, method, fill_value, copy
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3744, in _reindex_axes
    index, method, copy, level, fill_value, limit, tolerance
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3766, in _reindex_index
    allow_dups=False,
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4613, in _reindex_with_indexers
    copy=copy,
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1251, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3099, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

If I also have another column in the data frame, like this:

x     y       frames
0     yes    7729.00  
0     yes    7730.00     
0     yes    7731.00
1     no     7735.00
1     no     7736.00
1     no     7737.00
1     no     7738.00
2     yes    7741.00
2     yes    7742.00

then the solution converts all the other columns (x and y) to NaN.

Answer 1

Use DataFrame.reindex with a range from the minimum to the maximum value:

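# integer copy of frames, used only to compute the bounds of the range
s = df['frames'].astype(int)
# reindex on frames: frame numbers missing from the data get NaN in the other columns
df = df.set_index('frames').reindex(range(s.min(), s.max() + 1)).reset_index()
print(df)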
    frames    x
0     7729  0.0
1     7730  0.0
2     7731  0.0
3     7732  NaN
4     7733  NaN
5     7734  NaN
6     7735  1.0
7     7736  1.0
8     7737  1.0
9     7738  1.0
10    7739  NaN
11    7740  NaN
12    7741  2.0
13    7742  2.0
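
The "cannot reindex from a duplicate axis" error quoted in the question is raised when the index being reindexed contains repeated labels, so it suggests that the real frames column has duplicate values. Below is a minimal sketch of one way around it, assuming it is acceptable to keep only the first row for each frame. The helper name fill_frame_gaps and the keep="first" policy are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 1, 1, 2, 2],
    "frames": [7729.0, 7730.0, 7731.0, 7735.0, 7736.0,
               7737.0, 7738.0, 7741.0, 7742.0],
})


def fill_frame_gaps(data, col="frames"):
    """Hypothetical helper: one row per integer frame between min and max."""
    out = data.copy()
    out[col] = out[col].astype(int)
    # reindex needs unique labels; keeping the first row per frame is an assumption
    out = out.drop_duplicates(subset=col, keep="first")
    full_range = range(out[col].min(), out[col].max() + 1)
    # rename_axis makes sure the index name survives before reset_index
    return out.set_index(col).reindex(full_range).rename_axis(col).reset_index()


filled = fill_frame_gaps(df)

# The same call also works when a frame is repeated
with_duplicate = pd.concat([df, df.iloc[[0]]], ignore_index=True)
filled_from_duplicates = fill_frame_gaps(with_duplicate)
```

If the duplicated rows carry different information and all of them must be kept, the merge approach is probably the better fit, since a join does not require unique keys.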

Or use a right join in DataFrame.merge with a helper DataFrame:

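# integer copy of frames, used only to compute the bounds of the range
s = df['frames'].astype(int)
# right join against a helper DataFrame that holds every frame number
df = df.merge(pd.DataFrame({'frames': range(s.min(), s.max() + 1)}), how='right')
print(df)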
      x  frames
0   0.0  7729.0
1   0.0  7730.0
2   0.0  7731.0
3   NaN  7732.0
4   NaN  7733.0
5   NaN  7734.0
6   1.0  7735.0
7   1.0  7736.0
8   1.0  7737.0
9   1.0  7738.0
10  NaN  7739.0
11  NaN  7740.0
12  2.0  7741.0
13  2.0  7742.0
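
For the follow-up in the question about an extra column y, here is a small self-contained sketch of the same join on the three-column sample. It assumes frames holds whole numbers that can be cast to int; the names full and filled are illustrative. Casting frames once, so that both sides of the join share the same integer key, is a precaution and not a confirmed diagnosis of why every column came back as NaN.

```python
import pandas as pd

# Three-column sample data from the question
df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 1, 1, 2, 2],
    "y": ["yes"] * 3 + ["no"] * 4 + ["yes"] * 2,
    "frames": [7729.0, 7730.0, 7731.0, 7735.0, 7736.0,
               7737.0, 7738.0, 7741.0, 7742.0],
})

# Cast once so both sides of the join use the same integer key
df["frames"] = df["frames"].astype(int)

# Helper frame with every frame number from min to max
full = pd.DataFrame(
    {"frames": range(df["frames"].min(), df["frames"].max() + 1)}
)

# Left join from the full range: existing rows keep x and y,
# frame numbers missing from df get NaN in both columns
filled = full.merge(df, on="frames", how="left")[["x", "y", "frames"]]
print(filled)
```

Because this is a join and not a reindex, repeated frames values in df do not raise an error; each duplicate row simply appears in the result.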

Answer 2

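You can use the complete function from pyjanitor to expose the explicitly missing values. In this case we pass a dictionary that pairs the column with a callable, which generates the rows from the minimum to the maximum value: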

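#pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import numpy as np
import janitor

# the callable receives the frames column and returns every value from min to max
new_values = {"frames": lambda frames: np.arange(frames.min(), frames.max() + 1)}

df.complete([new_values])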
 
    frames    x
0   7729.0  0.0
1   7730.0  0.0
2   7731.0  0.0
3   7732.0  NaN
4   7733.0  NaN
5   7734.0  NaN
6   7735.0  1.0
7   7736.0  1.0
8   7737.0  1.0
9   7738.0  1.0
10  7739.0  NaN
11  7740.0  NaN
12  7741.0  2.0
13  7742.0  2.0

How to create rows in pandas based on a condition
