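Given a DataFrame and a list of indexes, is there an efficient pandas function that sets to NaN every value lying vertically above each entry of the list (the first entry applying to the first column, the second entry to the second column, and so on)?
For example, suppose we have the list [4,8] and the following DataFrame: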
index    0    1
5        1    2
2        9    3
4      3.2    3
8        9  8.7
The desired output is simply:
index    0    1
5      nan  nan
2      nan  nan
4      3.2  nan
8        9  8.7
Any suggestions for doing this quickly?
Answer 0 (score: 2)
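Here's one approach based on np.searchsorted. It pairs the entries of the list with the columns in order (so the list needs one entry per column) and assumes every entry is present in the index: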
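import numpy as np
import pandas as pd

s = [4,8]
a = df.to_numpy(dtype=float, copy=True)  # float copy, so NaN can be assigned
idx = df.index.to_numpy()
sidx = np.argsort(idx)
# original row position of each label in s (the index need not be sorted)
matching_row_indx = sidx[np.searchsorted(idx, s, sorter=sidx)]
# column j is True in every row above the row holding s[j]
mask = np.arange(a.shape[0])[:, None] < matching_row_indx
a[mask] = np.nan
# write back: df.values may be a copy or read-only, so in-place edits are unreliable
df = pd.DataFrame(a, index=df.index, columns=df.columns)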
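To see what the label lookup and the mask produce on the example, here is a standalone sketch of just those two steps, using only NumPy and the index labels from the question. Note that np.searchsorted assumes every label in s is actually present in the index: a missing label yields an insertion point rather than an error, so it can silently match the wrong row or raise an IndexError.

```python
import numpy as np

# Index labels in their original (unsorted) row order, as in the example frame
idx = np.array([5, 2, 4, 8])
s = [4, 8]

# sidx sorts idx: idx[sidx] is [2, 4, 5, 8]
sidx = np.argsort(idx)
# searchsorted finds where each label of s sits in the sorted order...
pos_in_sorted = np.searchsorted(idx, s, sorter=sidx)
# ...and sidx maps that sorted position back to the original row position
matching_row_indx = sidx[pos_in_sorted]
print(matching_row_indx)  # [2 3]

# Row r of column j is masked when r comes before the row holding s[j]
mask = np.arange(len(idx))[:, None] < matching_row_indx
print(mask)
```

The comparison broadcasts a (rows, 1) column of row positions against a (len(s),) row of target positions, which is why the list must have exactly one entry per column.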
Sample run:
In [107]: df
Out[107]:
         0    1
index
5      1.0  2.0
2      9.0  3.0
4      3.2  3.0
8      9.0  8.7

In [108]: s = [4,8]

In [109]: a = df.values
     ...: idx = df.index.values
     ...: sidx = np.argsort(idx)
     ...: matching_row_indx = sidx[np.searchsorted(idx, s, sorter = sidx)]
     ...: mask = np.arange(a.shape[0])[:,None] < matching_row_indx
     ...: a[mask] = np.nan
     ...:

In [110]: df
Out[110]:
         0    1
index
5      NaN  NaN
2      NaN  NaN
4      3.2  NaN
8      9.0  8.7
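A non-mutating variant of the same idea, sketched here as an alternative rather than as part of the original answer: it swaps the argsort/searchsorted lookup for pandas' Index.get_indexer and applies the mask with DataFrame.mask, which returns a new frame with NaN where the condition is True. It assumes the index labels are unique (get_indexer raises on a non-unique index) and that the list has one entry per column.

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame({0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]},
                  index=pd.Index([5, 2, 4, 8], name='index'))
s = [4, 8]

# Positional row of each label; get_indexer returns -1 for a missing label
rows = df.index.get_indexer(s)
if (rows < 0).any():
    raise KeyError('some labels in s are not in the index')

# One mask column per entry of s (requires len(s) == df.shape[1])
mask = np.arange(len(df))[:, None] < rows

# DataFrame.mask replaces values with NaN where the condition is True
result = df.mask(mask)
print(result)
```

Because df.mask returns a copy, the original frame is left untouched, which sidesteps the question of whether df.values is a view.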
Answer 1 (score: 1)
Recreating your example is a bit tricky, but this should do it:
import pandas as pd
import numpy as np

df = pd.DataFrame({'index': [5, 2, 4, 8], 0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]})
df.set_index('index', inplace=True)

for i, item in enumerate([4, 8]):
    for index, row in df.iterrows():
        if index != item:
            # write through df.loc: iterrows may yield a copy, so editing `row` can be lost
            df.loc[index, df.columns[i]] = np.nan
        else:
            break
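If a loop is acceptable, here is a sketch that keeps the spirit of this answer but loops over columns instead of cells: look up each label's position once with Index.get_loc and blank out the rows above it with a single iloc slice assignment. It assumes unique index labels (for duplicated labels get_loc returns a slice or boolean mask instead of an integer) and that the list is paired with the columns in order.

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame({0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]},
                  index=pd.Index([5, 2, 4, 8], name='index'))

for j, label in enumerate([4, 8]):
    # Index.get_loc returns the integer position of a unique label
    pos = df.index.get_loc(label)
    # Blank out every row above that position in column j, in one assignment
    df.iloc[:pos, j] = np.nan

print(df)
```

This does one assignment per column rather than one per cell, and it never touches the Series yielded by iterrows.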