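Filling a DataFrame with NaN in a column-specific way

Time: 2017-04-06 09:45:59

Tags: python-3.x pandas numpy dataframe apply

Given a DataFrame and a list of index labels, one per column, is there an efficient pandas function that sets to nan every value lying vertically above the row named by that column's list entry?

For example, suppose we have the list [4,8] and the following DataFrame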

index     0      1
5         1      2
2         9      3 
4         3.2    3
8         9      8.7

The desired output is simply:

index     0        1
5         nan      nan
2         nan      nan 
4         3.2      nan
8         9        8.7
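
For anyone who wants to try an answer against this example, the input and the desired output can be built as below. This is only a sketch: the variable names labels, df and expected are chosen here for illustration, and it assumes the column labelled index in the tables above is the DataFrame's index.

```python
import numpy as np
import pandas as pd

# One index label per column: column 0 pairs with 4, column 1 with 8.
labels = [4, 8]

# Input frame from the question (index labels are deliberately unsorted).
df = pd.DataFrame(
    {0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]},
    index=pd.Index([5, 2, 4, 8], name='index'),
)

# Desired output: each column is NaN above the row carrying its label.
expected = pd.DataFrame(
    {0: [np.nan, np.nan, 3.2, 9], 1: [np.nan, np.nan, np.nan, 8.7]},
    index=pd.Index([5, 2, 4, 8], name='index'),
)
```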

Any suggestions for a function that does this quickly?

2 Answers:

Answer 0: (score: 2)

Here's a NumPy approach based on np.searchsorted -
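import numpy as np

s = [4,8]

a = df.values                # assumed to be a writable view of df (all-float frame)
idx = df.index.values
sidx = np.argsort(idx)       # positions that would sort the index labels
# row position of each label in s, looked up through the sorted order
matching_row_indx = sidx[np.searchsorted(idx, s, sorter = sidx)]
# True where a row lies above the matching row of its column
mask = np.arange(a.shape[0])[:,None] < matching_row_indx
a[mask] = np.nan             # reaches df only if a really is a view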

Sample run -

In [107]: df
Out[107]: 
         0    1
index          
5      1.0  2.0
2      9.0  3.0
4      3.2  3.0
8      9.0  8.7

In [108]: s = [4,8]

In [109]: a = df.values
     ...: idx = df.index.values
     ...: sidx = np.argsort(idx)
     ...: matching_row_indx = sidx[np.searchsorted(idx, s, sorter = sidx)]
     ...: mask = np.arange(a.shape[0])[:,None] < matching_row_indx
     ...: a[mask] = np.nan
     ...: 

In [110]: df
Out[110]: 
         0    1
index          
5      NaN  NaN
2      NaN  NaN
4      3.2  NaN
8      9.0  8.7
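
The same searchsorted lookup can also be wrapped so that it returns a new frame instead of writing into df.values. This is a minimal sketch, not part of the answer above: the helper name nan_above is made up, and it assumes every label occurs in the index and that there is one label per column. It swaps the in-place assignment for DataFrame.mask, which does not depend on df.values being a writable view.

```python
import numpy as np
import pandas as pd

def nan_above(df, labels):
    """Hypothetical helper: NaN out column j above the row labelled labels[j].

    Assumes each label is present in df.index and len(labels) == df.shape[1].
    """
    idx = df.index.values
    sidx = np.argsort(idx)  # positions that would sort the index labels
    # Look the labels up in sorted order, then map back to row positions.
    rows = sidx[np.searchsorted(idx, labels, sorter=sidx)]
    # mask[r, j] is True when row r lies above the matching row of column j.
    mask = np.arange(len(df))[:, None] < rows
    # DataFrame.mask replaces the True cells with NaN and returns a new frame.
    return df.mask(mask)

df = pd.DataFrame({0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]}, index=[5, 2, 4, 8])
out = nan_above(df, [4, 8])
print(out)
```

For the example frame, rows works out to [2, 3], so column 0 is blanked in the first two rows and column 1 in the first three.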

Answer 1: (score: 1)

Recreating your example was a bit tricky, but this should do it:

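import pandas as pd
import numpy as np

df = pd.DataFrame({'index': [5, 2, 4, 8], 0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]})
df = df.set_index('index')
# Column i is blanked out above the row whose index label equals the i-th list entry.
for i, item in enumerate([4,8]):
    for index in df.index:
        if index == item:
            break
        # Write through df.loc: rows yielded by iterrows() may be copies,
        # so assigning to them is not guaranteed to change df.
        df.loc[index, df.columns[i]] = np.nan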
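
The inner loop can also be replaced by one slice assignment per column. The following is a sketch under two assumptions: each label is unique in the index (so that Index.get_loc returns a single integer position), and the list supplies one label per column in column order. The names col_pos, label and stop are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {0: [1, 9, 3.2, 9], 1: [2, 3, 3, 8.7]},
    index=pd.Index([5, 2, 4, 8], name='index'),
)

for col_pos, label in enumerate([4, 8]):
    # Integer position of the row carrying this label (assumes a unique label).
    stop = df.index.get_loc(label)
    # Blank out everything above that row in this column only.
    df.iloc[:stop, col_pos] = np.nan

print(df)
```

This keeps the per-column logic of the loop version but lets pandas do the row-wise work, so the Python-level loop runs once per column rather than once per cell.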