Replace NaN values with the first non-null value in a DataFrame

Time: 2019-08-22 02:38:39

Tags: python pandas

I have a DataFrame like the following:

,VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
0,abc1234,102400.0,4096,poweredOn,1
1,xyz1234,81920.0,4096,poweredOn,1
2,abc1234,,4096,poweredOff,1
3,xyz1234,,4096,poweredOff,1

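The NaN values need to be replaced with the first non-null value from a matching row (a row with the same VM). The output should look like this: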

,VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
0,abc1234,102400.0,4096,poweredOn,1
1,xyz1234,81920.0,4096,poweredOn,1
2,abc1234,102400.0,4096,poweredOff,1
3,xyz1234,81920.0,4096,poweredOff,1

Using fillna(method='ffill') does not actually fill in the value from the first match.

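    import pandas as pd

    # The post reads the same path twice; the first file's real path is not shown.
    file1 = pd.read_csv(r'c:\temp\pd_powerstate_new_south.csv')
    file2 = pd.read_csv(r'c:\temp\pd_powerstate_new_south.csv')
    # set_index returns a new DataFrame, so these two calls have no effect as written.
    file1.set_index('VM')
    file2.set_index('VM')
    merged_data = pd.merge(left=file1, right=file2, how='outer')
    # Placeholder: this is where the custom fill logic is needed.
    merged_data.fillna("some custom method").to_csv(r'c:\temp\mergeddata.csv')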

The actual result must look like this:

,VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
0,abc1234,102400.0,4096,poweredOn,1
1,xyz1234,81920.0,4096,poweredOn,1
2,abc1234,102400.0,4096,poweredOff,1
3,xyz1234,81920.0,4096,poweredOff,1

2 Answers:

Answer 0 (score: 0)

You evidently want to do this per VM, so what you are missing is a groupby. It is also a forward fill (ffill), similar to dragging a formula down in Excel:

    df.groupby('VM').apply(lambda x: x.fillna(method='ffill'))

Result:

        VM  Storage Capacity MB  Memory Capacity MB  Powerstate  CPUs
0  abc1234             102400.0                4096   poweredOn     1
1  xyz1234              81920.0                4096   poweredOn     1
2  abc1234             102400.0                4096  poweredOff     1
3  xyz1234              81920.0                4096  poweredOff     1
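
On recent pandas releases, fillna(method=...) is deprecated and groupby(...).apply treats the grouping column differently, so the one-liner above may warn or return a differently shaped result. Below is a minimal sketch of the same per-VM idea, assuming the question's sample data is rebuilt inline; the names `csv_text`, `col`, and `first_per_vm` are illustrative. It uses transform('first'), which takes the first non-null value per VM, so it also works when the empty row comes before the populated one.

```python
import io
import pandas as pd

# Sample data copied from the question (index column omitted).
csv_text = """VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
abc1234,102400.0,4096,poweredOn,1
xyz1234,81920.0,4096,poweredOn,1
abc1234,,4096,poweredOff,1
xyz1234,,4096,poweredOff,1
"""
df = pd.read_csv(io.StringIO(csv_text))

col = "Storage Capacity MB"
# GroupBy "first" skips NaN, so every row receives the first
# non-null value recorded for its VM.
first_per_vm = df.groupby("VM")[col].transform("first")
# Only the missing cells are replaced; existing values are kept.
df[col] = df[col].fillna(first_per_vm)
print(df)
```

Filling a single column this way leaves the VM column and the original index untouched, which is why it is used here instead of apply.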

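Answer 1 (score: 0)

I'm not sure how many non-NA values your DataFrame has. If there are only a few, you can try chaining df.ffill().bfill() or df.bfill().ffill(), which take the non-NA values that are found and propagate them forward/backward.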
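
To make the chained fill concrete, here is a small sketch on a made-up Series (the values are illustrative, not taken from the question). Note that these calls ignore the VM column, so on the question's data they would copy a value from one VM into another unless combined with a groupby.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 10.0, np.nan, np.nan, 20.0, np.nan])

# ffill carries the last seen value forward; the leading NaN has
# nothing before it and therefore stays NaN.
forward = s.ffill()
# A following bfill fills whatever is still missing at the start.
both = s.ffill().bfill()

print(forward.tolist())
print(both.tolist())
```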

If there are many non-NA values, I would suggest a workaround that loops over the columns:

    for col in merged_data.columns:
        first_non_na_value = merged_data[col].dropna().iloc[0]
        merged_data[col] = merged_data[col].fillna(first_non_na_value)
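
Two caveats with the loop: `.iloc[0]` raises an IndexError for a column that is entirely NaN, and the fill is column-wide rather than per VM. Here is a hedged variant that guards against the first problem with first_valid_index(), run on a small made-up frame (the `merged_data` below is illustrative data, not the asker's files).

```python
import numpy as np
import pandas as pd

merged_data = pd.DataFrame({
    "a": [np.nan, 2.0, np.nan],
    "b": [np.nan, np.nan, np.nan],  # entirely NaN: nothing to fill from
    "c": ["x", None, "y"],
})

for col in merged_data.columns:
    # first_valid_index returns None when the column has no non-NA value.
    idx = merged_data[col].first_valid_index()
    if idx is None:
        continue  # leave all-NaN columns untouched
    merged_data[col] = merged_data[col].fillna(merged_data.loc[idx, col])

print(merged_data)
```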