I have a dataframe like the following:
,VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
0,abc1234,102400.0,4096,poweredOn,1
1,xyz1234,81920.0,4096,poweredOn,1
2,abc1234,,4096,poweredOff,1
3,xyz1234,,4096,poweredOff,1
The NaN values need to be replaced with the first non-null value from a matching row (same VM). The output should look like this:
,VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
0,abc1234,102400.0,4096,poweredOn,1
1,xyz1234,81920.0,4096,poweredOn,1
2,abc1234,102400.0,4096,poweredOff,1
3,xyz1234,81920.0,4096,poweredOff,1
Using fillna(method='ffill') does not actually fill in the first matching value.
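import pandas as pd

# Hypothetical path: the question does not show where file1 is loaded from.
file1 = pd.read_csv(r'c:\temp\file1.csv')
file2 = pd.read_csv(r'c:\temp\pd_powerstate_new_south.csv')
# Note: set_index returns a new DataFrame, so without assignment these two calls have no effect.
file1.set_index('VM')
file2.set_index('VM')
merged_data = pd.merge(left=file1, right=file2, how='outer')
# "some custom method" is a placeholder for the fill logic being asked about.
merged_data.fillna("some custom method").to_csv(r'c:\temp\mergeddata.csv')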
The actual result must be as follows:
,VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
0,abc1234,102400.0,4096,poweredOn,1
1,xyz1234,81920.0,4096,poweredOn,1
2,abc1234,102400.0,4096,poweredOff,1
3,xyz1234,81920.0,4096,poweredOff,1
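Answer 0 (score: 0)
You clearly want to do this per VM, so what you are missing is a groupby. The operation itself is a forward fill (ffill), similar to dragging a formula down in Excel.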
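# Forward-fill within each VM group, then use the result to fill the gaps in df.
# (fillna(method='ffill') is deprecated in recent pandas; ffill() is the direct equivalent.)
df = df.fillna(df.groupby('VM').ffill())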
Result:
        VM  Storage Capacity MB  Memory Capacity MB  Powerstate  CPUs
0  abc1234             102400.0                4096   poweredOn     1
1  xyz1234              81920.0                4096   poweredOn     1
2  abc1234             102400.0                4096  poweredOff     1
3  xyz1234              81920.0                4096  poweredOff     1
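As a self-contained check of the per-VM forward fill, here is a minimal sketch that rebuilds the question's sample data in memory (via io.StringIO instead of the CSV files on disk, which is an assumption made only so the snippet runs anywhere) and fills just the storage column:

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """VM,Storage Capacity MB,Memory Capacity MB,Powerstate,CPUs
abc1234,102400.0,4096,poweredOn,1
xyz1234,81920.0,4096,poweredOn,1
abc1234,,4096,poweredOff,1
xyz1234,,4096,poweredOff,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Forward-fill the storage column within each VM group only, so a value
# from one VM never leaks into another VM's rows.
df["Storage Capacity MB"] = df.groupby("VM")["Storage Capacity MB"].ffill()

print(df)
```

Filling a single named column keeps the VM column untouched and avoids depending on how a given pandas version treats the grouping column in the groupby result.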
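Answer 1 (score: 0)
I am not sure how many non-NA values your dataframe has. If there are only a few, you could try chaining df.ffill().bfill() or df.bfill().ffill(), which takes the non-NA values that exist and propagates them forward/backward.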
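To illustrate why the two fills are chained, here is a small sketch on a hypothetical frame (not the asker's files) in which the missing rows come before the known values. It applies the chain per VM, which is an assumption borrowed from the question's goal rather than something this answer spells out:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where the missing rows come BEFORE the known values.
df = pd.DataFrame({
    "VM": ["abc1234", "xyz1234", "abc1234", "xyz1234"],
    "Storage Capacity MB": [np.nan, np.nan, 102400.0, 81920.0],
})

by_vm = df.groupby("VM")["Storage Capacity MB"]

# ffill alone cannot fill a gap that precedes the first known value.
ffill_only = by_vm.ffill()

# Chaining bfill after ffill, per VM, covers gaps on either side.
both_ways = by_vm.transform(lambda s: s.ffill().bfill())

print(ffill_only.isna().sum())  # gaps left by ffill alone
print(both_ways.tolist())
```

Without the groupby, a plain df.ffill() would copy whatever value sits in the previous row, regardless of which VM it belongs to.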
If there are many non-NA values, I would suggest a workaround that iterates over the columns:
for col in merged_data.columns:
    # First non-NA value in the whole column (raises IndexError if the column is entirely NA)
    first_non_na_value = merged_data[col].dropna().iloc[0]
    merged_data[col] = merged_data[col].fillna(first_non_na_value)
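Note that the loop above takes the first non-NA value of the entire column, so every gap receives the same value. A possible per-VM variant, sketched here on the question's sample values and assuming VM is the matching key, uses groupby with transform("first"), which broadcasts each group's first non-null value back to its rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "VM": ["abc1234", "xyz1234", "abc1234", "xyz1234"],
    "Storage Capacity MB": [102400.0, 81920.0, np.nan, np.nan],
})
col = "Storage Capacity MB"

# Column-wide: every gap gets the column's first non-NA value (102400.0).
column_wide = df[col].fillna(df[col].dropna().iloc[0])

# Per VM: transform("first") broadcasts each group's first non-null value.
per_vm = df[col].fillna(df.groupby("VM")[col].transform("first"))

print(column_wide.tolist())
print(per_vm.tolist())
```

With the column-wide fill, the xyz1234 gap ends up with abc1234's value, whereas the per-VM fill matches the output the question asks for.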