The operation I want is similar to this MySQL statement:
UPDATE a.tract_201704 SET val_2000=0.91516427*val_2001 WHERE val_2001 IS NOT NULL AND val_2000 IS NULL.
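I have a df with many columns, one of which is named val_2000. Wherever this column contains a null, I want to replace it with 0.91516427 * val_2001 (a scalar multiplied by the value in the neighboring column of the same row).
Code so far (val_2000 is either 100 or missing):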
import pandas as pd
df = pd.read_csv("singleDataFile_header.csv")
df_val2001_null = (df[df['val_2000'] != '100.000000000000']['val_2001'])
df_val2000_null = (df[df['val_2000'] != '100.000000000000']['val_2000'])
df_val2000_null = 0.91516427*df_val2001_null
But how do I write the values in df_val2000_null back into the original df for the rows where df['val_2000'] has no value?
Answer 0 (score: 2)
fillna is exactly what you are looking for: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.fillna.html
df.loc[:, 'val_2000'] = df.val_2000.fillna(0.91516427 * df.val_2001)
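As a minimal sketch of how that line maps onto the SQL in the question, the snippet below uses a small made-up frame (the values are illustrative, not the asker's data) and compares fillna against an explicit boolean mask that mirrors the WHERE clause. A row where val_2001 is also missing simply stays NaN, which matches the val_2001 IS NOT NULL condition.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the asker's CSV.
df = pd.DataFrame({
    "val_2000": [100.0, np.nan, np.nan],
    "val_2001": [110.0, 200.0, np.nan],
})

# Explicit equivalent of:
#   WHERE val_2001 IS NOT NULL AND val_2000 IS NULL
mask = df["val_2000"].isna() & df["val_2001"].notna()
expected = df["val_2000"].copy()
expected.loc[mask] = 0.91516427 * df.loc[mask, "val_2001"]

# fillna aligns on the index, so each missing val_2000 is filled
# with the scaled val_2001 from the same row.
df["val_2000"] = df["val_2000"].fillna(0.91516427 * df["val_2001"])

print(df)
```

Both approaches should give the same column here; fillna is just the shorter spelling.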
Answer 1 (score: 1)
You can use combine_first:
import numpy as np
import pandas as pd
df = pd.DataFrame({'val_2000':[np.nan,2,3],
                   'val_2001':[4,5,6]})
print (df)
   val_2000  val_2001
0       NaN         4
1       2.0         5
2       3.0         6
df['val_2000'] = df['val_2000'].combine_first(0.91516427 * df['val_2001'])
print (df)
   val_2000  val_2001
0  3.660657         4
1  2.000000         5
2  3.000000         6
Edit:
A possible problem is that nan is a string rather than a real NaN, or that the data contains some invalid strings.
df = pd.DataFrame({'val_2000':['nan',100,'gggg'],
                   'val_2001':[1,1,1]})
print (df)
   val_2000  val_2001
0       nan         1
1       100         1
2      gggg         1
df['val_2000'] = pd.to_numeric(df['val_2000'], errors='coerce')
print (df)
   val_2000  val_2001
0       NaN         1
1     100.0         1
2       NaN         1
df['val_2000'] = df['val_2000'].combine_first(0.91516427 * df['val_2001'])
print (df)
     val_2000  val_2001
0    0.915164         1
1  100.000000         1
2    0.915164         1
If the only non-numeric value is the string nan, casting to float is enough:
df = pd.DataFrame({'val_2000':['nan',100,100],
                   'val_2001':[1,1,1]})
print (df)
   val_2000  val_2001
0       nan         1
1       100         1
2       100         1
df['val_2000'] = df['val_2000'].astype(float)
print (df)
   val_2000  val_2001
0       NaN         1
1     100.0         1
2     100.0         1
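Putting the pieces together, here is a rough end-to-end sketch of the cleanup followed by the fill. The CSV text is a hypothetical stand-in for singleDataFile_header.csv (the real file's contents are not shown in the question), and the id column is invented purely to label the rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for the asker's file; contents are assumed.
csv_text = """id,val_2000,val_2001
a,100.000000000000,90
b,nan,200
c,gggg,50
d,,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Coerce both columns to numbers; anything unparsable becomes NaN.
for col in ["val_2000", "val_2001"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Fill missing val_2000 from the scaled val_2001 of the same row.
# Row d stays NaN because val_2001 is missing there as well.
df["val_2000"] = df["val_2000"].fillna(0.91516427 * df["val_2001"])

print(df)
```

With a real file, pd.read_csv("singleDataFile_header.csv") would replace the StringIO call; the rest should carry over unchanged, assuming the column names match.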