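I have a Pandas DataFrame to which I add a new column (SUGGESTED). After adding the column, I update it with new values based on the value of the QUERY column, using the pattern below. For example: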
QUERY = 'query'
SUGGESTED = 'suggested'
df[SUGGESTED] = numpy.nan
s_query = 'de'
new_value = 'delaware'
df.loc[(df[QUERY] == s_query), [SUGGESTED]] = new_value
Example:
query suggested
al alabama
ca california
de NaN
After:
query suggested
al alabama
ca california
de delaware
This seems to work so far, but I am not sure whether there is a more efficient way to do it in Pandas.
Answer 0 (score: 1)
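I think you can omit the initial df[SUGGESTED] = numpy.nan in the loc and np.where solutions, because each of them adds the new column itself. If the column already exists and only needs updating, mask works as well: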
QUERY = 'query'
SUGGESTED = 'suggested'
s_query = 'de'
new_value = 'delaware'
# if you need to update an existing column
df[SUGGESTED] = df[SUGGESTED].mask(df[QUERY] == s_query, new_value)
print(df)
query suggested
0 al alabama
1 ca california
2 de delaware
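The mask snippet above assumes df already exists with a suggested column. A minimal self-contained sketch follows, where the sample frame is an assumption built from the question's "before" table:

```python
import pandas as pd

QUERY = 'query'
SUGGESTED = 'suggested'

# Assumed sample data mirroring the question's "before" table;
# None stands in for the missing value in the 'de' row.
df = pd.DataFrame({
    QUERY: ['al', 'ca', 'de'],
    SUGGESTED: ['alabama', 'california', None],
})

# mask replaces values where the condition is True and keeps the others.
df[SUGGESTED] = df[SUGGESTED].mask(df[QUERY] == 'de', 'delaware')
print(df)
```

Note that mask returns a new Series, so the result has to be assigned back to the column.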
The loc solution can be simplified by removing the () (if there is only one condition) and the [] (if there is only one column):
# for updating an existing column
df.loc[df[QUERY] == s_query, SUGGESTED] = new_value
print(df)
query suggested
0 al alabama
1 ca california
2 de delaware
# the same statement creates the column if it does not exist yet
df.loc[df[QUERY] == s_query, SUGGESTED] = new_value
print(df)
query suggested
0 al NaN
1 ca NaN
2 de delaware
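Both loc cases above can be reproduced in one runnable sketch. The sample frame and the extra 'al' and 'ca' updates are assumptions added for illustration, following the tables in the question:

```python
import pandas as pd

QUERY = 'query'
SUGGESTED = 'suggested'

# Assumed sample data: the frame starts without a SUGGESTED column.
df = pd.DataFrame({QUERY: ['al', 'ca', 'de']})

# The first assignment creates the column; rows that do not match stay NaN.
df.loc[df[QUERY] == 'de', SUGGESTED] = 'delaware'
missing_after_create = int(df[SUGGESTED].isna().sum())

# Later assignments update the existing column for the matching rows only.
df.loc[df[QUERY] == 'al', SUGGESTED] = 'alabama'
df.loc[df[QUERY] == 'ca', SUGGESTED] = 'california'
print(df)
```

Because loc only touches the rows selected by the boolean condition, values written by earlier updates are preserved.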
If you need NaN in the rows that do not match:
import numpy as np
# the same for creating a new column and for updating an existing one
df[SUGGESTED] = np.where(df[QUERY] == s_query, new_value, np.nan)
print(df)
query suggested
0 al nan
1 ca nan
2 de delaware
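In the output above the unmatched rows print as lowercase nan rather than NaN, which suggests that np.where coerced np.nan to the string 'nan' when combining it with a string value. If real missing values are needed, one possible alternative (a sketch that is not part of the original answer, using an assumed sample frame) is Series.where:

```python
import pandas as pd

QUERY = 'query'
SUGGESTED = 'suggested'
s_query = 'de'
new_value = 'delaware'

# Assumed sample data following the question's tables.
df = pd.DataFrame({QUERY: ['al', 'ca', 'de']})

# Build a Series filled with new_value, then keep it only where the
# condition holds; Series.where leaves the other rows as missing values.
matches = df[QUERY] == s_query
df[SUGGESTED] = pd.Series(new_value, index=df.index).where(matches)

# Count the rows that isna() recognises as missing.
missing_count = int(df[SUGGESTED].isna().sum())
print(df)
```

Unlike the np.where version, this overwrites the whole column, so it fits the case where every non-matching row should end up missing.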