Pandas assignment efficiency

Time: 2017-03-29 04:52:14

Tags: python pandas

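I have a Pandas DataFrame to which I add a new column (SUGGESTED). After adding the column, I fill it with new values based on the value of the QUERY column, using the following pattern. For example: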

import numpy

QUERY = 'query'
SUGGESTED = 'suggested'
df[SUGGESTED] = numpy.nan
s_query = 'de'
new_value = 'delaware'
df.loc[(df[QUERY] == s_query), [SUGGESTED]] = new_value

Example:

query suggested
al      alabama
ca      california
de      NaN

After:

query suggested
al      alabama
ca      california
de      delaware
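
For anyone who wants to reproduce this, a minimal setup might look like the sketch below. The frame contents come from the sample above; building it with pd.DataFrame from a dict is an assumption, since the post does not show how the data was created.

```python
import numpy as np
import pandas as pd

QUERY = 'query'
SUGGESTED = 'suggested'

# Assumed construction of the sample frame shown above.
df = pd.DataFrame({
    QUERY: ['al', 'ca', 'de'],
    SUGGESTED: ['alabama', 'california', np.nan],
})

s_query = 'de'
new_value = 'delaware'

# Same update pattern as in the question: boolean row mask plus column list.
df.loc[(df[QUERY] == s_query), [SUGGESTED]] = new_value
print(df)
```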

This seems to work so far, but I am not sure whether there is a more efficient way to do it in Pandas.

1 answer:

Answer 0: (score: 1)

I think you can first omit df[SUGGESTED] = numpy.nan when using the loc solution or np.where, because either one adds the new column itself:

QUERY = 'query'
SUGGESTED = 'suggested'
s_query = 'de'
new_value = 'delaware'

# if you need to update an existing column
df[SUGGESTED] = df[SUGGESTED].mask(df[QUERY] == s_query, new_value)
print (df)
  query   suggested
0    al     alabama
1    ca  california
2    de    delaware
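
As a small illustration of the mask call above: Series.mask returns a new Series and leaves the original untouched, which is why the result is assigned back to the column. The sketch below rebuilds the sample frame under the assumption that it was created from a plain dict; the names is_match and updated are made up for this example.

```python
import numpy as np
import pandas as pd

# Assumed sample data, matching the tables in the question.
df = pd.DataFrame({
    'query': ['al', 'ca', 'de'],
    'suggested': ['alabama', 'california', np.nan],
})

# Boolean Series that is True for the rows to overwrite.
is_match = df['query'] == 'de'

# mask() replaces values where the condition is True and returns a new Series.
updated = df['suggested'].mask(is_match, 'delaware')

print(updated)
print(df['suggested'])  # the original column is unchanged until assigned back
```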

The loc solution can be simplified by removing the () (if there is only one condition) and the [] (if there is only one column):

# for updating an existing column
df.loc[df[QUERY] == s_query, SUGGESTED] = new_value
print (df)
  query   suggested
0    al     alabama
1    ca  california
2    de    delaware

# the same statement also creates a new column
df.loc[df[QUERY] == s_query, SUGGESTED] = new_value
print (df)
  query suggested
0    al       NaN
1    ca       NaN
2    de  delaware
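
To check that the simplified form behaves like the original one, the two can be run side by side. This is only a sketch: make_frame is a hypothetical helper, and the sample data is assumed from the tables in the question.

```python
import numpy as np
import pandas as pd

QUERY = 'query'
SUGGESTED = 'suggested'
s_query = 'de'
new_value = 'delaware'

def make_frame():
    # Hypothetical helper: rebuilds the assumed sample frame each time.
    return pd.DataFrame({
        QUERY: ['al', 'ca', 'de'],
        SUGGESTED: ['alabama', 'california', np.nan],
    })

# Original form: parenthesised condition and a one-element column list.
verbose = make_frame()
verbose.loc[(verbose[QUERY] == s_query), [SUGGESTED]] = new_value

# Simplified form: bare condition and a single column label.
simple = make_frame()
simple.loc[simple[QUERY] == s_query, SUGGESTED] = new_value

print(verbose.equals(simple))

# With no existing column, the simplified form creates it;
# rows that do not match are left as missing values.
fresh = pd.DataFrame({QUERY: ['al', 'ca', 'de']})
fresh.loc[fresh[QUERY] == s_query, SUGGESTED] = new_value
print(fresh)
```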

If the rows that do not match should be replaced with NaN:

# works both for creating a new column and for updating an existing one (assumes import numpy as np)
df[SUGGESTED] = np.where(df[QUERY] == s_query, new_value, np.nan)
print (df)
  query suggested
0    al       nan
1    ca       nan
2    de  delaware
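
Note that the output above prints a lowercase nan, which suggests the non-matching rows may hold the text 'nan' rather than real missing values. That is my reading of the output, not something stated in the answer. If real missing values are needed, one possible alternative (a swapped-in technique, not the answer's method) is Series.where, sketched here on assumed sample data:

```python
import pandas as pd

# Assumed sample data; only the query column exists at the start.
df = pd.DataFrame({'query': ['al', 'ca', 'de']})

is_match = df['query'] == 'de'

# Start from a column filled with the new value, then keep it only where
# the condition is True; where() fills the other rows with missing values.
df['suggested'] = pd.Series('delaware', index=df.index).where(is_match)

print(df)
print(df['suggested'].isna())
```

With this variant, isna() reports the non-matching rows as missing, so later calls such as fillna or dropna treat them as expected.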