A better way to replace values in a DataFrame from a large dictionary

Time: 2016-11-10 13:19:20

Tags: python-3.x pandas dictionary replace

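I wrote some code that uses a dictionary to replace values in a DataFrame with values from another frame. It works, but I use it on some large files where the dictionary can get very long: several thousand pairs. On those files the code runs very slowly, and on a few occasions it has also run out of memory.

I am fairly convinced that my approach is far from optimal and that there must be a faster way to do this. I have created a simple example that does what I want, but it is slow for large amounts of data. I hope someone has a simpler way to do this.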

import pandas as pd

#Frame with data where I want to replace the 'id' with the name from df2
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9], 'values': [12, 32, 42, 51, 23, 14, 111, 134]})

#Frame containing names linked to ids
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'name': ['id1', 'id2', 'id3', 'id4', 'id5', 'id6', 'id7', 'id8', 'id9', 'id10']})

#My current "slow" way of doing this.

#Starts by creating a dictionary from df2
#Need to create dictionaries from the domain and banners tables to link ids
df2_dict = dict(zip(df2['id'], df2['name']))

#and then uses the dict to replace the ids with name in df1
df1.replace({'id' : df2_dict}, inplace=True)

1 answer:

Answer 0 (score: 1)

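I think you can use map with a Series converted by to_dict. You get NaN where a value of df1 does not exist in df2:

df1['id'] = df1.id.map(df2.set_index('id')['name'].to_dict())
print (df1)
    id  values
0  id1      12
1  id2      32
2  id3      42
3  id4      51
4  id5      23
5  id3      14
6  id5     111
7  id9     134

Or use replace, which keeps the original value from df1 where it does not exist in df2:

df1['id'] = df1.id.replace(df2.set_index('id')['name'])
print (df1)
    id  values
0  id1      12
1  id2      32
2  id3      42
3  id4      51
4  id5      23
5  id3      14
6  id5     111
7  id9     134

Sample (id 5 is missing from df2 here, so the two methods differ):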

#Frame with data where I want to replace the 'id' with the name from df2
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9], 'values': [12, 32, 42, 51, 23, 14, 111, 134]})
print (df1)
#Frame containing names linked to ids
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9, 10], 'name': ['id1', 'id2', 'id3', 'id4', 'id6', 'id7', 'id8', 'id9', 'id10']})
print (df2)

df1['new_map'] = df1.id.map(df2.set_index('id')['name'].to_dict())
df1['new_replace'] = df1.id.replace(df2.set_index('id')['name'])
print (df1)
   id  values new_map new_replace
0   1      12     id1         id1
1   2      32     id2         id2
2   3      42     id3         id3
3   4      51     id4         id4
4   5      23     NaN           5
5   3      14     id3         id3
6   5     111     NaN           5
7   9     134     id9         id9
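
If the lookup speed of map is wanted together with the keep-the-original behaviour of replace, the two can probably be combined. The following is only a sketch under that assumption, not something benchmarked here: it maps first and then falls back to the original id wherever the lookup returned NaN. The names lookup, mapped and new_id are made up for illustration. The cast to object is there so that strings and leftover integer ids can share one column.

```python
import pandas as pd

# Same sample frames as above; id 5 is deliberately missing from df2.
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                    'values': [12, 32, 42, 51, 23, 14, 111, 134]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9, 10],
                    'name': ['id1', 'id2', 'id3', 'id4', 'id6',
                             'id7', 'id8', 'id9', 'id10']})

# A Series indexed by id can be passed to map directly.
lookup = df2.set_index('id')['name']

# Ids without a match come back as NaN.
mapped = df1['id'].map(lookup).astype(object)

# Keep the mapped name where one was found, otherwise the original id.
df1['new_id'] = mapped.where(mapped.notna(), df1['id'])

print(df1)
```

Rows with id 5 keep the value 5 in new_id, while every other row gets its name, which mirrors the new_replace column in the sample above.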
