Dictionary in a for loop for a pd.DataFrame

Time: 2016-08-22 18:37:46

Tags: python pandas for-loop dictionary

My dataset has many columns and I need to change the values in some of them. This is how I do it:

import pandas as pd
import numpy as np
df = pd.DataFrame({'one':['a' , 'b']*5, 'two':['c' , 'd']*5, 'three':['a' , 'd']*5})

Select the columns:

df1 = df[['one', 'two']]

The dictionary:

map = { 'a' : 'd', 'b' : 'c', 'c' : 'b', 'd' : 'a'}

And the loop:

df2 = []
for i in df1.values:
    # renamed from `np` so the numpy import is not shadowed
    mapped_row = [map[x] for x in i]
    df2.append(mapped_row)

Then I replace the columns:

df['one'] = [row[0] for row in df2]
df['two'] = [row[1] for row in df2]

It works, but it is verbose. How can I shorten it?

3 answers:

Answer 0: (score: 2)

You can iterate over the columns and call Series.map() on each one.
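A minimal sketch of the column loop with Series.map(), rebuilt from the question's data. The answer's own code is not shown here, so this is an illustration rather than the answerer's exact snippet. The name `mapping` is an assumption, used instead of `map` to avoid shadowing the built-in.

```python
import pandas as pd

df = pd.DataFrame({'one': ['a', 'b'] * 5,
                   'two': ['c', 'd'] * 5,
                   'three': ['a', 'd'] * 5})

# `mapping` is an illustrative name; the question calls this dict `map`
mapping = {'a': 'd', 'b': 'c', 'c': 'b', 'd': 'a'}

cols = ['one', 'two']
for col in cols:
    # Series.map looks each value up in the dict and returns a new Series
    df[col] = df[col].map(mapping)
```

Only the listed columns are touched, so 'three' keeps its original values.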

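Answer 1: (score: 2)

Passing the whole map to a column that only contains the values 'a' and 'b' is inefficient. First check which values actually occur in the df column, then map only those keys. The session below does this with df.replace, and the same filtering also works with Series.map.

Also watch out for overlapping keys: you may want the changes applied in parallel (say a to b and b to c) rather than chained (a -> b -> c)...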
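To make the overlapping-keys caveat concrete, here is a small sketch contrasting a parallel lookup with a step-by-step replacement. The Series `s` and the dict `mapping` are made up for illustration and are not taken from the answer.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'])
mapping = {'a': 'b', 'b': 'c'}  # keys and values overlap on 'b'

# Parallel: every original value is looked up exactly once
parallel = s.map(lambda v: mapping.get(v, v))

# Chained: each step sees the result of the previous step,
# so 'a' becomes 'b' and is then turned into 'c' as well
chained = s.copy()
for old, new in mapping.items():
    chained = chained.where(chained != old, new)

parallel_list = parallel.tolist()  # ['b', 'c', 'c']
chained_list = chained.tolist()    # ['c', 'c', 'c']
```

A dict lookup per value, as Series.map does, gives the parallel behaviour.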

>>> cols = ['one', 'two']
>>> map = {'a': 'd', 'b': 'c', 'c': 'b', 'd': 'a'}

>>> for col in cols:
...     colSet = set(df[col].values)
...     colMap = {k: v for k, v in map.items() if k in colSet}
...     df.replace(to_replace={col: colMap}, inplace=True)  # not really efficient
...
>>> df
  one three two
0   d     a   b
1   c     d   a
2   d     a   b
3   c     d   a
4   d     a   b
5   c     d   a
6   d     a   b
7   c     d   a
8   d     a   b
9   c     d   a
>>>
# OR
In [12]: %%timeit
    ...: for col in cols:
    ...:     colSet = set(df[col].values)
    ...:     colMap = {k: v for k, v in map.items() if k in colSet}
    ...:     df[col].map(colMap)
    ...:
1 loop, best of 3: 1.93 s per loop
# OR, ASSIGNING THE RESULT BACK
In [8]: %%timeit
   ...: for col in cols:
   ...:     colSet = set(df[col].values)
   ...:     colMap = {k: v for k, v in map.items() if k in colSet}
   ...:     df[col] = df[col].map(colMap)
   ...:
1 loop, best of 3: 2.18 s per loop
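One thing to keep in mind with the Series.map variants: a value that has no key in the dict comes back as NaN. This cannot happen with the question's data, where every value has a key, but it matters if a column holds other values. A minimal sketch, with a made-up Series `s` and dict `partial`:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'z'])
partial = {'a': 'd', 'b': 'c'}  # no entry for 'z'

mapped = s.map(partial)          # 'z' has no key, so it becomes NaN
kept = s.map(partial).fillna(s)  # fall back to the original value instead
```

Filling from the original Series keeps unmapped values unchanged.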

Answer 2: (score: 1)

