Change a pandas DataFrame based on a Series

Time: 2016-08-20 00:03:42

Tags: python pandas

I have some data and convert it into a pandas DataFrame:

import pandas as pd
d = [
  (1,70399,0.988375133622),
  (1,33919,0.981573492596),
  (1,62461,0.981426807114),
  (579,1,0.983018778374),
  (745,1,0.995580488899),
  (834,1,0.980942505189)
]
df_beda = pd.DataFrame(d, columns=['source', 'target', 'weight'])

I need to build a Series that maps the values in the source and target columns to new values:
e = []
for x in d:
  e.append(x[0])
  e.append(x[1])

e = list(set(e))
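df_new = pd.DataFrame(e, columns=['source_target']).sort_values(['source_target'], ascending=[True])
# assumed: keep the original ids so they can serve as the index of the mapping
new_source_old = df_new.source_target.copy()

df_new.source_target = (df_new.source_target.diff() != 0).cumsum() - 1
new_ser = pd.Series(df_new.source_target.values, index=new_source_old).drop_duplicates()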

So I get this Series:

source_target
1        0
579      1
745      2
834      3
33919    4
62461    5
70399    6
dtype: int64

I tried to change the DataFrame df_beda based on the Series new_ser using the following:

df_beda.target = df_beda.target.mask(df_beda.target.isin(new_ser), df_beda.target.map(new_ser)).astype(int)
df_beda.source = df_beda.source.mask(df_beda.source.isin(new_ser), df_beda.source.map(new_ser)).astype(int)

But the result is:

   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3     579       0  0.983019
4     745       0  0.995580
5     834       0  0.980943

That is wrong. The desired result is:

   source  target    weight
0       0       6  0.988375
1       0       4  0.981573
2       0       5  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943

Maybe someone can help me and show me my mistake.

Thanks

1 Answer:

Answer 0: (score: 2)

If the order does not matter, you can do the following. Avoid for loops unless absolutely necessary.

import numpy as np

# np.unique returns the sorted unique values found across both columns
uniq_vals = np.unique(df_beda[['source','target']])
map_dict = dict(zip(uniq_vals, range(len(uniq_vals))))
df_beda[['source','target']] = df_beda[['source','target']].replace(map_dict)

print(df_beda)

   source  target    weight
0       0       6  0.988375
1       0       4  0.981573
2       0       5  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943
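
As for the mistake in the question, a likely explanation is that `isin(new_ser)` compares against the values of `new_ser` (0 to 6) rather than its index (the old ids), so only the id 1 is ever replaced. A minimal sketch, assuming the data from the question; `old_ids`, `matched_by_values`, `matched_by_index`, and `fixed` are illustrative names, and `new_ser` is rebuilt by hand here rather than with the question's loop:

```python
import pandas as pd

d = [
    (1, 70399, 0.988375133622),
    (1, 33919, 0.981573492596),
    (1, 62461, 0.981426807114),
    (579, 1, 0.983018778374),
    (745, 1, 0.995580488899),
    (834, 1, 0.980942505189),
]
df_beda = pd.DataFrame(d, columns=['source', 'target', 'weight'])

# Mapping Series as in the question: old id in the index, new id in the values.
old_ids = [1, 579, 745, 834, 33919, 62461, 70399]
new_ser = pd.Series(range(len(old_ids)), index=old_ids)

# isin() with a Series tests against its VALUES (0..6), so only the id 1 matches.
matched_by_values = df_beda.target.isin(new_ser)
# Testing against the index (the old ids) matches every row.
matched_by_index = df_beda.target.isin(new_ser.index)

# Since every id is present in the index, a plain map() is enough here.
fixed = df_beda.copy()
fixed['source'] = fixed['source'].map(new_ser)
fixed['target'] = fixed['target'].map(new_ser)
print(fixed)
```

Note that `map` yields NaN for ids missing from the index, so this shortcut assumes the mapping covers every id in both columns.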

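If you want to roll back, you can create an inverse map from the original mapping, since it is guaranteed to be a one-to-one mapping.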

inverse_map = {v: k for k, v in map_dict.items()}
df_beda[['source','target']] = df_beda[['source','target']].replace(inverse_map)
print(df_beda)

   source  target    weight
0       1   70399  0.988375
1       1   33919  0.981573
2       1   62461  0.981427
3     579       1  0.983019
4     745       1  0.995580
5     834       1  0.980943
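
The round trip can also be checked programmatically. The sketch below is a variant under stated assumptions, not the method above: it swaps `replace` for `Series.map` with a forward and a backward lookup Series (`forward`, `backward`, `encoded`, and `decoded` are names made up for illustration) and leaves out the weight column for brevity:

```python
import numpy as np
import pandas as pd

df_beda = pd.DataFrame({
    'source': [1, 1, 1, 579, 745, 834],
    'target': [70399, 33919, 62461, 1, 1, 1],
})
original = df_beda.copy()

# Sorted unique ids across both columns; position in this array is the new id.
uniq_vals = np.unique(df_beda[['source', 'target']].to_numpy())
new_ids = np.arange(len(uniq_vals))

forward = pd.Series(new_ids, index=uniq_vals)   # old id -> new id
backward = pd.Series(uniq_vals, index=new_ids)  # new id -> old id

# Apply the lookup column by column, then undo it.
encoded = df_beda.apply(lambda col: col.map(forward))
decoded = encoded.apply(lambda col: col.map(backward))

print(encoded)
print(decoded.values.tolist() == original.values.tolist())
```

Because the ids in `uniq_vals` are unique, each lookup is one-to-one, which is what makes the backward step recover the original frame.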