Question

import pandas as pd
import numpy as np    
column = np.array([5505, 5505, 5505, 34565, 34565, 65539, 65539])
column = pd.Series(column)
myDict = column.groupby(by = column ).groups

I am creating a dictionary from a pandas df with df.groupby(by=..), which has the form:

>>> myDict
{5505: Int64Index([0, 1, 2], dtype='int64'), 65539: Int64Index([5, 6], dtype='int64'), 34565: Int64Index([3, 4], dtype='int64')}

I have a numpy array, for example

myArray = np.array([34565, 34565, 5505,65539])

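I would like to replace each element of the array with the corresponding value from the dictionary. I have tried several solutions that I found (for example here and here), but those examples use dictionaries whose values are single scalars, and I always get the error setting an array element with a sequence. How can I overcome this?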

My expected output is

np.array([3, 4, 3, 4, 0, 1, 2, 5, 6])
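
For context, here is a minimal sketch of one way the error mentioned above can arise. The failing code is not shown in the question, so this reproduction is an assumption: the names lookup, my_array and result are hypothetical, and plain NumPy arrays stand in for the pandas index objects. The idea is that an integer array has one slot per element, so assigning a multi-element value to a single slot should fail.

```python
import numpy as np

# Hypothetical reproduction: the asker's failing code is not shown in the post.
# Plain arrays stand in for the index objects returned by groupby(...).groups.
lookup = {
    5505: np.array([0, 1, 2]),
    34565: np.array([3, 4]),
    65539: np.array([5, 6]),
}
my_array = np.array([34565, 34565, 5505, 65539])

# One integer slot per input element.
result = np.empty(len(my_array), dtype=np.int64)

error_message = None
try:
    for i, key in enumerate(my_array):
        # A multi-element array does not fit into a single integer slot.
        result[i] = lookup[key]
except (ValueError, TypeError) as exc:
    error_message = str(exc)

print(error_message)
```

Because the dictionary values have different lengths, the result cannot be built by element-wise replacement into an array of the same shape; the pieces have to be concatenated instead.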

Answer 1

One approach based on np.searchsorted -

# Extract dict info
k = list(myDict.keys())
v = list(myDict.values())

# Use argsort of k to find search sorted indices from myArray in keys
# Index into the values of dict based on those indices for output
sidx = np.argsort(k)
idx = sidx[np.searchsorted(k,myArray,sorter=sidx)]
out_arr = np.concatenate([v[i] for i in idx])
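
The intermediate values may make the indexing easier to follow. Below is a self-contained sketch of the same steps, assuming plain NumPy arrays as stand-ins for the index objects in myDict and keys deliberately left in unsorted order; the names my_dict, my_array, keys, values and pos are introduced here for illustration only.

```python
import numpy as np

# Stand-in for the groupby result (assumption): plain arrays, keys unsorted.
my_dict = {
    5505: np.array([0, 1, 2]),
    65539: np.array([5, 6]),
    34565: np.array([3, 4]),
}
my_array = np.array([34565, 34565, 5505, 65539])

keys = np.array(list(my_dict.keys()))   # [5505, 65539, 34565]
values = list(my_dict.values())

# Positions that would sort the keys: [0, 2, 1]
sidx = np.argsort(keys)

# Rank of each my_array element within the sorted keys: [1, 1, 0, 2]
pos = np.searchsorted(keys, my_array, sorter=sidx)

# Map the ranks back to positions in the original key order: [2, 2, 0, 1]
idx = sidx[pos]

# Pick the matching value arrays and join them end to end.
out_arr = np.concatenate([values[i] for i in idx])
print(out_arr)  # [3 4 3 4 0 1 2 5 6]
```

Note that np.searchsorted returns insertion positions, so this sketch assumes every element of my_array is present among the keys; a value that is missing would silently map to a neighbouring key or index out of range.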

Sample input, output -

In [369]: myDict
Out[369]: 
{5505: Int64Index([0, 1, 2], dtype='int64'),
 34565: Int64Index([3, 4], dtype='int64'),
 65539: Int64Index([5, 6], dtype='int64')}

In [370]: myArray
Out[370]: array([34565, 34565,  5505, 65539])

In [371]: out_arr
Out[371]: array([3, 4, 3, 4, 0, 1, 2, 5, 6])
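
For comparison, a plain dictionary lookup followed by np.concatenate gives the same result. This is an alternative sketch rather than the method of the answer above; it assumes every element of the array is a key of the dictionary, and it again uses plain NumPy arrays (my_dict, my_array, out_direct are illustrative names) in place of the pandas index objects.

```python
import numpy as np

# Stand-in for the groupby result (assumption): plain arrays as values.
my_dict = {
    5505: np.array([0, 1, 2]),
    34565: np.array([3, 4]),
    65539: np.array([5, 6]),
}
my_array = np.array([34565, 34565, 5505, 65539])

# Look up each element directly; a missing key would raise KeyError.
out_direct = np.concatenate([my_dict[key] for key in my_array])
print(out_direct)  # [3 4 3 4 0 1 2 5 6]
```

The lookup loop runs in Python, so for a very long array the np.searchsorted version may be preferable, since it finds all key positions in one vectorized call.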

Replace numpy elements with non-scalar dictionary values
