import pandas as pd
import numpy as np
column = np.array([5505, 5505, 5505, 34565, 34565, 65539, 65539])
column = pd.Series(column)
myDict = column.groupby(by = column ).groups
I am working with a dictionary created from a pandas DataFrame via df.groupby(by=...).groups. It has the following form:
>>> myDict
{5505: Int64Index([0, 1, 2], dtype='int64'), 65539: Int64Index([5, 6], dtype='int64'), 34565: Int64Index([3, 4], dtype='int64')}
I also have a NumPy array, for example:
myArray = np.array([34565, 34565, 5505,65539])
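I want to replace each element of the array with the corresponding dictionary value.
I have tried several solutions I found (for example, here and here), but those examples use dictionaries with a single value per key, and I always get the error "setting an array element with a sequence". How can I get around this?
My expected output is: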
np.array([3, 4, 3, 4, 0, 1, 2, 5, 6])
Answer 0 (score: 1)
An approach based on np.searchsorted:
# Extract dict info
k = list(myDict.keys())
v = list(myDict.values())
# Use argsort of k to find search sorted indices from myArray in keys
# Index into the values of dict based on those indices for output
sidx = np.argsort(k)
idx = sidx[np.searchsorted(k,myArray,sorter=sidx)]
out_arr = np.concatenate([v[i] for i in idx])
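To make the indexing steps above concrete, here is a minimal sketch that traces the intermediate arrays on the sample data. It assumes plain NumPy arrays as stand-ins for the pandas index objects, and the names keys, values, query and pos are illustrative only, not part of the original answer. The keys are deliberately left unsorted to show what the sorter argument is for.

```python
import numpy as np

# Stand-ins for myDict.keys() and myDict.values() (assumed, unsorted on purpose)
keys = np.array([5505, 65539, 34565])
values = [np.array([0, 1, 2]), np.array([5, 6]), np.array([3, 4])]
query = np.array([34565, 34565, 5505, 65539])  # plays the role of myArray

# Permutation that would sort the keys: 5505, 34565, 65539
sidx = np.argsort(keys)

# Position of each query element within the *sorted* keys
pos = np.searchsorted(keys, query, sorter=sidx)

# Map sorted positions back to positions in the original key order
idx = sidx[pos]

# Gather the matching value arrays and flatten them into one array
out_arr = np.concatenate([values[i] for i in idx])

print(sidx)     # [0 2 1]
print(pos)      # [1 1 0 2]
print(idx)      # [2 2 0 1]
print(out_arr)  # [3 4 3 4 0 1 2 5 6]
```

Note that np.searchsorted does not check membership. A query value that is not among the keys either maps to a neighboring key or yields an out-of-range index, so this approach assumes every element of the array is a key of the dictionary.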
Sample input and output:
In [369]: myDict
Out[369]:
{5505: Int64Index([0, 1, 2], dtype='int64'),
34565: Int64Index([3, 4], dtype='int64'),
65539: Int64Index([5, 6], dtype='int64')}
In [370]: myArray
Out[370]: array([34565, 34565, 5505, 65539])
In [371]: out_arr
Out[371]: array([3, 4, 3, 4, 0, 1, 2, 5, 6])
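For comparison, the same result can be obtained with a plain dictionary lookup per element, which is a different and simpler technique than the np.searchsorted approach in this answer. The sketch below assumes a regular dict of NumPy arrays as a stand-in for the groupby result, and the names my_dict and my_array are illustrative.

```python
import numpy as np

# Stand-in for the groupby(...).groups dictionary (assumed plain arrays)
my_dict = {
    5505: np.array([0, 1, 2]),
    34565: np.array([3, 4]),
    65539: np.array([5, 6]),
}
my_array = np.array([34565, 34565, 5505, 65539])

# Look up each element's group (a variable-length array), then
# concatenate the pieces; a missing key raises KeyError here
out_arr = np.concatenate([my_dict[key] for key in my_array.tolist()])

print(out_arr)  # [3 4 3 4 0 1 2 5 6]
```

This version loops in Python, so it may be slower on large arrays, but it fails loudly on a missing key instead of returning a wrong group.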