Question

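In some queueing and inventory problems, the state of the system can be represented by a 1d integer numpy array of constant length. I now want to map states to a unique single number/string so that they can be identified quickly.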

Currently I am using the following approach:

import numpy as np

#Example: state space consists of all integer arrays with 3 elements ranging from 0 to 4.
statespace      = np.indices([5]*3).reshape(3,-1).T 

#Obtain unique integers by converting to a number system with the smallest possible base. 
minvalues       = np.amin(statespace,axis=0)
maxvalues       = np.amax(statespace,axis=0)
base            = 1+np.max(maxvalues-minvalues)        
statecode       = np.power(base, np.arange(statespace.shape[1]))

def getStateCode(state):
    #Convert states to a unique integer by taking the dot product with statecode.
    return np.dot(state-minvalues,statecode)

#Obtain codes and sort them.    
codes = np.sort(getStateCode(statespace))  

def getStateIndex(state):
    #Searches for the state in the sorted vector with codes.
    statecodes  = getStateCode(state)
    return np.searchsorted(codes,statecodes).astype(int)

Now

state = np.array([0,0,0])
print(getStateIndex(state))

returns the state index 0.

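For small and medium-sized problems this approach works well and is vectorized. For larger problems, however, it suffers from integer overflow, especially when base is large.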

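This happens in particular when some elements of the state have a much larger range than others (for example, the first element can range from 0 to 100 while all other elements lie between 0 and 3, which results in a base-101 system). Even for state spaces that fit easily in memory, this can cause integer overflow in statecode. Making statecode a 64-bit integer only delays the problem.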
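To make the overflow concrete, here is a minimal sketch. The length of 11 elements is an assumption chosen only for illustration; the base of 101 comes from the example above. It compares the int64 weights with exact Python integers.

```python
import numpy as np

n = 11      # assumed state length, for illustration only
base = 101  # base from the example above (largest range is 0..100)

# Exact weights using Python's arbitrary-precision integers.
exact = [base ** i for i in range(n)]

# The same weights computed as int64, as statecode would be.
weights = np.power(np.int64(base), np.arange(n, dtype=np.int64))

# True wherever the int64 value no longer matches the exact value.
overflowed = [int(w) != e for w, e in zip(weights, exact)]
print(overflowed)
```

With these numbers only the last weight (101**10) exceeds the int64 range, and numpy wraps it around without raising an error.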

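Does anyone have an alternative (vectorized?) way to convert a 1d integer array into a unique identifier without this problem? Since these lookups are performed repeatedly for millions of states, the method has to be fast. I have read about hash functions, but I am a bit hesitant to use them because uniqueness is not guaranteed.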
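One direction that may be worth checking is a mixed-radix encoding, where each element gets its own base instead of a single shared one. The sketch below uses np.ravel_multi_index for this. The ranges in sizes and the helper name get_state_index are hypothetical, and it assumes every state lies inside the stated ranges.

```python
import numpy as np

# Hypothetical ranges: element 0 in 0..100, the other two in 0..3.
minvalues = np.array([0, 0, 0])
sizes = (101, 4, 4)  # number of distinct values per element

def get_state_index(states):
    # Accept a single state or a 2d array with one state per row.
    states = np.atleast_2d(states) - minvalues
    # One index array per element; each element is weighted by the
    # product of the sizes of the elements after it (C order).
    return np.ravel_multi_index(tuple(states.T), sizes)

states = np.array([[0, 0, 0], [0, 0, 1], [100, 3, 3]])
indices = get_state_index(states)
print(indices)  # [   0    1 1615]
```

The largest code here equals the number of states minus one (101 * 4 * 4 - 1 = 1615), so the codes stay within int64 for as long as the number of states in the full product space does. This does not help when the product of the ranges itself exceeds the integer range.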

Alternative way to map integer numpy arrays to unique identifiers in a vectorized manner

0 answers: