Question

In this code snippet, train_dataset, test_dataset and valid_dataset are of type numpy.ndarray.

import time

def check_overlaps(images1, images2):
    images1.flags.writeable=False
    images2.flags.writeable=False
    print(type(images1))
    print(type(images2))
    start = time.clock()
    hash1 = set([hash(image1.data) for image1 in images1])
    hash2 = set([hash(image2.data) for image2 in images2])
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.clock()-start

r, execTime = check_overlaps(train_dataset, test_dataset)    
print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(train_dataset, valid_dataset)   
print("# overlaps between training and validation sets:", len(r), "execution time:", execTime) 
r, execTime = check_overlaps(valid_dataset, test_dataset) 
print("# overlaps between validation and test sets:", len(r), "execution time:", execTime)

But this produces the following error (formatted as code to keep it readable!):

ValueError                                Traceback (most recent call last)
<ipython-input-14-337e73a1cb14> in <module>()
     12     return all_overlaps, time.clock()-start
     13 
---> 14 r, execTime = check_overlaps(train_dataset, test_dataset)
     15 print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
     16 r, execTime = check_overlaps(train_dataset, valid_dataset)

<ipython-input-14-337e73a1cb14> in check_overlaps(images1, images2)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

<ipython-input-14-337e73a1cb14> in <listcomp>(.0)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'

The problem is that I don't even know what the error means, let alone how to fix it. Any help?

Answer 1

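The problem is that your way of hashing arrays only works in Python 2. In Python 3, ndarray.data returns a memoryview, so the code fails as soon as it tries to compute hash(image1.data). The error message tells you that hashing a memoryview is only supported for the formats unsigned bytes ('B'), signed bytes ('b') and single characters ('c'); a float32 array, for example, exposes format 'f'. I haven't found a way to get such a view from an np.ndarray without copying. The only approach I came up with involves copying the arrays, which may not be feasible in your application depending on how much data you have. That said, you could try changing your function to: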

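import time

def check_overlaps(images1, images2):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    # tobytes() copies each image into an immutable bytes object, which is hashable
    hash1 = {hash(image1.tobytes()) for image1 in images1}
    hash2 = {hash(image2.tobytes()) for image2 in images2}
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.perf_counter() - start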
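A minimal sketch of the copying approach in use, assuming NumPy 1.17+ (for numpy.random.default_rng) and Python 3.3+ (for time.perf_counter). The arrays are hypothetical toy stand-ins for the real datasets: the test set deliberately reuses two training images, so two overlaps should be reported. The last lines reproduce the original ValueError, because .data on a read-only float32 array is a memoryview with format 'f'.

```python
import time

import numpy as np


def check_overlaps(images1, images2):
    """Return the hashes of images present in both datasets, plus elapsed seconds."""
    start = time.perf_counter()
    # tobytes() copies each image's raw bytes into an immutable, hashable bytes object
    hash1 = {hash(image1.tobytes()) for image1 in images1}
    hash2 = {hash(image2.tobytes()) for image2 in images2}
    return hash1 & hash2, time.perf_counter() - start


# Hypothetical toy data: 5 "training" images; the "test" set reuses the first 2 of them.
rng = np.random.default_rng(0)
train_dataset = rng.random((5, 28, 28), dtype=np.float32)
test_dataset = np.concatenate(
    [train_dataset[:2], rng.random((3, 28, 28), dtype=np.float32)]
)

overlaps, exec_time = check_overlaps(train_dataset, test_dataset)
print("# overlaps between training and test sets:", len(overlaps), "execution time:", exec_time)

# Reproducing the original error: the memoryview of a float32 image has format 'f',
# which hash() rejects even though the view is read-only.
frozen = train_dataset.copy()
frozen.flags.writeable = False
try:
    hash(frozen[0].data)
except ValueError as err:
    print("ValueError:", err)
```

Note that the sets hold hash values, not the images themselves, so two different images could in principle collide; if an exact count matters, storing image.tobytes() directly in the sets avoids that at the cost of more memory.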

How can I fix this memoryview error in numpy?
