In this snippet, train_dataset, test_dataset, and valid_dataset are all of type numpy.ndarray.
def check_overlaps(images1, images2):
    images1.flags.writeable=False
    images2.flags.writeable=False
    print(type(images1))
    print(type(images2))
    start = time.clock()
    hash1 = set([hash(image1.data) for image1 in images1])
    hash2 = set([hash(image2.data) for image2 in images2])
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.clock()-start

r, execTime = check_overlaps(train_dataset, test_dataset)
print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(train_dataset, valid_dataset)
print("# overlaps between training and validation sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(valid_dataset, test_dataset)
print("# overlaps between validation and test sets:", len(r), "execution time:", execTime)
But this produces the following error (formatted as code to make it readable):
ValueError                                Traceback (most recent call last)
<ipython-input-14-337e73a1cb14> in <module>()
     12     return all_overlaps, time.clock()-start
     13
---> 14 r, execTime = check_overlaps(train_dataset, test_dataset)
     15 print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
     16 r, execTime = check_overlaps(train_dataset, valid_dataset)

<ipython-input-14-337e73a1cb14> in check_overlaps(images1, images2)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

<ipython-input-14-337e73a1cb14> in <listcomp>(.0)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'
The problem is that I don't even know what the error means, let alone how to go about fixing it. Any help?
Answer 0 (score: 19)
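The problem is that your method of hashing arrays only works in Python 2, so the code fails as soon as it tries to compute hash(image1.data). In Python 3, image1.data is a memoryview whose format follows the array's dtype. The error message is telling you that only memoryviews with the formats unsigned byte ('B'), signed byte ('b'), or single byte ('c') can be hashed, and I have not found a way to get such a view out of an np.ndarray without copying. The only approach I came up with involves copying the array, which may not be feasible in your application, depending on how much data you have. That said, you can try changing your function to: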
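import time

def check_overlaps(images1, images2):
    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it.
    start = time.perf_counter()
    # tobytes() copies each image into an immutable, hashable bytes object.
    hash1 = set(hash(image1.tobytes()) for image1 in images1)
    hash2 = set(hash(image2.tobytes()) for image2 in images2)
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.perf_counter() - start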
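
To see both the cause of the error and the tobytes() workaround on something small, here is a minimal sketch. The float32 arrays are made-up stand-ins for the real datasets (whose dtype is not stated in the question), and overlapping_hashes is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

def overlapping_hashes(images1, images2):
    """Return the hashes that occur in both image stacks (copies each image once)."""
    hashes1 = {hash(image.tobytes()) for image in images1}
    hashes2 = {hash(image.tobytes()) for image in images2}
    return hashes1 & hashes2

# Made-up float32 stacks standing in for the real datasets (assumption).
rng = np.random.default_rng(0)
train = rng.random((5, 4, 4), dtype=np.float32)
test = rng.random((3, 4, 4), dtype=np.float32)
test[1] = train[2]  # plant exactly one duplicate image

# Diagnosis: the memoryview's format follows the dtype ('f' for float32),
# which is not one of the hashable formats 'B', 'b' or 'c'.
train.flags.writeable = False
view_format = train[0].data.format
try:
    hash(train[0].data)
    hashing_failed = False
except ValueError:
    hashing_failed = True

# Workaround: hash a bytes copy of each image instead of the memoryview.
shared = overlapping_hashes(train, test)
print(view_format, hashing_failed, len(shared))
```

Keep in mind that this compares hash values rather than pixel data, so a hash collision could in principle report an overlap that is not there. Two images also only match if they have the same dtype and shape, since tobytes() returns their raw bytes.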