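This is my first time asking something here and I'm very new to this, so I'll do my best. I have a list of phrases and I want to remove all the ones that are similar to each other, for example: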
array = ["A very long string saying some things",
"Another long string saying some things",
"extremely large string saying some things",
"something different",
"this is a test"]
I want this result:
array2 = ["A very long string saying some things",
"something different",
"this is a test"]
This is what I have:
for i in range(len(array)):
    swich=True
    for j in range(len(array2)):
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == True):
            swich=False
            pass
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == False):
            array2.pop(j)
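But it gives me a list IndexError
...
fuzz.ratio compares two strings and returns a value between 0 and 100; the higher the value, the more similar the strings are.
What I tried to do is compare the lists element by element: the first time two similar strings are found, flip the switch, and from that point on pop every further similar match. I'm completely open to any suggestions.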
Answer 0 (score: 0)
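The error you are getting is caused by modifying the list you are iterating over. (Never add, remove, or replace elements of an iterable while you are iterating over it!) range(len(array2)) is evaluated once, while the list has length N, but after array2.pop(j) the length is no longer N but N-1. When the loop later tries to access the Nth element, you get an IndexError because the list is now shorter.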
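To make the failure concrete, here is a minimal sketch that reproduces it without any fuzzy matching. The list contents and the helper name pop_while_indexing are made up for illustration; only the pattern (popping inside a range(len(...)) loop) is taken from the question.

```python
items = ["a", "b", "c", "d"]

def pop_while_indexing(values):
    """Pop inside a range(len(...)) loop and report what happens."""
    values = list(values)                 # work on a copy
    try:
        for j in range(len(values)):      # range(4) is computed once, up front
            if values[j] in ("a", "b"):
                values.pop(j)             # the list shrinks, the range does not
    except IndexError as exc:
        return f"IndexError: {exc}"
    return values

result = pop_while_indexing(items)
print(result)  # IndexError: list index out of range

# Safer pattern: build a new list instead of mutating the one being walked.
kept = [v for v in items if v not in ("a", "b")]
print(kept)    # ['c', 'd']
```

Note that "b" is never even examined in the failing loop: after "a" is popped, "b" slides into index 0, which the loop has already passed.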
A quick guess at another approach:
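from fuzzywuzzy import fuzz  # assumption: the library behind the question's fuzz.ratio

original = ["A very long string saying some things", "Another long string saying some things", "extremely large string saying some things", "something different", "this is a test"]
filtered = []
for original_string in original:
    include = True
    # Compare against the strings already kept, never against the list being built from
    for filtered_string in filtered:
        if fuzz.ratio(original_string, filtered_string) >= 80:
            include = False  # too similar to something we already have
            break
    if include:
        filtered.append(original_string)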
Note the for string in array loop, which is more "pythonic" and needs neither an integer index variable nor range.
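The same idea can be wrapped in a function, sketched here so it runs with only the standard library. The names similarity and drop_similar are hypothetical, and similarity is a stand-in built on difflib.SequenceMatcher, not fuzz.ratio itself, so its scores may differ from the ones in the question. Passing score=fuzz.ratio should plug the original scorer back in.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in scorer on a 0-100 scale (assumption: not fuzz.ratio itself)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def drop_similar(strings, threshold=80, score=similarity):
    """Keep each string only if it scores below threshold against every kept one."""
    kept = []
    for candidate in strings:
        # all() stops at the first kept string that is too similar
        if all(score(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept

phrases = ["hello world", "hello world!", "something different"]
print(drop_similar(phrases))  # ['hello world', 'something different']
```

The first occurrence of each group of similar strings wins, so the order of the input matters. The threshold of 80 comes from the question and may need tuning for real data.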
Answer 1 (score: 0)
How about using a different library to condense the code and reduce the number of loops?
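The answer does not name the library here. As one possible illustration only, this sketch assumes the standard library's difflib, whose get_close_matches replaces the inner loop. The function name drop_similar_difflib is hypothetical, and the cutoff is on a 0 to 1 scale rather than 0 to 100, so 0.8 only roughly corresponds to the 80 used in the question.

```python
from difflib import get_close_matches

def drop_similar_difflib(strings, cutoff=0.8):
    """Keep a string only if nothing already kept is a close match for it."""
    kept = []
    for s in strings:
        # get_close_matches returns an empty list when no kept string
        # reaches the cutoff, so the string is new enough to keep
        if not get_close_matches(s, kept, n=1, cutoff=cutoff):
            kept.append(s)
    return kept

phrases = ["hello world", "hello world!", "something different"]
print(drop_similar_difflib(phrases))  # ['hello world', 'something different']
```

The comparison work is still quadratic in the worst case; the gain is shorter code with one explicit loop, not fewer comparisons.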