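I am trying to find similar words in a list of strings. I am using SequenceMatcher from difflib.
Once a similar word is found, I try to delete it with .remove(word) to avoid duplicates, but that raises ValueError: list.remove(x): x not in list.
Why can't the element be removed from the list?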
tags = ['python', 'tips', 'tricks', 'resources', 'flask', 'cron', 'tools', 'scrabble', 'code challenges', 'github', 'fork', 'learning', 'game', 'itertools', 'random', 'sets', 'twitter', 'news', 'python', 'podcasts', 'data science', 'challenges', 'APIs', 'conda', '3.6', 'code challenges', 'code review', 'HN', 'github', 'learning', 'max', 'generators', 'scrabble', 'refactoring', 'iterators', 'itertools', 'tricks', 'generator', 'games']
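from difflib import SequenceMatcher

similar_tags = []
# This loop runs inside get_similarities(tag), which is called with the tags list above.
for word1 in tag:
    for word2 in tag:
        if word1[0] == word2[0]:
            if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                similar_tags.append((word1, word2))
                tag.remove(word1)
print(similar_tags)  # added for debugging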
But I get an error:
Traceback (most recent call last):
  File "tags.py", line 71, in <module>
    similar_tags = dict(get_similarities(tags))
  File "tags.py", line 52, in get_similarities
    tag.remove(word1)
ValueError: list.remove(x): x not in list
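Answer 0 (score: 1)
If two words, word21 and word22, both match word1 under the specified constraints, then word1 is already removed from the list when word21 matches, so by the time word22 matches there is no word1 left in the list to remove.
So you can correct it with the following modification: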
for word1 in tag:
    is_found = False  # add this flag
    for word2 in tag:
        if word1[0] == word2[0]:
            if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                is_found = True  # remember the match; remove only after the inner loop finishes
                similar_tags.append((word1, word2))
    if is_found:  # the word matched at least once under the constraint, so remove it from the list once
        tag.remove(word1)
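To see the failure in isolation, here is a minimal sketch. The three-word list and the helper name `remove_on_every_match` are made up for illustration and are not the asker's data or code; the helper only mimics the question's pattern of calling `remove()` once per matching `word2`.

```python
from difflib import SequenceMatcher


def remove_on_every_match(words):
    """Hypothetical helper: call remove(word1) each time some word2 matches."""
    words = list(words)  # work on a copy so the caller's list is untouched
    word1 = words[0]
    removed = 0
    try:
        for word2 in list(words):  # iterate over a snapshot of the list
            if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                words.remove(word1)  # succeeds on the first match only
                removed += 1
    except ValueError as exc:
        # Second match: word1 is already gone, so remove() raises.
        return removed, str(exc)
    return removed, None


# 'game' is similar to both 'games' and 'gamer' (ratio 8/9 in each case).
result = remove_on_every_match(['game', 'games', 'gamer'])
print(result)
```

The first match removes 'game'; the second match then fails because there is nothing left to remove, which is why the flag version above calls `remove()` at most once per `word1`.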
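Answer 1 (score: 0)
You are modifying the list you are iterating over, which is a bad thing to do.
Push the words into a new list, then remove from the tags list the items that are in the new list. Try something like: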
similar_tags = []
to_be_removed = []
for word1 in tag:
    for word2 in tag:
        if word1[0] == word2[0]:
            if 0.87 < SequenceMatcher(None, word1, word2).ratio() < 1:
                similar_tags.append((word1, word2))
                to_be_removed.append(word1)
for word in to_be_removed:
    if word in tag:
        tag.remove(word)
print(similar_tags)  # add for debugging
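The same idea can also be written without calling `remove()` at all, by building a new list instead. This is only a sketch under some assumptions: the function name `find_similar` and the `low` parameter are hypothetical, every tag is assumed to be a non-empty string, and duplicates are collapsed before comparing. Unlike the loop above, it drops every occurrence of a matched word rather than just the first one.

```python
from difflib import SequenceMatcher


def find_similar(tags, low=0.87):
    """Hypothetical helper: return (similar pairs, tags with matched words dropped)."""
    unique = sorted(set(tags))  # compare each distinct tag only once
    pairs = [
        (a, b)
        for a in unique
        for b in unique
        if a[0] == b[0] and low < SequenceMatcher(None, a, b).ratio() < 1
    ]
    matched = {a for a, _ in pairs}
    # Build a new list instead of mutating the one being iterated.
    remaining = [t for t in tags if t not in matched]
    return pairs, remaining


pairs, remaining = find_similar(['game', 'games', 'python', 'python', 'flask'])
print(pairs)
print(remaining)
```

Because the input list is never modified, the result does not depend on the order in which matches are found.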