How do I find a list of duplicates in a list of strings? The clean_up function is given:
def clean_up(s):
    """ (str) -> str
    Return a new string based on s in which all letters have been
    converted to lowercase and punctuation characters have been stripped
    from both ends. Inner punctuation is left untouched.
    >>> clean_up('Happy Birthday!!!')
    'happy birthday'
    >>> clean_up("-> It's on your left-hand side.")
    " it's on your left-hand side"
    """
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    result = s.lower().strip(punctuation)
    return result
Here is my duplicate function.
def duplicate(text):
    """ (list of str) -> list of str
    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
    ... 'James Gosling\n']
    >>> duplicate(text)
    ['james']
    """
    cleaned = ''
    non_duplicate = []
    unique = []
    for word in text:
        cleaned += clean_up(word).replace(",", " ") + " "
        words = cleaned.split()
        for word in words:
            if word in unique:
I'm stuck here. I can't use a dictionary or any other technique that keeps a frequency count of each word in the text. Please help.
Answer 0 (score: 1)
You have a problem here:
cleaned += clean_up(word).replace(",", " ") + " "
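This line appends each new "word" to a growing string of all the words so far. As a result, every time through the for loop you re-check all the words you have seen so far.
Instead, you need to do this: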
for phrase in text:
    for word in phrase.split(" "):
        word = clean_up(word)
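This means you process each word only once. You then need to add it to one of the lists, depending on whether it is already in either of them. I suggest naming your lists seen and duplicates so that it is clearer what is going on.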
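Putting those pieces together, one possible completion might look like the sketch below. It is not necessarily what the answer's author had in mind. It assumes that phrase.split() with no argument is an acceptable substitute for split(" ") (it also splits on newlines and repeated spaces), and that tokens which are empty after cleaning should be skipped. clean_up is repeated from the question so the block runs on its own.

```python
def clean_up(s):
    """Lowercase s and strip punctuation from both ends (as given in the question)."""
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)


def duplicate(text):
    """Return the words that occur more than once in text, in the order
    their first repeat is found. Uses only lists: no dict, no counts."""
    seen = []        # every distinct word encountered so far
    duplicates = []  # words encountered at least twice
    for phrase in text:
        # Assumption: split() instead of split(" ") to handle any whitespace.
        for word in phrase.split():
            word = clean_up(word)
            if not word:
                continue  # the token was only punctuation
            if word in seen:
                # Record a repeated word once, however often it recurs.
                if word not in duplicates:
                    duplicates.append(word)
            else:
                seen.append(word)
    return duplicates


text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'James Gosling\n']
print(duplicate(text))  # ['james']
```

Each word is cleaned and checked exactly once. The membership tests on lists are linear scans, which is slower than a set or dictionary but stays within the constraint stated in the question.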