Question

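My intention is to take a large body of text and first convert it to all lowercase (which works), then remove the punctuation from the text (which does not work), and finally print the frequency of the words used (it prints "test." and "test" as two different things).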

from collections import Counter



text = """
Test. test test. Test Test test. 
""".lower().strip(".")



words = text.split()
counts = Counter(words)
print(counts)

Any help would be greatly appreciated.

Answer 1

You need .replace('.', '') instead of strip.
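To see why strip has no effect here, the short sketch below reuses the text from the question (the variable names raw, stripped and cleaned are only illustrative). str.strip(".") trims periods from the two ends of the whole string only, and the triple-quoted string starts and ends with a newline, so there is nothing for it to remove.

```python
from collections import Counter

raw = """
Test. test test. Test Test test.
"""

# strip(".") only looks at the two ends of the whole string.
# The string begins and ends with "\n", so no "." is removed.
stripped = raw.lower().strip(".")
print(stripped == raw.lower())  # True

# replace(".", "") removes every "." wherever it occurs.
cleaned = raw.lower().replace(".", "")
counts = Counter(cleaned.split())
print(counts)  # Counter({'test': 6})
```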

Answer 2

You can either split the text into a list of words and then strip the punctuation from each word, or use roganjosh's suggestion of .replace('.', ''):

Method 1:

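from collections import Counter

text = "Test. test test. Test Test test."
words = text.lower().split()
# Strip a leading or trailing "." from each individual word
the_list = [w.strip('.') for w in words]
counts = Counter(the_list)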

Note that .strip() only removes punctuation from the start and end of a string, not from the middle.

Method 2:
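A small sketch of that caveat, assuming you also want to handle punctuation other than the period: string.punctuation is passed to strip so that any ASCII punctuation on the edges of a word is removed. The sample sentence is made up for illustration.

```python
import string
from collections import Counter

text = "Test. test, test! e.g. Test"

# Strip any leading/trailing ASCII punctuation from each word.
words = [w.strip(string.punctuation) for w in text.lower().split()]
print(words)  # ['test', 'test', 'test', 'e.g', 'test']

counts = Counter(words)
print(counts["test"])  # 4
```

The period inside "e.g." survives because strip never touches characters in the middle of a string.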

new_text = text.lower().replace('.', '')
# Split into words first; Counter(new_text) would count single characters
counts = Counter(new_text.split())
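If the real text contains more than periods, chaining one replace call per character gets unwieldy. One possible alternative, not part of the answer above, is str.translate with a deletion table built from string.punctuation. The sample text is invented for the example.

```python
import string
from collections import Counter

text = "Test. test, test! Test; test?"

# Build a table that deletes every ASCII punctuation character.
table = str.maketrans("", "", string.punctuation)
cleaned = text.lower().translate(table)

counts = Counter(cleaned.split())
print(counts)  # Counter({'test': 5})
```

Unlike strip, this also deletes punctuation inside words, so "e.g." would become "eg".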

Answer 3

If all you want is to extract the words (for counting or any other reason), use a regular expression with re.findall (or re.finditer if the text is large and you do not want to hold all the matches in memory at once).
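The answer's original snippet is not available here, so the following is only a sketch of the idea. The pattern \w+ is an assumption: it matches runs of letters, digits and underscores, which skips punctuation and whitespace.

```python
import re
from collections import Counter

text = "Test. test test. Test Test test."

# findall returns all matching words as a list.
words = re.findall(r"\w+", text.lower())
counts = Counter(words)
print(counts)  # Counter({'test': 6})

# finditer yields match objects lazily, so the full list of words
# is never built; Counter consumes them one at a time.
lazy_counts = Counter(m.group(0) for m in re.finditer(r"\w+", text.lower()))
print(lazy_counts == counts)  # True
```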

Note that this can get trickier with non-ASCII text (and it does not account for, for example, hyphenated words).
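As a rough illustration of that caveat, the sketch below uses a slightly wider pattern that keeps hyphens and apostrophes inside a word. The pattern and the sample sentence are assumptions made for the example. In Python 3, \w is Unicode-aware for str patterns, and the text is normalized to NFC first so that accented letters are single code points.

```python
import re
import unicodedata
from collections import Counter

text = "A well-known café. The café is well-known, naïve or not."

# Compose accents into single code points so \w can match them.
normalized = unicodedata.normalize("NFC", text.lower())

# A word is a run of \w characters, optionally joined by "-" or "'".
pattern = re.compile(r"\w+(?:[-']\w+)*")
words = pattern.findall(normalized)

counts = Counter(words)
print(counts["well-known"])  # 2
```

This still treats digits and underscores as word characters, so real text may need a stricter pattern.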

My strip() function is not removing
