I need to count how often each word occurs in each sentence. I use:
word_matrix[i][j] = sentences[i].count([*words_dict][j])
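But when a word is contained inside another word (for example, "in" is contained in "interactive"), that occurrence is counted as well. How can I avoid this?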
Answer 0: (score: 1)
You can use collections.Counter for this:
from collections import Counter
s = 'This is a sentence'
Counter(s.lower().split())
# Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})
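To connect this back to the word_matrix in the question, one possible sketch is to build a Counter per sentence and look up each vocabulary word in it. The sample values of sentences and words_dict below are assumptions made for illustration, since the question does not show them. Splitting on whitespace means a token with attached punctuation, such as "in:", is still treated as a separate word.

```python
from collections import Counter

# Hypothetical sample data; the question does not show the real values.
sentences = ["the cat is in the interactive exhibit", "in and out in a flash"]
words_dict = {"in": 0, "the": 1, "interactive": 2}

words = list(words_dict)  # same ordering as [*words_dict]

# One Counter per sentence: whole tokens are counted, so the "in" inside
# "interactive" does not contribute to the count for "in".
counters = [Counter(s.lower().split()) for s in sentences]

# A Counter returns 0 for missing keys, so absent words need no special handling.
word_matrix = [[c[w] for w in words] for c in counters]

print(word_matrix)  # [[1, 2, 1], [2, 0, 0]]
```

Building each Counter once per sentence also avoids rescanning the sentence for every word, which is what calling count inside a double loop does.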
Answer 1: (score: 1)
You can do it like this:
sentence = 'this is a test sentence'
word_count = len(sentence.split(' '))
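In this case, word_count is 5. Note that this counts the total number of words in the sentence, not the occurrences of a particular word.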
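One caveat worth checking: split(' ') splits on every single space, so repeated spaces produce empty strings that inflate the count, whereas split() with no argument splits on runs of any whitespace. A small sketch with a made-up sentence:

```python
sentence = 'this  is a test   sentence'  # note the repeated spaces

# split(' ') yields an empty string between each pair of consecutive spaces.
naive_count = len(sentence.split(' '))

# split() collapses runs of whitespace and ignores leading/trailing whitespace.
word_count = len(sentence.split())

print(naive_count, word_count)  # 8 5
```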
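Answer 2: (score: 0)
Depending on the situation, the most efficient solution is to use collections.Counter, but you will miss every word that has a symbol attached to it: "in" is treated as different from "interactive" (as desired), but it is also treated as different from "in:".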
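An alternative that accounts for this is to count the matches of a regular expression:
import re
# re.escape keeps metacharacters in the word literal; $ must sit outside [...] to mean end of string.
word = re.escape([*words_dict][j])
my_count = re.findall(r"(?:\s|^)({0})(?=[\s.,;:]|$)".format(word), sentences[i])
print(len(my_count))
What is the regex doing?
For a given word, it matches:
the same word preceded by whitespace or the start of the string: (?:\s|^)
then followed by whitespace, a period, a comma, a semicolon, a colon, or the end of the string: (?=[\s.,;:]|$)
Inside square brackets, $ would be a literal dollar sign, so the end-of-string check is written outside them. The trailing delimiter is checked with a lookahead so that it is not consumed and can still serve as the leading whitespace of the next match.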
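A shorter variant of the same idea, offered as a sketch rather than as this answer's method, uses the word-boundary anchor \b together with re.escape. The helper name count_word and the sample sentence are hypothetical. Keep in mind that \b treats every non-word character as a boundary, so "in" would also be counted inside "built-in", and that \b may not behave as expected for words that begin or end with a non-word character.

```python
import re

def count_word(word, sentence):
    """Count whole-word occurrences of `word` in `sentence` (hypothetical helper)."""
    # re.escape guards against regex metacharacters in the word;
    # \b matches at a transition between word and non-word characters.
    pattern = r"\b{0}\b".format(re.escape(word))
    return len(re.findall(pattern, sentence))

sentence = "in: the interactive demo is in, not out; log in."
print(count_word("in", sentence))  # 3
```

To fill the matrix from the question, the call would be something like word_matrix[i][j] = count_word([*words_dict][j], sentences[i]).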
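Answer 3: (score: 0)
Use split to tokenize the sentence into words, then apply this logic: if the word is already in the dict, increment its count by 1; otherwise add the word to the dict with a count of 1.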
paragraph='Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been'
words = paragraph.split()
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1  # seen before: increment the count
    else:
        word_count[word] = 1  # first occurrence
print(word_count)
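Because split() leaves punctuation attached, the loop above counts "Catholic" and "Catholic," as different words. A possible refinement, assuming that stripping leading and trailing ASCII punctuation and lowercasing are acceptable for the data, is sketched below with a shortened sample text:

```python
import string
from collections import Counter

# Shortened sample text, used here only for illustration.
paragraph = 'Nory was a Catholic because her mother was a Catholic, or had been.'

# Strip surrounding ASCII punctuation and lowercase each token before counting.
tokens = (w.strip(string.punctuation).lower() for w in paragraph.split())

# Skip tokens that consisted only of punctuation and are now empty.
word_count = Counter(t for t in tokens if t)

print(word_count['catholic'])  # 2
```

string.punctuation covers ASCII characters only, so typographic marks such as the curly apostrophe are left untouched.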