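I have a string "Hello I am going to I with hello am". I want to find how many times a word occurs in the string. For example, hello occurs 2 times. I tried this approach, but it only prints characters: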
def countWord(input_string):
    d = {}
    # Iterating over a string yields single characters, not words
    for word in input_string:
        try:
            d[word] += 1
        except KeyError:
            d[word] = 1
    for k in d.keys():
        print("%s: %d" % (k, d[k]))
print(countWord("Hello I am going to I with Hello am"))
I want to learn how to find the word count.
Answer 0 (score: 35)
If you want the count of a single word, use count:
input_string.count("Hello")
Use collections.Counter and split() to count all the words:
from collections import Counter
words = input_string.split()
wordCount = Counter(words)
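As a minimal end-to-end sketch of the two approaches above, applied to the string from the question (the variable names hello_count and word_count are only illustrative):

```python
from collections import Counter

input_string = "Hello I am going to I with hello am"

# str.count is case-sensitive and counts substrings, so only "Hello" matches.
hello_count = input_string.count("Hello")

# Counter over split() tallies every whitespace-separated word at once.
word_count = Counter(input_string.split())

print(hello_count)          # 1
print(word_count["I"])      # 2
print(word_count["hello"])  # 1 ("Hello" and "hello" are counted separately)
```

Because both calls are case-sensitive, lowercase the string first if "Hello" and "hello" should count as the same word.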
Answer 1 (score: 6)
Counter from collections is your friend:
>>> from collections import Counter
>>> counts = Counter(sentence.lower().split())
Answer 2 (score: 3)
from collections import Counter
import re
Counter(re.findall(r"[\w']+", text.lower()))
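Using re.findall is more general than split, because otherwise you cannot account for contractions such as "don't" and "I'll", and so on.
Demo (using your example):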
>>> countWords("Hello I am going to I with hello am")
Counter({'i': 2, 'am': 2, 'hello': 2, 'to': 1, 'going': 1, 'with': 1})
If you expect to make many such queries, this does the O(N) work only once, rather than O(N * #queries) work.
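The demo calls a function that the answer does not define. A minimal sketch of how it might be wrapped, assuming a hypothetical helper named count_words built from the same regular expression:

```python
import re
from collections import Counter

def count_words(text):
    """Count case-insensitive words, keeping apostrophes inside words."""
    return Counter(re.findall(r"[\w']+", text.lower()))

# Build the counts once (O(N)), then answer any number of lookups cheaply.
counts = count_words("Hello I am going to I with hello am")
print(counts["hello"])  # 2
print(counts["spam"])   # 0, a Counter returns 0 for missing keys

# Punctuation is dropped, but a contraction stays a single token.
print(count_words("Don't stop, don't go")["don't"])  # 2
```

With plain split(), "stop," would keep its trailing comma and be counted as a different word from "stop".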
Answer 3 (score: 3)
A vector of word occurrence counts is called a bag-of-words.
Scikit-learn provides a nice module for computing it, sklearn.feature_extraction.text.CountVectorizer. For example:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             min_df=1,
                             max_features=50)
text = ["Hello I am going to I with hello am"]
# Count
train_data_features = vectorizer.fit_transform(text)
# scikit-learn 1.0+ name; older releases used get_feature_names()
vocab = vectorizer.get_feature_names_out()
# Sum up the counts of each vocabulary word
dist = np.sum(train_data_features.toarray(), axis=0)
# For each, print the vocabulary word and the number of times it
# appears in the training set
for tag, count in zip(vocab, dist):
    print(count, tag)
Output:
2 am
1 going
2 hello
1 to
1 with
Part of the code is taken from this Kaggle tutorial on bag-of-words.
Answer 4 (score: 2)
To treat Hello and hello as the same word, regardless of case:
>>> from collections import Counter
>>> strs="Hello I am going to I with hello am"
>>> Counter(map(str.lower,strs.split()))
Counter({'i': 2, 'am': 2, 'hello': 2, 'to': 1, 'going': 1, 'with': 1})
Answer 5 (score: 1)
Here is an alternative, case-insensitive approach:
sum(1 for w in s.lower().split() if w == 'Hello'.lower())
2
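It matches by converting both the string and the target to lowercase.
PS: this also takes care of the str.count() problem pointed out by @DSM below, where "am ham".count("am") == 2 :)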
Answer 6 (score: 1)
You can split the string into elements and count them, which gives the total number of words:
count = len(my_string.split())
Answer 7 (score: 0)
You can use Python's regular expression library re to find all matches of a substring and return them as a list.
import re
input_string = "Hello I am going to I with Hello am"
print(len(re.findall('hello', input_string.lower())))
Prints:
2
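Note that a bare pattern such as 'hello' matches inside longer words as well. A minimal sketch of a whole-word variant, assuming a hypothetical helper named count_whole_word that uses the \b word-boundary anchor:

```python
import re

def count_whole_word(text, word):
    """Count case-insensitive occurrences of word as a standalone word."""
    # re.escape guards against regex metacharacters in the search word.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

print(count_whole_word("Hello I am going to I with Hello am", "hello"))  # 2
print(count_whole_word("am ham", "am"))  # 1, the "am" inside "ham" is skipped
print(len(re.findall("am", "am ham")))   # 2, the plain pattern matches both
```

Passing re.IGNORECASE avoids having to lowercase the input string first.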
Answer 8 (score: 0)
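def countSub(pat, string):
    result = 0
    # Try every start position where pat could still fit
    for i in range(len(string) - len(pat) + 1):
        for j in range(len(pat)):
            if string[i + j] != pat[j]:
                break
        else:
            # The inner loop finished without a break: full match at i
            result += 1
    return result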
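This function counts substring matches, including overlapping ones, so it behaves differently from str.count on repeated patterns. A compact sketch of the same idea, using a hypothetical name count_overlapping and str.startswith in place of the inner loop:

```python
def count_overlapping(pat, string):
    """Count matches of pat at every start position, overlaps included."""
    last_start = len(string) - len(pat)
    return sum(1 for i in range(last_start + 1) if string.startswith(pat, i))

text = "Hello I am going to I with hello am".lower()
print(count_overlapping("hello", text))  # 2
print(count_overlapping("aa", "aaaa"))   # 3, matches at positions 0, 1 and 2
print("aaaa".count("aa"))                # 2, str.count skips overlapping matches
```

Like str.count, it matches substrings rather than whole words, so "am" is also found inside "ham".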