Question

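There are plenty of posts about counting substring occurrences in Python, but I couldn't find anything about counting occurrences of a whole word in a text.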

testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

# Suppose my search term is "a"; then I would expect the output of my program to be:
print(testSTR.myfunc("a"))  # myfunc stands in for the function I'm looking for
>>1

since the string "a" appears only once as a standalone word in the whole input. count() won't do, because it also counts substrings, so the output I get is:

print(testSTR.count("a"))
>>5

Can this be done?

Answer 1

After splitting the string into words, you can use collections.Counter to do this.

from collections import Counter
print(Counter(testSTR.split()))

The output looks like

Counter({'you': 2, 'a': 1, 'and': 1, 'words': 1, 'text': 1, 'some': 1, 'the': 1, 'large': 1, 'to': 1, 'Suppose': 1, 'are': 1, 'have': 1, 'of': 1, 'specific': 1, 'trying': 1, 'find': 1, 'occurences': 1})

To get the number of times the specific string "a" is used as a word:

from collections import Counter
res = Counter(testSTR.split())
print(res['a'])

If the count needs to be case-insensitive, convert the words with upper() or lower() before counting.

res = Counter(i.lower() for i in testSTR.split())
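
Putting the pieces above together, a minimal self-contained sketch might look like the following. The helper name count_word and its case_sensitive parameter are made up for illustration and are not part of the original answer.

```python
from collections import Counter

testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

def count_word(text, word, case_sensitive=True):
    """Count how often `word` appears as a whitespace-separated token in `text`."""
    # Hypothetical helper: lower-case both sides when case should be ignored.
    if not case_sensitive:
        text, word = text.lower(), word.lower()
    # Counter returns 0 for keys it has never seen, so no KeyError handling is needed.
    return Counter(text.split())[word]

print(count_word(testSTR, "a"))                               # 1
print(count_word(testSTR, "you"))                             # 2
print(count_word(testSTR, "suppose", case_sensitive=False))   # 1
print(count_word(testSTR, "missing"))                         # 0
```

Keep in mind that split() only separates on whitespace, so a token such as "a," would not be counted as "a" here.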

Answer 2

I think the most straightforward way is to use a regular expression:

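import re
testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"
print(len(re.findall(r"\ba\b", testSTR)))
# 1

The \b on either side of the a checks for a "word boundary" before and after it, where a "word boundary" is punctuation, whitespace, or the beginning or end of the whole string. This is more useful than just splitting on whitespace, unless that's what you want...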
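
If the search term is not hard-coded, one possible generalization is sketched below. The helper name count_whole_word, its ignore_case parameter, and the sample sentence are assumptions made for illustration; re.escape is used so that any regex metacharacters in the term are treated literally.

```python
import re

def count_whole_word(text, word, ignore_case=False):
    """Count occurrences of `word` in `text` that are delimited by word boundaries."""
    # Hypothetical helper: escape the term, then wrap it in \b anchors.
    flags = re.IGNORECASE if ignore_case else 0
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags))

sample = "Is this a test? Yes, a. A small one."
print(count_whole_word(sample, "a"))                    # 2
print(count_whole_word(sample, "a", ignore_case=True))  # 3
print(sample.split().count("a"))                        # 1, because "a." is not equal to "a"
```

The last line shows where the regex differs from a plain whitespace split: the "a" followed by a period is still matched. Note that \b only behaves this way when the search term itself starts and ends with a word character (a letter, digit, or underscore).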

Answer 3

If you're worried about punctuation, you should try this:

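# Lists have no .map() method in Python, so use a list comprehension
# to strip surrounding punctuation from each word.
words = [s.strip(".!?:;,\"'") for s in testSTR.split()]
print("a" in words)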
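
The membership test above only answers yes or no. To get an actual count with the same punctuation-stripping idea, something like the following sketch could work. The helper name count_stripped and the second sample sentence are hypothetical, and string.punctuation is swapped in for the hand-written character list.

```python
import string

testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

def count_stripped(text, word):
    """Count tokens equal to `word` after stripping leading/trailing punctuation."""
    # Hypothetical helper: string.punctuation covers all ASCII punctuation characters.
    words = [s.strip(string.punctuation) for s in text.split()]
    return words.count(word)

print(count_stripped(testSTR, "a"))                      # 1
print(count_stripped("Take a, then a; then 'a'.", "a"))  # 3
```

Because strip() only touches the ends of each token, punctuation inside a word (as in "don't") is left alone.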

Occurrences of a string in a text in Python