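TypeError when using regular expressions for text analysis in Python

Time: 2014-12-07 03:03:11

Tags: python regex

I am trying to write some code that scans for every string matching the regular expression "PP+" and tells me how many times it occurs. Here is my code: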

with open ('testfile.txt') as f:
    data = f.read()
    data = data.split()

import re


the_sum = 0

prolist = []

for word in data:
    pronoun = re.compile(r'PP+')
    result = pronoun.match(data)
    if word == result:
        the_sum += 1

print the_sum

I get this error message:

Traceback (most recent call last):
  File "C:/Python27/RE_counter.py", line 14, in <module>
    result = pronoun.match(data)
TypeError: expected string or buffer

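Can anyone tell me what I am doing wrong?

2 answers:

Answer 0: (score: 1)

You are passing the whole list to match() on every iteration (hence the TypeError), and you are also not checking the match result correctly: match() returns a match object or None, so it won't return the word itself: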

for word in data:
    pronoun = re.compile(r'PP+')
    result = pronoun.match(word)  # ← you had pronoun.match(data)
    if result is not None:        # ← you had if word == result
        the_sum += 1
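
To make the point about the return value concrete, here is a minimal sketch of the corrected loop wrapped in a function. The helper name count_matches and the sample word list are made up for illustration and stand in for the contents of testfile.txt; the pattern is the one from the question, compiled once outside the loop.

```python
import re

# Compile once, outside the loop; the pattern is the one from the question.
pronoun = re.compile(r'PP+')

def count_matches(words):
    """Count the words that start with 'PP' (hypothetical helper, not from the question)."""
    total = 0
    for word in words:
        # match() returns a match object on success and None otherwise,
        # so the result is compared against None, never against the word.
        if pronoun.match(word) is not None:
            total += 1
    return total

# Made-up sample data standing in for the contents of testfile.txt
sample = "PP NN PPP VB PP$ NP".split()
print(count_matches(sample))  # prints 3: PP, PPP and PP$ all start with "PP"
```

Note that "PP$" is counted as well, because match() only anchors the pattern at the start of each word.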

Answer 1: (score: 0)

You can get the count directly.

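import re

with open('testfile.txt') as f:
    data = f.read()

# \b limits matches to whole tokens; + is left unescaped so it still means "one or more P"
print len(re.findall(r"\bPP+\b", data))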
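
Counting with findall() over the whole text and counting with match() word by word can give different totals, depending on how the pattern is anchored. The sketch below compares three variants on a made-up string (not taken from the question); the variable names are illustrative only.

```python
import re

text = "PP NN PPP xPP PPs"   # made-up sample, not from the question
words = text.split()

# match() anchors only at the start of each word, so "PPs" is counted too
starts_with = sum(1 for w in words if re.match(r'PP+', w))

# adding $ requires the whole word to consist of two or more P characters
whole_word = sum(1 for w in words if re.match(r'PP+$', w))

# findall() without anchors also finds the "PP" inside "xPP" and "PPs"
anywhere = len(re.findall(r'PP+', text))

print("%d %d %d" % (starts_with, whole_word, anywhere))  # prints: 3 2 4
```

Which count is the right one depends on what the tokens in the file look like, so it is worth deciding up front whether a partial match inside a longer token should count.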