re.findall() fails to find lines from one file in another file

Time: 2017-10-18 01:56:56

Tags: python regex find

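I have two text files: one contains the text of an article and the other contains a list of phrasal verbs. I am trying to find every instance of each phrasal verb in the article. I know the article contains the phrasal verb "log on", and so does the phrasal verb list. When I loop over the phrasal verbs and search for each one with re.findall(), it finds nothing. When I manually start the loop at line 1199 of the phrasal verb list, which happens to be "log on", it finds it. When I start just one line earlier, at line 1198, it does not. Here is my code: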

import re
PV_HI = []
file = open('article.txt')
for line in open('phrasalVerbs.txt'):
    pv = line.strip()
    pvFound = re.findall(pv, file.read(), flags=re.I)
    PV_HI.extend(pvFound)
print(PV_HI)

Here is a sample of the phrasal verb list text file:

Lock onto
Lock out
Lock up
Lock away
Log in
Log into
Log off
Log on
Log out
Look after
Look back
Look down on
Look for
Look forward to
Look in
Look in on
Look into

A sample of the article file:

<p> If you have a business account, a higher Pay Anyone limit up to $500,000 and also have a Security Device to authorise third party payments and/or can add Operators, you are an ANZ Internet Banking for Business customer.
<p> How do I manage my accounts once I am registered for ANZ Internet Banking?
<p> If you have registered for ANZ Internet Banking, use your CRN and password to log on to ANZ Internet Banking.
<p> If you need help while logged on to ANZ Internet Banking, click the " Help " icon in the top right hand corner of all pages. 

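Ultimately, what I want to do is count all the phrasal verbs across a set of 1600 files. If there is a better way to do this, I am definitely open to suggestions.

Thanks!

Matt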

1 answer:

Answer 0: (score: 1)

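I saved your samples of the phrasal verb list and the article file (with 'log on' appended at the end) and ran some tests with your Python code. At first, I could not find any results either. But it worked once I changed the code as follows: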

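import re

PV_HI = []
# Read the article once and keep its text in memory.
with open('article.txt', 'r') as f:
    article_content = f.read()

with open('phrasalVerbs.txt') as pv_file:
    for line in pv_file:
        pv = line.strip()
        # Search the cached text instead of re-reading the file each time.
        pvFound = re.findall(pv, article_content, flags=re.I)
        PV_HI.extend(pvFound)
print(PV_HI)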

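With this change it works and finds 'log on'. Your original code called file.read() inside the loop: the first call consumes the whole file, and every later call on the same handle returns an empty string, so only the first phrasal verb was ever searched against the actual article text. That is also why starting the loop at line 1199 found the match. Hope this helps.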
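For the larger goal of counting phrasal verbs across many files, here is a minimal sketch, not a definitive implementation. It assumes each file fits in memory and that you only want exact phrases (so "logged on" is not counted as "log on"). The helper names `build_pattern` and `count_phrasal_verbs` are made up for illustration. It compiles all phrases into a single pattern, escapes them with re.escape() so any special characters are matched literally, and tallies the matches with collections.Counter.

```python
import re
from collections import Counter


def build_pattern(phrasal_verbs):
    """Compile one case-insensitive pattern matching any listed phrase.

    Returns None when the list contains no non-blank phrases.
    """
    # Drop blank lines; an empty alternative would match everywhere.
    phrases = {pv.strip() for pv in phrasal_verbs if pv.strip()}
    if not phrases:
        return None
    # Longest first, so "Look in on" is tried before "Look in".
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(pv) for pv in ordered)
    # \b keeps "log in" from matching inside "log into".
    return re.compile(r"\b(?:" + alternation + r")\b", flags=re.I)


def count_phrasal_verbs(phrasal_verbs, texts):
    """Return a Counter mapping lower-cased phrase -> total occurrences."""
    counts = Counter()
    pattern = build_pattern(phrasal_verbs)
    if pattern is None:
        return counts
    for text in texts:
        # findall returns the matched strings; normalise case before counting.
        counts.update(match.lower() for match in pattern.findall(text))
    return counts
```

To use it on real files, you would read phrasalVerbs.txt into a list of lines once, then pass in the contents of each article, reading every file exactly once. Matches do not overlap, so "look in on" is counted as one phrase rather than also as "look in".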