Extract the text before the first comma on each line of a text file

Time: 2017-09-13 06:23:46

Tags: python regex file

I have a file that looks like this:

Breve, a writ; used more frequently in the plural brevia. 
Brevia magistralia, official writs framed by the clerks in 
chancery to meet new injuries, to which the old forms of action 
were inapplicable. Sea Trespass on the case. Brevia testata, 
short attested memoranda, originally introduced to obviate the 
uncertainty arisina; from parol feoffments, hence modern con- 
veyances have gradually arisen. 

I want to extract the words that appear before the first comma (,) on each line.

Expected output:

Breve
Brevia magistralia
chancery to meet new injuries
were inapplicable. Sea Trespass on the case. Brevia testata
short attested memoranda
uncertainty arisina; from parol feoffments

My code:

with open('test.txt','r') as file:
    for line in file:
        print(line[0:line.find(',')])

Output:

Breve

Any help is appreciated.

5 answers:

Answer 0 (score: 1)

Why do you need a regex? str.split should be good enough.

with open('test.txt', 'r') as file:
    for line in file:
        text = line.split(',', 1)[0]  # maxsplit=1: stop splitting after the first comma
        ...  # do something with text

However, if you really need a regex, you could use something like:

import re
with open('test.txt', 'r') as file:
    for line in file:
        m = re.match('[^,]+', line)  # re.match anchors at the start of the line
        if m:
            text = m.group(0)

[^,]+ matches one or more characters that are not a comma, starting at the beginning of the line (credits).
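Both snippets above also return text for a line that contains no comma (the whole line, in that case). As a minimal sketch of one way to tell the two cases apart, the helper below uses str.partition; the name first_field and the two sample lines are illustrative assumptions, not part of the answer:

```python
def first_field(line):
    """Return the text before the first comma, or None if the line has no comma."""
    head, sep, _tail = line.partition(',')
    # sep is '' when no comma was found, ',' otherwise
    return head if sep else None

# Two sample lines taken from the question's file
sample = [
    "Breve, a writ; used more frequently in the plural brevia. \n",
    "veyances have gradually arisen. \n",
]
fields = [first_field(line) for line in sample]
print(fields)  # ['Breve', None]
```

Whether comma-free lines should be skipped or kept is a design choice; returning None simply leaves that decision to the caller.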

Answer 1 (score: 1)

A re.findall() solution:

import re
with open('test.txt', 'r') as f:
    result = re.findall(r'^[^,]+(?=,)', f.read(), re.M)   # extracting the needed words
    print('\n'.join(result))

Output:

Breve
Brevia magistralia
chancery to meet new injuries
were inapplicable. Sea Trespass on the case. Brevia testata
short attested memoranda
uncertainty arisina; from parol feoffments
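One caveat, offered as a hedged aside: [^,] also matches newline characters, so when the whole file is read at once, a comma-free line followed by a line that does contain a comma can produce a match spanning both lines. That does not happen with the sample file, where the only comma-free line is the last one. The sketch below uses a made-up two-line string (an assumption for illustration) and a variant of the pattern that excludes \n from the character class:

```python
import re

# Hypothetical input: a comma-free line followed by a line with a comma
text = "no comma on this line\nBreve, a writ\n"

# Original pattern: the character class can run across the newline
spanning = re.findall(r'^[^,]+(?=,)', text, re.M)

# Variant: excluding \n keeps every match within a single line
per_line = re.findall(r'^[^,\n]+(?=,)', text, re.M)

print(spanning)  # ['no comma on this line\nBreve']
print(per_line)  # ['Breve']
```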

Answer 2 (score: 1)

You just need to make this modification:

Breve
Brevia magistralia
chancery to meet new injuries
were inapplicable. Sea Trespass on the case. Brevia testata
short attested memoranda
uncertainty arisina; from parol feoffments

Output:

aa bb cc 
dd ee ff
gg hh ii
ll mm nn
oo pp qq 

Answer 3 (score: 1)

Here is an additional answer; you can use re.search:

import re
with open('test.txt', 'r') as file:
    for line in file:
        # print(line)
        result = re.search(r'^[^,]+(?=,)', line)
        if result:
            text = result.group(0)
            print(text)

Output:

Breve
Brevia magistralia
chancery to meet new injuries
were inapplicable. Sea Trespass on the case. Brevia testata
short attested memoranda
uncertainty arisina; from parol feoffments

Answer 4 (score: 0)

I tested your code and got the correct output for your question.

Output:

Breve
Brevia magistralia
chancery to meet new injuries
were inapplicable. Sea Trespass on the case. Brevia testata
short attested memoranda
uncertainty arisina; from parol feoffments
veyances have gradually arisen.

So make sure your input file itself is correct.

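Perhaps your test file contains no newlines, i.e. the entire text is written as a single line. In that case only the text before the first comma is printed, and nothing after it.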

Note: the last line has no comma, so the whole line is printed (which differs from the expected output).
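Both observations can be reproduced without a file. The sketch below wraps the slicing from the question in a hypothetical helper named before_comma and feeds it made-up strings (both are assumptions for illustration):

```python
def before_comma(line):
    # Same slicing as the code in the question
    return line[0:line.find(',')]

# The whole text on one physical line: the loop body runs only once
one_line = "Breve, a writ; used more frequently in the plural brevia. Brevia magistralia, official writs"
print([before_comma(line) for line in one_line.splitlines()])  # ['Breve']

# No comma: str.find returns -1, so the slice becomes line[0:-1]
no_comma = "veyances have gradually arisen.\n"
print(repr(before_comma(no_comma)))  # 'veyances have gradually arisen.'
```

Because str.find returns -1 when no comma is present, the slice drops the last character of the line. Here that character is the trailing newline, but on a final line without a newline it would be the last visible character, so checking for -1 explicitly is the safer choice.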