Scraping the text between the first space and the question mark

Time: 2014-12-03 13:40:20

Tags: python regex

My data is in the following format:

Alessandro_Volta    Was Alessandro Volta a professor of chemistry?  Alessandro Volta was not a professor of chemistry.  easy    easy    data/set4/a10
Alessandro_Volta    Was Alessandro Volta a professor of chemistry?  No  easy    hard    data/set4/a10
Alessandro_Volta    Did Alessandro Volta invent the remotely operated pistol?   Alessandro Volta did invent the remotely operated pistol.   easy    easy    data/set4/a10
Alessandro_Volta    Did Alessandro Volta invent the remotely operated pistol?   Yes easy    easy    data/set4/a10
Alessandro_Volta    Was Alessandro Volta taught in public schools?  Volta was taught in public schools. easy    easy    data/set4/a10
Alessandro_Volta    Was Alessandro Volta taught in public schools?  Yes easy    easy    data/set4/a10

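I want to scrape the question, i.e. the text between the first \t and the ?. (I came up with the solution below, but I don't know whether there is a better one.)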

import re

def f(regexStr,target):
    mo = re.search(regexStr,target)
    if not mo:
        print("NO MATCH")
    else:
        print("MATCH: " + mo.group())

f(r"\^[^~]*~","{Mat^chThisT~ext}") 

This code correctly gives the text between ^ and ~, but when I tried the same thing with \t and ?, it gave NO MATCH.

2 answers:

Answer 0: (score: 3)

If the input format is consistent, why not simply:

with open('input.txt') as input_file:
    questions = [line.split('\t', 2)[1].strip() for line in input_file]

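Assuming the question part of each line in input.txt is always preceded by a tab character and followed by a tab character, questions will contain a list of strings made up of the questions.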
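To make the split approach easy to try without a file, here is a minimal sketch that runs on two rows copied from the question. It assumes the fields really are separated by tab characters; the helper name extract_questions and the sample variable are illustrative and not part of the original answer.

```python
# Two sample rows from the question, written with explicit tab separators
# (assumption: the real file uses tabs, as the question states).
sample = (
    "Alessandro_Volta\tWas Alessandro Volta a professor of chemistry?\tNo\teasy\thard\tdata/set4/a10\n"
    "Alessandro_Volta\tWas Alessandro Volta taught in public schools?\tYes\teasy\teasy\tdata/set4/a10\n"
)


def extract_questions(lines):
    """Return the second tab-separated field of every line that has one."""
    questions = []
    for line in lines:
        # maxsplit=2 stops after the question field, so the rest is not split.
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) > 1:  # skip blank lines and lines without a tab
            questions.append(fields[1].strip())
    return questions


questions = extract_questions(sample.splitlines())
print(questions)
```

Unlike the one-line list comprehension, this version skips lines that contain no tab instead of raising IndexError, which may or may not be what you want for malformed input.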

Answer 1: (score: 1)

(?<=[ ]{4,}).*?\?

Try this. See the demo.

http://regex101.com/r/yR3mM3/36

import re
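# Python's re module rejects variable-width look-behind such as (?<=[ ]{4,}),
# so the separator is matched directly and the question is captured in group 1.
p = re.compile(r'[ ]{4,}(.*?\?)')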
test_str = "Alessandro_Volta Was Alessandro Volta a professor of chemistry? Alessandro Volta was not a professor of chemistry. easy easy data/set4/a10\nAlessandro_Volta Was Alessandro Volta a professor of chemistry? No easy hard data/set4/a10\nAlessandro_Volta Did Alessandro Volta invent the remotely operated pistol? Alessandro Volta did invent the remotely operated pistol. easy easy data/set4/a10\nAlessandro_Volta Did Alessandro Volta invent the remotely operated pistol? Yes easy easy data/set4/a10\nAlessandro_Volta Was Alessandro Volta taught in public schools? Volta was taught in public schools. easy easy data/set4/a10\nAlessandro_Volta Was Alessandro Volta taught in public schools? Yes easy easy data/set4/a10"

re.findall(p, test_str)
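If the file really is tab-separated, a regex can also key on the tab itself rather than on runs of spaces. The following is only a sketch under that assumption; the names pattern, text and matches are illustrative. Note that the ? has to be escaped, because an unescaped ? is a quantifier in a regex, which is one possible reason a pattern built from \t and ? fails to match.

```python
import re

# Assumption: fields are separated by real tab characters.
# ^[^\t\n]*\t     the first field and the first tab on the line
# ([^?\t\n]*\?)   everything up to and including the first literal "?"
pattern = re.compile(r"^[^\t\n]*\t([^?\t\n]*\?)", re.MULTILINE)

text = (
    "Alessandro_Volta\tWas Alessandro Volta a professor of chemistry?\tNo\teasy\thard\tdata/set4/a10\n"
    "Alessandro_Volta\tDid Alessandro Volta invent the remotely operated pistol?\tYes\teasy\teasy\tdata/set4/a10"
)

# With a single capture group, findall returns just the captured questions.
matches = pattern.findall(text)
print(matches)
```

Excluding \t and \n inside the character classes keeps each match on one line and inside the second field, so a line without a question mark in that field is simply skipped.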