Reading from a text file without using readline()

Time: 2014-05-27 08:39:30

Tags: python text-files

This is part of the text file I have:

Participant: Interviewer 
Translation: <english>Mhlongo.</english> Okay Monde, what languages do you typically use with your family and why? 
        :  
Participant: Participant 
Translation: Okay <english>it was Zulu, eh and Sotho, eh:</english> my mom is Sotho and my father is Zulu so we her language most of the time. 
        :  
Participant: Interviewer 
Translation: Mh, and so <english>you speak</english> <english>you speak</english>. What languages or language do you use with friends and why? 
        :  
Participant: Participant 
Translation:  Eh, isiZulu. 

I am trying to iterate over the file to get the translations for the participant and the interviewer. Here is my code:

while True:
    interviewer = f.readline()
    interviewer_translation = f.readline()
    participant = f.readline()
    participant_translation = f.readline()
    ...
    if not participant_translation: break 

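However, the code above reads the file one line at a time, and that does not work because a translation sometimes spans several lines. Is there a way to do this without using readline?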

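2 Answers:

Answer 0 (score: 1)

You can read the file line by line (with f.readline() or by iterating over f) and accumulate lines until the next record separator, then process the accumulated chunk, for example: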

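def process(participant, translation):
    pass

participant = None
translation = ''
for line in f:
    if line.startswith('Participant: '):
        # A new record starts here: flush the previous one first.
        if participant:
            process(participant, translation)
        participant = line
        translation = ''
    elif participant:
        # Either the 'Translation: ' line or a continuation line of a
        # translation that spans several lines.
        translation += line
# Flush the last record, if the file contained any.
if participant:
    process(participant, translation)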
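
To make the accumulate-until-the-next-header idea concrete, here is a minimal, self-contained sketch. The names `iter_records` and `SAMPLE` are made up for illustration, and the sample assumes that a long translation simply wraps onto the following lines and that records are separated by a line containing only a colon, as in the excerpt from the question. It may need adjusting for the real file.

```python
import io

# Hypothetical sample in the same layout as the question's file.
# The first translation is assumed to wrap onto a second line.
SAMPLE = """\
Participant: Interviewer
Translation: Okay Monde, what languages do you
typically use with your family and why?
        :
Participant: Participant
Translation:  Eh, isiZulu.
"""

def iter_records(lines):
    """Yield (speaker, translation) pairs from an iterable of lines."""
    speaker = None
    parts = []
    for line in lines:
        stripped = line.strip()
        if line.startswith('Participant: '):
            # A new header: emit the record collected so far.
            if speaker is not None:
                yield speaker, ' '.join(parts)
            speaker = stripped[len('Participant: '):]
            parts = []
        elif speaker is None or stripped in ('', ':'):
            # Skip text before the first header, blank lines and ':' separators.
            continue
        elif line.startswith('Translation:'):
            parts.append(stripped[len('Translation:'):].strip())
        else:
            # Continuation line of a multi-line translation.
            parts.append(stripped)
    # Emit the final record, which has no header after it.
    if speaker is not None:
        yield speaker, ' '.join(parts)

records = list(iter_records(io.StringIO(SAMPLE)))
```

With a real file you would pass the open file object instead of the `io.StringIO` wrapper, e.g. `for speaker, translation in iter_records(f): ...`. Because it is a generator, only one record is held in memory at a time.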

You can also use f.read(size) to read a larger chunk of the file, or the whole file if the size argument is omitted:

>>> f.read()
'This is the entire file.\n'

Then you can use a multiline regex to pull meaningful blocks of text out of it, for example whole records:

>>> re.findall(r'(?P<record>^Participant:.*?)(?=(?:Participant:|\Z))', text, re.S | re.M)
['Participant: Interviewer\nTranslation: <english>Mhlongo.</english> Okay Monde, what languages do you typically use with your family and why?\n        :\n', 'Participant: Participant\nTranslation: Okay <english>it was Zulu, eh and Sotho, eh:</english> my mom is Sotho and my father is Zulu so we her language most of the time.\n        :\n', 'Participant: Interviewer\nTranslation: Mh, and so <english>you speak</english> <english>you speak</english>. What languages or language do you use with friends and why?\n        :\n', 'Participant: Participant\nTranslation:  Eh, isiZulu.\n']

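Use whichever approach feels more comfortable to you. Be careful about reading large files in one go, since they may not fit into available memory.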
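
If you want the speaker and the translation text separately rather than whole records, the same regex idea can be extended with named groups. The following is only a sketch under assumptions: `RECORD`, `normalise` and the inline `text` sample are hypothetical names and data, and it assumes every header line is immediately followed by a `Translation:` line.

```python
import re

# Hypothetical stand-in for text = f.read(); the first translation
# is assumed to wrap onto a second line.
text = (
    "Participant: Interviewer\n"
    "Translation: What languages do you use\n"
    "with friends and why?\n"
    "        :\n"
    "Participant: Participant\n"
    "Translation:  Eh, isiZulu.\n"
)

RECORD = re.compile(
    r'^Participant:[ \t]*(?P<speaker>[^\n]*)\n'  # header line with the speaker
    r'Translation:(?P<body>.*?)'                 # lazy body, may span lines
    r'(?=^Participant:|\Z)',                     # stop before next header or at end
    re.S | re.M,
)

def normalise(body):
    # Drop blank and ':' separator lines, then join the rest with single spaces.
    kept = [ln.strip() for ln in body.splitlines() if ln.strip() not in ('', ':')]
    return ' '.join(kept)

pairs = [(m.group('speaker').strip(), normalise(m.group('body')))
         for m in RECORD.finditer(text)]
```

Anchoring the lookahead with `^` (under `re.M`) means the body only ends at a line that starts with `Participant:`, so the word appearing inside a translation does not cut a record short.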

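Answer 1 (score: 0)

If the participant and interviewer header lines always take up exactly one line and always look the same, you could use something like this: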

p_translation = ""
i_translation = ""
interviewer = False
for line in f:
    if line.startswith("Participant: Participant"):
        #This would be the place to process i_translation
        #because now the translation of the interviewer was
        #fully read
        interviewer = False
        p_translation = ""
    elif line.startswith("Participant: Interviewer"):
        #This would be the place to process p_translation
        #because now the translation of the participant was
        #fully read
        interviewer = True
        i_translation = ""
    else:
        if interviewer:
            i_translation += line
        else:
            p_translation += line