How to use keywords to search for sentences until the end of a string in Python

Time: 2018-09-21 09:07:37

Tags: python file

I am stuck on a small piece of logic.

My code is here:

def find_between( string, first, last ):
    list1 = []
    try:
        start = string.index( first ) + len( first )
        end = string.index( last, start )
        list1.append(string[start:end])
        print(list1)
    except ValueError:
        return ""

with open("sample.txt")as f:
    data = f.read()
    print(data)

    find_between( data, "*CHI:  " , "%mor:  " )

My sample.txt contains:

*CHI:   I saw a giraffe and a elephant .
%mor:   pro:sub|I v|see&PAST det:art|a n|giraffe coord|and det:art|a
    n|elephant .
%gra:   1|2|SUBJ 2|0|ROOT 3|4|DET 4|2|OBJ 5|4|CONJ 6|7|DET 7|5|COORD 8|2|PUNCT
*CHI:   <that> [/] (.) that (i)s it . [+ bch]
%mor:   pro:dem|that cop|be&3S pro:per|it .
%gra:   1|2|SUBJ 2|0|ROOT 3|2|PRED 4|2|PUNCT
*CHI:   I saw an elephant go swimming .
%mor:   pro:sub|I v|see&PAST det:art|a n|elephant v|go part|swim-PRESP .
%gra:   1|2|SUBJ 2|0|ROOT 3|4|DET 4|5|SUBJ 5|2|COMP 6|5|OBJ 7|2|PUNCT
*CHI:   <I saw eleph> [//] I saw the <g> [/] giraffe and the elephant <s>
    [//] drop ball in the pool .
%mor:   pro:sub|I v|see&PAST det:art|the n|giraffe coord|and det:art|the
    n|elephant n|drop n|ball prep|in det:art|the n|pool .
%gra:   1|2|SUBJ 2|0|ROOT 3|4|DET 4|2|OBJ 5|4|CONJ 6|9|DET 7|9|MOD 8|9|MOD
    9|5|COORD 10|9|NJCT 11|12|DET 12|10|POBJ 13|2|PUNCT
*CHI:   I saw giraffe swimming in the pool to get that ball .
%mor:   pro:sub|I v|see&PAST n|giraffe part|swim-PRESP prep|in det:art|the
    n|pool inf|to v|get pro:dem|that n|ball .

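I need to return every sentence between "*CHI:" and "%mor:", but my code only returns the first one:

I saw a giraffe and a elephant

Please help me iterate to the end of the string so that I can print all the sentences that lie between "*CHI:" and "%mor:".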

2 answers:

Answer 0: (score: 1)

I used a regular expression and a plain string instead of a file, but the principle is the same. Here is the working code:

import re

s = """
    *CHI:   I saw a giraffe and a elephant .
    %mor:   pro:sub|I v|see&PAST det:art|a n|giraffe coord|and det:art|a 
        n|elephant .
    %gra:   1|2|SUBJ 2|0|ROOT 3|4|DET 4|2|OBJ 5|4|CONJ 6|7|DET 7|5|COORD 8|2|PUNCT
    *CHI:   <that> [/] (.) that (i)s it . [+ bch]
    %mor:   pro:dem|that cop|be&3S pro:per|it .
    %gra:   1|2|SUBJ 2|0|ROOT 3|2|PRED 4|2|PUNCT
    *CHI:   I saw an elephant go swimming .
    %mor:   pro:sub|I v|see&PAST det:art|a n|elephant v|go part|swim-PRESP .
    %gra:   1|2|SUBJ 2|0|ROOT 3|4|DET 4|5|SUBJ 5|2|COMP 6|5|OBJ 7|2|PUNCT
    *CHI:   <I saw eleph> [//] I saw the <g> [/] giraffe and the elephant <s>
        [//] drop ball in the pool .
    %mor:   pro:sub|I v|see&PAST det:art|the n|giraffe coord|and det:art|the
        n|elephant n|drop n|ball prep|in det:art|the n|pool .
    %gra:   1|2|SUBJ 2|0|ROOT 3|4|DET 4|2|OBJ 5|4|CONJ 6|9|DET 7|9|MOD 8|9|MOD
        9|5|COORD 10|9|NJCT 11|12|DET 12|10|POBJ 13|2|PUNCT
    *CHI:   I saw giraffe swimming in the pool to get that ball .
    %mor:   pro:sub|I v|see&PAST n|giraffe part|swim-PRESP prep|in det:art|the
        n|pool inf|to v|get pro:dem|that n|ball .
    """

result = re.findall('(?<=CHI:)(.*?)(?=%mor)', s, flags=re.S)
print(result)
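
Each match returned by the pattern above still carries its leading spaces, its trailing newline and, for utterances that wrap onto a continuation line, an embedded newline plus indentation. The following is a minimal sketch of tidying the matches, assuming that collapsing every run of whitespace into a single space is acceptable. The helper name `extract_utterances` and the shortened sample are hypothetical, and the pattern uses a capture group between escaped markers rather than the lookarounds above.

```python
import re

def extract_utterances(text, first="*CHI:", last="%mor:"):
    """Return every span between `first` and the next `last`, whitespace-normalised."""
    # re.escape keeps characters such as '*' from being read as regex operators.
    pattern = re.escape(first) + r"(.*?)" + re.escape(last)
    # re.S lets '.' match newlines, so wrapped utterances are captured whole.
    matches = re.findall(pattern, text, flags=re.S)
    # split() followed by join() collapses newlines and repeated spaces.
    return [" ".join(m.split()) for m in matches]

# Shortened, hypothetical sample in the same layout as sample.txt.
sample = (
    "*CHI:   I saw a giraffe and a elephant .\n"
    "%mor:   pro:sub|I v|see&PAST\n"
    "%gra:   1|2|SUBJ\n"
    "*CHI:   <I saw eleph> [//] I saw the <g> [/] giraffe and the elephant <s>\n"
    "    [//] drop ball in the pool .\n"
    "%mor:   pro:sub|I v|see&PAST\n"
)

utterances = extract_utterances(sample)
for u in utterances:
    print(u)
```

To run it against the file from the question, pass the result of `f.read()` as `text`.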

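Answer 1: (score: 0)

I tried to do this without a regular expression, although I am still not sure whether the %gra lines are allowed to be printed in between.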

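def find_between(lines, first, last):
    # Collect each line from one containing `first` up to, but not
    # including, the next line containing `last`.
    collecting = False  # nothing is collected until the first marker is seen
    buffer = ""
    op_string = ""
    for line in lines:
        if first in line:
            # Start of a new utterance: begin buffering.
            buffer += line
            collecting = True

        elif last in line:
            # End marker reached: keep the buffered text and stop collecting,
            # so every line up to the next `first` is skipped.
            op_string += buffer
            buffer = ""  # flush buffer
            collecting = False

        elif collecting:
            # Continuation line of a wrapped utterance.
            buffer += line

    print(op_string)
    return op_string

with open("sample.txt") as f:
    data = f.readlines()
    #print(data)

    find_between(data, "*CHI:  ", "%mor:  ")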
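
For comparison, the index-based approach from the question can also be extended to walk the whole string instead of stopping after the first match. The following is a minimal sketch, assuming the text has been read in one piece with `f.read()`. The name `find_all_between` and the tiny sample are hypothetical. It uses `str.find`, which returns -1 when nothing is found, so no `try`/`except ValueError` is needed.

```python
def find_all_between(text, first, last):
    """Collect every substring that lies between `first` and the next `last`."""
    results = []
    pos = 0
    while True:
        # Look for the next opening marker at or after the current position.
        start = text.find(first, pos)
        if start == -1:
            break  # no more opening markers: end of string reached
        start += len(first)
        # Look for the closing marker that follows this opening marker.
        end = text.find(last, start)
        if end == -1:
            break  # opening marker without a closing marker: stop
        results.append(text[start:end].strip())
        # Resume the search just past the closing marker.
        pos = end + len(last)
    return results

# Tiny hypothetical sample in the same layout as sample.txt.
sample = "*CHI:   one .\n%mor:   x\n%gra:   y\n*CHI:   two\n    three .\n%mor:   z\n"
sentences = find_all_between(sample, "*CHI:", "%mor:")
print(sentences)
```

Wrapped utterances keep their embedded newline here; they can be flattened afterwards with `" ".join(s.split())` if a single line per sentence is wanted.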