Question

I want to save the parts of an original text file that lie between a 'startswith' string and an 'endswith' string to a new text file.

Example: the input text file contains the following lines:

...abc…
...starts with string...
...def...
...ends with string...
...ghi...

...jkl...
...starts with string...
...mno...
...ends with string...
...pqr...

I want to extract the following lines into an output text file:

starts with string...def...ends with string
starts with string...mno...ends with string

My code below returns an empty list []. Please help me correct it.

with open('file_in.txt','r') as fi:
    id = []
    for ln in fi:
        if ln.startswith("start with string"):
            if ln.endswith("ends with string"):
                id.append(ln[:])
                with open(file_out.txt, 'a', encoding='utf-8') as fo:
                    fo.write (",".join(id))
print(id)

I want file_out.txt to contain every section that starts with "starts with string" and ends with "ends with string".
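Assuming the input lines literally look like the sample above (dots around each marker), a quick check in the interpreter suggests why the loop never appends anything: the markers sit in the middle of their lines, the code searches for "start with string" (without the "s"), and both tests must pass on the same line even though the markers are on different lines.

```python
# Two marker lines as the loop would see them (iterating over a file keeps the '\n').
start_line = "...starts with string...\n"
end_line = "...ends with string...\n"

# Prefix/suffix tests fail because of the surrounding dots and the newline.
print(start_line.startswith("starts with string"))  # False
print(end_line.endswith("ends with string"))        # False

# The question's code also looks for "start with string", which is not the marker text.
print(start_line.startswith("start with string"))   # False

# A substring test does match each marker line.
print("starts with string" in start_line)           # True
print("ends with string" in end_line)               # True
```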

Answer 1

startswith and endswith return True or False, not positions you can use to slice the string. Try find or index instead. For example:

start = 'starts with string'
end = 'ends with string'
s = '...abc… ...starts with string... ...def... ...ends with string... ...ghi...'

sub = s[s.find(start):s.find(end) + len(end)]
print(sub)
# starts with string... ...def... ...ends with string

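You will need to add some checks in your loop to see whether the start and end strings are present, because find returns -1 when there is no match, which leads to unexpected slicing.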
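A minimal sketch of those checks, assuming the whole file is small enough to read into one string. The helper name extract_sections is hypothetical; it starts each new search after the previous end marker and stops as soon as find returns -1.

```python
def extract_sections(text, start, end):
    """Return every substring running from `start` through the next `end` (hypothetical helper)."""
    sections = []
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:            # no further start marker
            break
        j = text.find(end, i + len(start))
        if j == -1:            # start marker without a matching end marker
            break
        sections.append(text[i:j + len(end)])
        pos = j + len(end)     # keep searching after this section
    return sections


sample = (
    "...abc...\n"
    "...starts with string...\n"
    "...def...\n"
    "...ends with string...\n"
    "...ghi...\n"
    "\n"
    "...jkl...\n"
    "...starts with string...\n"
    "...mno...\n"
    "...ends with string...\n"
    "...pqr...\n"
)

for section in extract_sections(sample, "starts with string", "ends with string"):
    # Remove the line breaks so each section prints on one line.
    print(section.replace("\n", ""))
```

With a real file, read it first (for example with open('file_in.txt') as fi: text = fi.read()) and pass text to the function.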


Answer 2

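You can use a separate variable to indicate whether the current line belongs to the section of interest, and toggle it based on the start and stop markers. You can then also turn this into a generator: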

def extract(fh, start, stop):
    sub = False
    for line in fh:
        sub |= start in line
        if sub:
            yield line
            sub ^= stop in line

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))
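To try the generator without creating test.txt, any iterable of lines works in place of the file handle. This sketch uses io.StringIO holding the sample input from the question; the sample text is the only assumption.

```python
import io


def extract(fh, start, stop):
    sub = False
    for line in fh:
        sub |= start in line      # switch on at a start marker
        if sub:
            yield line
            sub ^= stop in line   # switch off after yielding the stop marker


sample = io.StringIO(
    "...abc...\n"
    "...starts with string...\n"
    "...def...\n"
    "...ends with string...\n"
    "...ghi...\n"
    "\n"
    "...jkl...\n"
    "...starts with string...\n"
    "...mno...\n"
    "...ends with string...\n"
    "...pqr...\n"
)

result = ''.join(extract(sample, 'starts with string', 'ends with string'))
print(result)
```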

In Python 3.8+ you can use assignment expressions:

import itertools as it

def extract(fh, start, stop):
    while any(start in (line := x) for x in fh):
        yield line
        yield from it.takewhile(lambda x: stop not in x, ((line := y) for y in fh))
        yield line

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))
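On well-formed input the assignment-expression version yields the same lines as the toggle version. One difference may be worth knowing: if the file ends after a start marker with no matching stop marker, the final yield line repeats the last line read. In this small comparison, extract_toggle and extract_walrus are just renamed copies of the two versions above, and the input text is made up for illustration (Python 3.8+ required).

```python
import io
import itertools as it


def extract_toggle(fh, start, stop):
    sub = False
    for line in fh:
        sub |= start in line
        if sub:
            yield line
            sub ^= stop in line


def extract_walrus(fh, start, stop):
    while any(start in (line := x) for x in fh):
        yield line
        yield from it.takewhile(lambda x: stop not in x, ((line := y) for y in fh))
        # After takewhile, `line` holds the stop marker, or the last line if none was found.
        yield line


# The second section has no stop marker before the end of the input.
text = "a\nstarts with string\nb\nends with string\nc\nstarts with string\nd\n"

toggle = list(extract_toggle(io.StringIO(text), 'starts with string', 'ends with string'))
walrus = list(extract_walrus(io.StringIO(text), 'starts with string', 'ends with string'))
print(toggle)
print(walrus)  # "d\n" appears twice
```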

Variation: excluding the start and end markers

If you want to exclude the start and stop markers from the output, we can again use itertools.takewhile:

import itertools as it

def extract(fh, start, stop):
    while any(start in x for x in fh):
        yield from it.takewhile(lambda x: stop not in x, fh)

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))
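Running the marker-excluding variant on the question's sample (again using io.StringIO as a stand-in for test.txt) leaves only the lines between the markers: any() consumes the start line and takewhile consumes the stop line without yielding it, so neither appears in the output.

```python
import io
import itertools as it


def extract(fh, start, stop):
    # any() advances fh up to and including the next line containing the start marker.
    while any(start in x for x in fh):
        # takewhile yields lines until the stop marker, which it consumes but does not yield.
        yield from it.takewhile(lambda x: stop not in x, fh)


sample = io.StringIO(
    "...abc...\n"
    "...starts with string...\n"
    "...def...\n"
    "...ends with string...\n"
    "...ghi...\n"
    "\n"
    "...jkl...\n"
    "...starts with string...\n"
    "...mno...\n"
    "...ends with string...\n"
    "...pqr...\n"
)

result = ''.join(extract(sample, 'starts with string', 'ends with string'))
print(result)
```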

Answer 3

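Each line ends with a newline character that tells the computer to start a new line. I am assuming here that "starts with string" and "ends with string" are on the same line. If that is not the case, add id.append(ln[:]) directly below the first if statement.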

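Try

ln.endswith("ends with string" + '\n')

or, for Windows line endings,

ln.endswith("ends with string" + '\r\n')

When a file is opened in text mode, Python translates '\r\n' to '\n' by default, so the first form is usually the one you need.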
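Instead of appending the line terminator to the marker, you can strip it from the line before testing the suffix. A minimal sketch; marks_end is a hypothetical helper name. With the sample data the marker is followed by "...", so a suffix test still fails there and a substring test (in) is needed, as in the loop below.

```python
def marks_end(ln, marker="ends with string"):
    # rstrip("\r\n") removes any trailing '\r' and '\n' characters, so both
    # Unix ("\n") and Windows ("\r\n") line endings are handled.
    return ln.rstrip("\r\n").endswith(marker)


print(marks_end("starts with string ... ends with string\n"))    # True
print(marks_end("starts with string ... ends with string\r\n"))  # True
print(marks_end("...ends with string...\n"))                     # False
```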

with open('C:\\Py\\testing.txt', 'r') as fi:
    id = []
    copy_line = False
    for ln in fi:
        # A start marker switches copying on.
        if "starts with string" in ln:
            copy_line = True
        if copy_line:
            id.append(ln[:])
        # The end marker line is copied above, then copying switches off.
        if "ends with string" in ln:
            copy_line = False

    # Each line already ends with '\n', so join without a separator.
    with open('C:\\Py\\testing_out.txt', 'a', encoding='utf-8') as fo:
        fo.write("".join(id))

print(id)
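The question's expected output has each section on a single line, while the loop above keeps each section spread over several lines. A sketch under that reading of the question: sections_as_lines is a hypothetical helper that collects each start-to-end block and joins its stripped lines into one string; io.StringIO stands in for the input file.

```python
import io


def sections_as_lines(fi, start, end):
    """Collect each start..end block and join its lines into one string (hypothetical helper)."""
    sections = []
    current = None                      # None means "not inside a section"
    for ln in fi:
        if current is None and start in ln:
            current = []                # a start marker opens a new section
        if current is not None:
            current.append(ln.strip())  # drop the newline and surrounding whitespace
            if end in ln:
                sections.append("".join(current))
                current = None          # the end marker closes the section
    return sections


sample = io.StringIO(
    "...abc...\n"
    "...starts with string...\n"
    "...def...\n"
    "...ends with string...\n"
    "...ghi...\n"
)

for s in sections_as_lines(sample, "starts with string", "ends with string"):
    print(s)
```

To write the result, something like fo.write("\n".join(sections) + "\n") inside the with open(...) as fo block would put one section per line.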
