Splitting lines on certain characters in Python

Date: 2013-01-18 20:58:58

Tags: python

Input:

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Desired output:

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.

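'!' is the starting character, and +0013 should be the end of each line (if present).

The problem I'm running into is that my output looks like this: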

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W

Any help would be greatly appreciated!

My code:

file_open= open('sample.txt','r') 
file_read= file_open.read() 
file_open2= open('output.txt','w+') 
counter =0 
for i in file_read: 
    if '!' in i: 
        if counter == 1: 
            file_open2.write('\n') 
            counter= counter -1 
        counter= counter +1 
    file_open2.write(i)

6 answers:

Answer 0 (score: 2)

You could try something like this:

with open("abc.txt") as f:
    # Remove the existing line breaks; depending on your system and how the
    # file is opened they may be "\r\n" or just "\n", so drop both characters
    data = f.read().replace("\r", "").replace("\n", "")

    # Split the data at '!', then filter out the empty strings
    ans = filter(None, data.split("!"))
    for x in ans:
        print("!" + x)    # or write to some other file

This prints:

!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Answer 1 (score: 1)

Can you use str.split?

lines = file_read.split('!')

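Now lines is a list holding the split data. These are almost the lines you want to write; the only differences are that they have no trailing newline and no '!' at the start. We can easily put both back with string formatting, e.g. '!{0}\n'.format(line). We can then wrap the whole thing in a generator expression and pass it to file.writelines to write the data to the new file: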

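# 'if line' skips the empty piece that split() returns before the first '!'
file_open2.writelines('!{0}\n'.format(line) for line in lines if line)

You may need:

file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines if line)

if you find that you get more line breaks in the output than you want.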

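A few other points: when opening files, it's good to use a context manager, which ensures the files are closed properly:

with open('inputfile') as fin:
    lines = fin.read().split('!')
with open('outputfile','w') as fout:
    # 'if line' skips the empty piece that split() returns before the first '!'
    fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines if line)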
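
To try the split-and-format idea without touching any files, here is a minimal sketch that runs on an in-memory string. The helper name rejoin_records and the shortened sample data are made up for illustration, and io.StringIO only stands in for the real output file.

```python
import io

def rejoin_records(text, sep="!"):
    """Return one newline-terminated record per piece (hypothetical helper)."""
    # Undo the arbitrary wrapping before splitting
    flat = text.replace("\r", "").replace("\n", "")
    # split() yields an empty first piece when the text starts with sep; skip it
    return ["{0}{1}\n".format(sep, piece) for piece in flat.split(sep) if piece]

# Shortened stand-in for the wrapped input from the question
sample = "!,A,1,+0013!,A,2,+00\n13!,A,3,"

out = io.StringIO()                      # stands in for the output file
out.writelines(rejoin_records(sample))   # writelines accepts any iterable of strings
print(out.getvalue(), end="")
```

Filtering out empty pieces avoids writing a lone '!' line for the empty string that precedes the first separator.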

Answer 2 (score: 1)

Another option is to use replace instead of split, since you know the start and end characters of each line:

In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '')

In [15]: print data.replace('+0013!', "+0013\n!")
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Answer 3 (score: 1)

Just for some variety, here is a regex answer:

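import re

with open('sample.txt', 'r') as f, open('output.txt', 'w') as outputFile:
    # Each match runs from a '!' up to (not including) the next '!' or the end of the input
    for line in re.findall(r"!.+?(?=!|$)", f.read(), re.DOTALL):
        # Drop the line breaks inside the match, then end the record with one newline
        outputFile.write(line.replace("\n", "") + '\n')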

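It opens the output file, reads the contents of the input file, and loops over every match of the regex !.+?(?=!|$) with the re.DOTALL flag set. An explanation of the regex and what it matches can be found here: http://regex101.com/r/aK6aV4

Once we have a match, we remove the newlines from it and write it to the file.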
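
The same pattern can be checked on a string in memory. This is only a sketch: the helper name find_records and the shortened sample are assumptions for illustration, not part of the answer above.

```python
import re

# '!' followed by as few characters as possible, stopping right before the
# next '!' or at the end of the input. DOTALL lets '.' cross line breaks.
RECORD = re.compile(r"!.+?(?=!|$)", re.DOTALL)

def find_records(text):
    """Return each '!'-prefixed record with embedded newlines removed."""
    return [match.replace("\n", "") for match in RECORD.findall(text)]

# Shortened stand-in for the wrapped input from the question
sample = "!,A,1,+0013!,A,2,+00\n13!,A,3,"

for record in find_records(sample):
    print(record)
```

Because the lookahead does not consume the next '!', every match keeps its own leading '!' and nothing has to be added back afterwards.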

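Answer 4 (score: 0)

Let's try removing the existing line breaks and adding a \n before each '!', then let Python do the work with splitlines :-). Because the data starts with '!', the first element of the result is an empty string:

file_read.replace("\n", "").replace("!", "\n!").splitlines()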
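
Where the newline goes relative to the '!' matters. The sketch below, using a made-up helper name split_records and a shortened sample, puts the newline before each separator and then contrasts it with putting it after.

```python
def split_records(text, sep="!"):
    """Split wrapped text into records that each start with sep (hypothetical helper)."""
    flat = text.replace("\r", "").replace("\n", "")   # drop the original wrapping
    # The newline goes BEFORE each separator so every record keeps its leading sep;
    # the empty string produced ahead of the first separator is filtered out.
    return [line for line in flat.replace(sep, "\n" + sep).splitlines() if line]

# Shortened stand-in for the wrapped input from the question
sample = "!,A,1,+0013!,A,2,+00\n13!,A,3,"

print(split_records(sample))

# For contrast: with the newline AFTER the separator, each '!' is stranded
# at the end of the previous piece instead of starting its own record.
stranded = sample.replace("\n", "").replace("!", "!\n").splitlines()
print(stranded)
```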

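Answer 5 (score: 0)

I would actually implement this as a generator, so that you can work on a stream of data rather than the entire contents of the file. This is much more memory-friendly when working with large files:
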
>>> def split_on_stream(it,sep="!"):
    prev = ""
    for line in it:
        line = (prev + line.strip()).split(sep)
        for parts in line[:-1]:
            yield parts
        prev = line[-1]
    yield prev


>>> with open("test.txt") as fin:
    for parts in split_on_stream(fin):
        print parts



,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.
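
Note that the output above has lost the leading '!' and starts with an empty line, because split drops the separator and the data begins with one. A possible variant that puts the '!' back and skips empty pieces is sketched below; the name iter_records is made up for illustration, io.StringIO stands in for the open file, and the sketch assumes the input starts with the separator.

```python
import io

def iter_records(lines, sep="!"):
    """Yield complete records from an iterable of wrapped lines (hypothetical helper).

    Assumes the data starts with sep, as in the question.
    """
    pending = ""                          # text seen after the most recent separator
    for line in lines:
        pieces = (pending + line.rstrip("\r\n")).split(sep)
        for piece in pieces[:-1]:         # every piece but the last is complete
            if piece:                     # skip the empty piece before the first sep
                yield sep + piece
        pending = pieces[-1]              # may continue on the next line
    if pending:
        yield sep + pending               # final, possibly truncated, record

# Shortened stand-in for the wrapped input from the question
stream = io.StringIO("!,A,1,+0013!,A,2,+00\n13!,A,3,\n")

for record in iter_records(stream):
    print(record)
```

Only one input line plus the unfinished tail of the previous one is held in memory at a time, so this keeps the streaming property of the generator above.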