Parsing a file line by line in Python using a special character

Time: 2015-07-29 11:07:57

Tags: python parsing

When reading a file line by line with the classic for line in filename: loop, how can I join the lines into a single string (or one string per list), breaking at every line that starts with a specific character such as $? For example:

My input:

$asdfasdfasdfasdfasdfasdf
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
$aWEOUUEWOEUowuerotueworutowueortuo
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs

My desired output:

'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF'
'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs'

OR

['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF']
['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs']

Note that every line starting with the $ symbol is dropped and serves as the breakpoint for the line-by-line string concatenation.

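2 answers:

Answer 0 (score: 1)

You can use a regular expression. re.finditer returns an iterator of match objects, one per $ line, and group 1 of each match captures the lines that follow that $ line. A list comprehension with str.replace then removes the newlines from each captured block: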

>>> s="""$asdfasdfasdfasdfasdfasdf
... ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
... LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
... LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
... ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
... $aWEOUUEWOEUowuerotueworutowueortuo
... ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
... LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
... LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
... ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs
... """
>>> 
>>> import re
>>> 
>>> li=re.finditer(r'\$[^\n]*([^$]+)',s)
>>> [i.group(1).replace('\n','') for i in li]
['ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF',
 'ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSDLKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFASLLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKSALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs']
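
One thing to keep in mind with the pattern above: [^$]+ stops at any $, so a $ in the middle of a data line would cut the block short. If that can happen in your file, a variant based on re.split with the multiline flag may help. This is only a sketch under that assumption; the function name join_blocks_regex and the sample text are made up for illustration.

```python
import re

def join_blocks_regex(text):
    """Split text at lines that start with '$' and join each block's lines."""
    # (?m) makes ^ match at the start of every line, so only a '$' in the
    # first column counts as a header; the header line itself is discarded.
    chunks = re.split(r'(?m)^\$.*\n?', text)
    # Join the remaining lines of each chunk and skip empty chunks.
    joined = (''.join(chunk.splitlines()) for chunk in chunks)
    return [j for j in joined if j]

sample = "$header one\nAB\nCD\n$header two\nEF\nG$H\n"
regex_result = join_blocks_regex(sample)
print(regex_result)  # ['ABCD', 'EFG$H']
```

Like the finditer version, this needs the whole file in memory as one string (for example from f.read()), so it suits files of moderate size.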

Answer 1 (score: 1)

import io

data = io.StringIO('''$asdfasdfasdfasdfasdfasdf
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALSJDFLJALSJDFLASDLFJLAJSDLFJALSDFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJF
$aWEOUUEWOEUowuerotueworutowueortuo
ABCSLKDJFLAJSDJLAJSDLFJALJSDLKJLAJLSKDJFLAJSD
LKAJSDLJFALoqiweoituoiwueoruweuroouqweoruuqowuieoFAS
LLASJLKDFJLASDFASKLDFLASDFJALSDJFLAJSDLFJALKS
ALKSDJLFKJASLDJFLAJSDLFJALSJDFLJASLDJJASDLFJFsdfs''')


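strings = []  # one joined string per '$' block
strg = ''     # lines accumulated for the current block
for line in data:
    if line.startswith('$'):
        # A '$' line closes the previous block, if there is one.
        if strg:
            strings.append(strg)
            strg = ''
    else:
        # strip() drops the trailing newline before concatenating.
        strg += line.strip()
# Flush the last block, which has no '$' line after it.
if strg:
    strings.append(strg)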

print(strings)
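
Since the question is about reading a real file line by line, the same loop could also be wrapped in a generator so that only one block is held in memory at a time. The following is a rough sketch, not part of the original answer; the name join_blocks and its marker parameter are hypothetical.

```python
import io

def join_blocks(lines, marker='$'):
    """Yield one joined string per block of lines between marker lines."""
    parts = []
    for line in lines:
        if line.startswith(marker):
            # A marker line ends the current block, if it has any content.
            if parts:
                yield ''.join(parts)
                parts = []
        else:
            stripped = line.strip()
            if stripped:  # ignore blank lines
                parts.append(stripped)
    # Emit the final block, which is not followed by a marker line.
    if parts:
        yield ''.join(parts)

sample = io.StringIO('$first\nAB\nCD\n$second\nEF\nGH\n')
result = list(join_blocks(sample))
print(result)  # ['ABCD', 'EFGH']
```

Any iterable of lines works here, so an open file object can be passed in place of the io.StringIO sample. Collecting the pieces in a list and joining once also avoids repeated string concatenation on long blocks.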