I have data like this:
MP|3561042|||WQTI544|BEA148|16077: POWER ID|7817|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16011: BINGHAM ID|45607|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16005: BANNOCK ID|82839|I|103306|||D|1
MP|3561250|||WQTI576
|BEA135|48301: LOVING TX|82|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48443: TERRELL TX|984|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48173: GLASSCOCK TX|1226|I|103308|||D|1
How can I turn it into this:
MP|3561042|||WQTI544|BEA148|16077: POWER ID|7817|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16011: BINGHAM ID|45607|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16005: BANNOCK ID|82839|I|103306|||D|1
MP|3561250|||WQTI576|BEA135|48301: LOVING TX|82|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48443: TERRELL TX|984|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48173: GLASSCOCK TX|1226|I|103308|||D|1
I tried:
with open('C:/Users/user/Desktop/a.csv', 'r') as f:
    lines = f.readlines()
# join every stripped line with a pipe
mystr = '|'.join([line.strip() for line in lines])
print(mystr)
This prints:
MP|3561042|||WQTI544|BEA148|16077: POWER,
ID|7817|I|103306|||D|1|MP|3561042|||WQTI544|BEA148|16011: BINGHAM,
ID|45607|I|103306|||D|1|MP|3561042|||WQTI544|BEA148|16005: BANNOCK,
ID|82839|I|103306|||D|1|MP|3561250|||WQTI576|||BEA135|48301: LOVING,
TX|82|I|103308|||D|1|MP|3561250|||WQTI576|||BEA135|48443: TERRELL,
TX|984|I|103308|||D|1|MP|3561250|||WQTI576|||BEA135|48173: GLASSCOCK,
TX|1226|I|103308|||D|1|MP|3561250|||WQTI576|
This is not the result I want. Can anyone help? The first column always contains MP, and each complete line has 13 pipes as delimiters.
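Since every complete record has exactly 13 pipes, one possible way to rejoin the broken lines is to keep appending physical lines to a buffer until it holds 13 pipes. The following is only a minimal sketch under that assumption; the function name `join_by_pipe_count` and the parameter `expected_pipes` are illustrative and do not come from the original post.

```python
def join_by_pipe_count(lines, expected_pipes=13):
    """Merge physical lines until each record holds expected_pipes '|' characters."""
    records = []
    buffer = ''
    for raw in lines:
        piece = raw.strip()
        if not piece:
            continue  # skip blank lines
        buffer += piece
        if buffer.count('|') >= expected_pipes:
            records.append(buffer)  # the record is complete
            buffer = ''
    if buffer:
        records.append(buffer)  # keep an incomplete trailing record instead of dropping it
    return records


# Three physical lines taken from the sample data above
sample = [
    'MP|3561042|||WQTI544|BEA148|16077: POWER ID|7817|I|103306|||D|1\n',
    'MP|3561250|||WQTI576\n',
    '|BEA135|48301: LOVING TX|82|I|103308|||D|1\n',
]
for record in join_by_pipe_count(sample):
    print(record)
```

This approach does not depend on how a record ends, but it assumes that no field ever contains a literal pipe.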
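Edit:
How can I do the same thing by looking for 'MP' at the start of a line instead of 'D|1' at the end? Some lines do not end with 'D|1' but with 'U|1234'. Below is what I tried, but it does not give the correct result.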
content = [l.strip().decode('utf-8') for l in s1 if l.strip()]
for line in content:
    find_START = line.find('MP')  # index of 'MP'; 0 means the line starts with it
    if find_START == 0:
        tmp += line
        res.append(tmp)
        tmp = ''
    else:
        tmp += line
for r in res:
    print(r)
It prints the following:
MP|3561042|||WQTI544|BEA148|16011: BINGHAM, ID|45607|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16005: BANNOCK, ID|82839|I|103306|||D|1
MP|3561250|||WQTI576
|BEA135|48301: LOVING, TX|82|I|103308|||D|1MP|3561250|||WQTI576
|BEA135|48443: TERRELL, TX|984|I|103308|||D|1MP|3561250|||WQTI576
|BEA135|48173: GLASSCOCK, TX|1226|I|103308|||D|1MP|3561250|||WQTI576
Answer 0 (score: 1)
Log file:
MP|3561042|||WQTI544|BEA148|16077: POWER ID|7817|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16011: BINGHAM ID|45607|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16005: BANNOCK ID|82839|I|103306|||D|1
MP|3561250|||WQTI576
|BEA135|48301: LOVING TX|82|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48443: TERRELL TX|984|I|103308|||D|1
MP|3561250|||WQTI576
|BEA135|48173: GLASSCOCK TX|1226|I|103308|||D|1
Code:
res = []  # completed records
tmp = ''  # buffer for the record currently being assembled
with open(logFile) as f:  # logFile holds the path to the input file
    content = f.readlines()
# strip whitespace and drop empty lines
content = [l.strip() for l in content if l.strip()]
for line in content:
    tmp += line
    # A record is complete once a line ends with |D|1. Checking the end of the
    # line avoids false matches on 'D|1' inside a field such as 'ID|1226'.
    if line.endswith('|D|1'):
        res.append(tmp)
        tmp = ''
for r in res:
    print(r)
Output:
MP|3561042|||WQTI544|BEA148|16077: POWER ID|7817|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16011: BINGHAM ID|45607|I|103306|||D|1
MP|3561042|||WQTI544|BEA148|16005: BANNOCK ID|82839|I|103306|||D|1
MP|3561250|||WQTI576|BEA135|48301: LOVING TX|82|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48443: TERRELL TX|984|I|103308|||D|1
MP|3561250|||WQTI576|BEA135|48173: GLASSCOCK TX|1226|I|103308|||D|1
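For the follow-up in the question's edit (records that end with something other than 'D|1', such as 'U|1234'), it may be easier to key on the start of a record rather than its end: a line that begins with 'MP|' opens a new record, and any other line is appended to the record before it. This is a minimal sketch, not a tested solution for the real file; the function name `join_by_record_start` is illustrative, and the last sample record is made up to show a 'U|1234' ending.

```python
def join_by_record_start(lines, start='MP|'):
    """Group physical lines into records, each beginning with the start marker."""
    records = []
    for raw in lines:
        piece = raw.strip()
        if not piece:
            continue  # skip blank lines
        if piece.startswith(start) or not records:
            records.append(piece)  # a new record begins here
        else:
            records[-1] += piece   # continuation of the previous record
    return records


sample = [
    'MP|3561042|||WQTI544|BEA148|16005: BANNOCK ID|82839|I|103306|||D|1\n',
    'MP|3561250|||WQTI576\n',
    '|BEA135|48301: LOVING TX|82|I|103308|||D|1\n',
    # Hypothetical record ending in U|1234, made up for illustration only
    'MP|3561250|||WQTI576\n',
    '|BEA135|48443: TERRELL TX|984|I|103308|||U|1234\n',
]
for record in join_by_record_start(sample):
    print(record)
```

The attempt in the question appends to res at the moment an 'MP' line is seen, which is why each continuation ended up glued to the next record. Here the continuation is attached to the last record already in the list. The sketch assumes a continuation line never starts with 'MP|' itself.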