I have data like this, where the records are separated by date-time lines:
01-Jan-1990 00:00:01 ABCD
A abcde fghijk lmnopq
hsjfne qqq # EDITED WITH ADDITIONAL SPILL-OVER DATA, line starts with \t
B abcde fghijk lmnopq
01-Jan-1990 00:00:05 ABCD
A ancfjhr sfjerhj egen
C etfhw3uh uhuefwh fewvjh dfeg efwbywgefb
D wrf fcwewe fvwefwe fwef
01-Jan-1990 00:00:07 ABCD
A wfw fbebwu
B fewhuf ifgiwejhifgj fijweij
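I would like to clean it so that the first value on each line after a date-time line (A, B, C, etc.) goes into one column, the values after A, B, C go into a second column, and the date-time is captured and entered as a third column. Like this: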
A,abcde fghijk lmnopq hsjfne qqq, 01-Jan-1990 00:00:01 #WOULD LIKE TO COMBINE THE SPILL DATA
B,abcde fghijk lmnopq, 01-Jan-1990 00:00:01
A,ancfjhr sfjerhj egen,01-Jan-1990 00:00:05
C,etfhw3uh uhuefwh fewvjh dfeg efwbywgefb,01-Jan-1990 00:00:05
D,wrf fcwewe fvwefwe fwef,01-Jan-1990 00:00:05
etc etc etc
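I would be grateful if someone could guide me. I tried reading the file with pattern matching and then grabbing the lines that follow, but could not get it to work.
import re

# Log reading
# Match a leading day-month-year token such as 01-Jan-1990 (month in any case)
date_pattern = re.compile(
    r'^(0?[1-9]|[12][0-9]|3[01])-'
    r'(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-\d{4}$',
    re.IGNORECASE)
with open("IDM.txt", "r") as log:
    for line in log:
        splitLine = line.split()
        if not splitLine:
            continue  # skip blank lines
        datematch = date_pattern.match(splitLine[0])
        if datematch:
            print(line)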
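I realize the code above is quite different from what I want to achieve; I include it to show that I have tried something, and I hope you can point me in the right direction. Thank you for your time.
Edited: added a third line of data to show the spill-over of the second line, which starts with a \t tab character.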
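As an alternative to a hand-written date regex, the date-time lines could be detected by letting datetime.strptime do the validation. The following is only a minimal sketch: the helper name parse_timestamp is hypothetical, the format string assumes the 'DD-Mon-YYYY HH:MM:SS' layout seen in the sample, and %b assumes English month abbreviations in the active locale.

```python
from datetime import datetime

def parse_timestamp(line):
    """Return a datetime if the line starts with 'DD-Mon-YYYY HH:MM:SS', else None.

    Hypothetical helper; the layout is assumed from the sample data.
    """
    parts = line.split()
    if len(parts) < 2:
        return None
    try:
        return datetime.strptime(parts[0] + " " + parts[1], "%d-%b-%Y %H:%M:%S")
    except ValueError:
        # Not a date-time line (for example a record or a spill-over line)
        return None

sample = [
    "01-Jan-1990 00:00:01 ABCD",
    "A abcde fghijk lmnopq",
    "\thsjfne qqq",
]
for line in sample:
    print(repr(line), "->", parse_timestamp(line))
```

Only the first sample line yields a datetime; the record line and the spill-over line return None, so the result can be used directly as the "is this a date-time line" test.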
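Answer 0 (score: 1)
Opening the file with with open() is always a good idea; you can then parse the lines into a list as needed. In my case, I simply check whether the first two characters of a line are digits; if they are, the line's timestamp is stored so it can be appended to the record lines that follow: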
content = []
date = None
with open('IDM.txt', 'r') as f:
    lines = f.readlines()
for idx, line in enumerate(lines):
    if not line.strip():
        continue  # skip blank lines
    if line[:2].isdigit():
        # date-time line: keep the timestamp for the records that follow
        date = line[:20]
    elif line[0].isspace():
        # spill-over line (leading space or tab): already merged into the previous record
        continue
    else:
        data = line[0] + ',' + line[1:].rstrip('\n')
        # merge the next line if it is a non-empty spill-over line
        if idx + 1 < len(lines) and lines[idx + 1][:1].isspace() and lines[idx + 1].strip():
            data += ' ' + lines[idx + 1].strip()
        content.append(data + ', ' + date)
with open('IDMOutput.csv', 'w') as f:
    for line in content:
        f.write("%s\n" % line)
>>> content
['A, abcde fghijk lmnopq hsjfne qqq, 01-Jan-1990 00:00:01',
'B, abcde fghijk lmnopq, 01-Jan-1990 00:00:01',
'A, ancfjhr sfjerhj egen, 01-Jan-1990 00:00:05',
'C, etfhw3uh uhuefwh fewvjh dfeg efwbywgefb, 01-Jan-1990 00:00:05',
'D, wrf fcwewe fvwefwe fwef, 01-Jan-1990 00:00:05',
'A, wfw fbebwu, 01-Jan-1990 00:00:07',
'B, fewhuf ifgiwejhifgj fijweij, 01-Jan-1990 00:00:07']
Edit: added rstrip to remove the '\n', and updated the code and output to include the timestamp and the spill-over data.
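Since the result is written to a .csv file, the standard csv module could take care of quoting, which matters if a text field ever contains a comma. This is a hedged sketch rather than part of the answer above: the records list, the header names, and the helper to_csv are made up for illustration.

```python
import csv
import io

# Hypothetical records in the shape (label, text, timestamp)
records = [
    ("A", "abcde fghijk lmnopq hsjfne qqq", "01-Jan-1990 00:00:01"),
    ("B", "abcde, with a comma", "01-Jan-1990 00:00:01"),
]

def to_csv(rows):
    """Serialize rows to CSV text; fields containing commas are quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["label", "text", "timestamp"])  # assumed header names
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(records))
```

To write to disk instead of a string buffer, the same writer can be pointed at a file opened with open('IDMOutput.csv', 'w', newline='').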
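Answer 1 (score: 0)
Another simple approach is to use regular expressions; see Regular Expression HOWTO and Print lists in Python.
In short: read the .txt file IDM.txt, use lstrip() to remove the whitespace on the left, use pattern_num to find the lines that start with a digit, and write the log string to IDM_clean.txt.
Update: the final solution, as a generalization: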
import re

pattern_num = re.compile(r'^[0-9]')  # pattern: the line starts with a digit (date-time line)
log_list = []

with open("IDM.txt", "r") as f:
    lines = f.read().split("\n")

# merge each spill-over line (leading space or tab) into the previous line
file_as_list = []
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    if line[0].isspace() and file_as_list:
        file_as_list[-1] = file_as_list[-1].rstrip() + " " + line.strip()
    else:
        file_as_list.append(line)

datos = ""
for l in file_as_list:
    if re.match(pattern_num, l):
        datos = l  # remember the current date-time line
    else:
        info = l[0] + ', ' + l[1:].lstrip()
        log_list.append(info + ', ' + datos)

log = '\n'.join(map(str, log_list))
with open("IDM_clean.txt", "w") as f:
    f.write(log + "\n")  # write the result to the file

print("-----------------------------------")
print(type(log))
print("------------------------------------------------------------------------")
print(log)  # print the desired format
print("------------------------------------------------------------------------")
Out:
----------------------------------
<class 'str'>
-----------------------------------------------------------------------
A, abcde fghijk lmnopq hsjfne qqq, 01-Jan-1990 00:00:01 ABCD
B, abcde fghijk lmnopq, 01-Jan-1990 00:00:01 ABCD
A, ancfjhr sfjerhj egen, 01-Jan-1990 00:00:05 ABCD
C, etfhw3uh uhuefwh fewvjh dfeg efwbywgefb, 01-Jan-1990 00:00:05 ABCD
D, wrf fcwewe fvwefwe fwef, 01-Jan-1990 00:00:05 ABCD
A, wfw fbebwu, 01-Jan-1990 00:00:07 ABCD
B, fewhuf ifgiwejhifgj fijweij, 01-Jan-1990 00:00:07 ABCD
-----------------------------------------------------------------------
Output in the file:
A, abcde fghijk lmnopq hsjfne qqq, 01-Jan-1990 00:00:01 ABCD
B, abcde fghijk lmnopq, 01-Jan-1990 00:00:01 ABCD
A, ancfjhr sfjerhj egen, 01-Jan-1990 00:00:05 ABCD
C, etfhw3uh uhuefwh fewvjh dfeg efwbywgefb, 01-Jan-1990 00:00:05 ABCD
D, wrf fcwewe fvwefwe fwef, 01-Jan-1990 00:00:05 ABCD
A, wfw fbebwu, 01-Jan-1990 00:00:07 ABCD
B, fewhuf ifgiwejhifgj fijweij, 01-Jan-1990 00:00:07 ABCD
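The output above keeps the trailing ABCD from the date-time line, whereas the question asks for the timestamp only. A possible single-pass variant, sketched below, uses a regex capture group to keep just the date and time and also merges any number of consecutive spill-over lines. The names TIMESTAMP and parse_log are hypothetical, the timestamp layout is assumed from the sample, and the label is taken to be the first whitespace-delimited token of a record line.

```python
import re

# Assumed layout: 'DD-Mon-YYYY HH:MM:SS' at the start of a date-time line
TIMESTAMP = re.compile(r'^(\d{1,2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})')

def parse_log(lines):
    """Return [label, text, timestamp] records from an iterable of lines."""
    records = []
    timestamp = None
    current = None  # record that spill-over lines get appended to
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        match = TIMESTAMP.match(line)
        if match:
            timestamp = match.group(1)  # drop anything after the time, e.g. ABCD
            current = None
        elif line[0].isspace():
            # spill-over line: merge into the most recent record of this block
            if current is not None:
                current[1] += " " + line.strip()
        else:
            parts = line.split(None, 1)
            text = parts[1].strip() if len(parts) > 1 else ""
            current = [parts[0], text, timestamp]
            records.append(current)
    return records

sample = """01-Jan-1990 00:00:01 ABCD
A abcde fghijk lmnopq
\thsjfne qqq
B abcde fghijk lmnopq
01-Jan-1990 00:00:05 ABCD
A ancfjhr sfjerhj egen
""".splitlines()

for record in parse_log(sample):
    print(",".join(record))
```

Keeping a reference to the current record avoids modifying the list of lines while iterating over it, and resetting it at each date-time line prevents a stray indented line from being attached to a record of the previous block.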