I have a problem with regexps and matching lines in a text file; I am new to Python. My file looks like this:
epg_slo3.txt:10346224: Service_ID: 1 (0x0001) [= --> refers to PMT program_number]
epg_slo3.txt:10346236: Start_time: 0xdce0112500 [= 2013-09-09 11:25:00 (UTC)]
epg_slo3.txt:10346237: Duration: 0x0001000 [= 00:10:00 (UTC)]
epg_slo3.txt:10346246: event_name: "..©port" -- Charset: ISO/IEC 8859 special table
What I need is something like this:
Service_ID: 1 (0x0001) [= --> refers to PMT program_number]: --> Program 1
Start_time: 0xdce0112500 [= 2013-09-09 11:25:00 (UTC)]: --> Start 2013-09-09 11:25:00 (UTC)
Duration: 0x0001000 [= 00:10:00 (UTC)] --> Duration 00:10:00 (UTC)
event_name: "..©port" -- Charset: ISO/IEC 8859 --> Category ©port
My code is as follows:
#!/usr/bin/python
import codecs
import re

BLOCKSIZE = 1048576

# Re-encode the source file from ISO-8859-2 to UTF-8, one block at a time.
with codecs.open('epg_slo10.txt', "r", "iso-8859-2") as sourceFile:
    with codecs.open('epg_slo.txt', "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)

# Rename the field labels line by line.
input_file = open('epg_slo.txt', "r")
output_file = open('epg_slo_kategorije.txt', "w")
for line in input_file:
    line = line.replace("Service_ID:", "Program")
    line = line.replace("Start_time:", "Start")
    line = line.replace("event_name:", "Title")
    output_file.write(line)
Can you help me solve this problem?
Thanks for reading. BR!

Answer 0 (score: 1)
Replace every match of the regex below with the empty string "":
/^epg_slo3\.txt:\d{8}:\s*/
Answer 1 (score: 1)
Inside the loop, before the line = line.replace calls in your code, add the following line:
line = re.sub(r'^epg_slo3\.txt:\d{8}:\s*', '', line)
For example, if
line = "epg_slo3.txt:10346224: Service_ID: 1 (0x0001) [= --> refers to PMT program_number]"
then after calling re.sub:
line = "Service_ID: 1 (0x0001) [= --> refers to PMT program_number]"
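
Stripping the prefix is only the first step toward the output the question asks for ("Program 1", "Start 2013-09-09 11:25:00 (UTC)", and so on). The following is a minimal sketch of one way to get there, not a definitive solution. It assumes Python 3, assumes every line starts with a "file name:line number:" prefix as in the sample, and assumes the value to keep is either the first number, the text inside "[= ... ]", or the quoted event name with its leading dots dropped. The names PREFIX, RULES and convert_line are made up for this illustration.

```python
import re

# Assumed prefix shape: "<file name>:<line number>: ", as in the sample input.
PREFIX = re.compile(r'^[^:]+:\d+:\s*')

# Hypothetical rules: (pattern, output label). Group 1 is the value to keep.
RULES = [
    (re.compile(r'^Service_ID:\s*(\d+)'), 'Program'),
    (re.compile(r'^Start_time:.*\[=\s*(.*?)\]'), 'Start'),
    (re.compile(r'^Duration:.*\[=\s*(.*?)\]'), 'Duration'),
    # Leading dots inside the quotes are dropped, as in the desired output.
    (re.compile(r'^event_name:\s*"\.*(.*?)"'), 'Category'),
]


def convert_line(line):
    """Return the shortened line, or None if no rule matches."""
    line = PREFIX.sub('', line.strip())
    for pattern, label in RULES:
        match = pattern.match(line)
        if match:
            return '{} {}'.format(label, match.group(1))
    return None
```

In the loop from the question, this could replace the three line.replace calls: call convert_line(line) and write the result followed by a newline whenever it is not None. Lines that match none of the rules are skipped in this sketch; whether they should be kept instead depends on the rest of the file.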