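I need to delete the leading text in each line, up to a specific string.
My file is large, and I want to strip out the unwanted text.
For example, my lines look like this: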
&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
He / [A j } . D V Fd Y $GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64
Here I need to find the string $G and delete the unwanted characters in front of it. I need a file like this:
$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
$GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64
Can someone help me do this with a Python script?
Answer 0 (score: 0)
You can use the re module for this task:
# create demo file
t = """&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
He / [A j } . D V Fd Y $GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64"""
with open("f.txt", "w") as f:
    f.write(t)

# process demo file
import re

# lazily skip everything before the first "$G", then capture the rest of the line
pattern = re.compile(r"^.*?(\$G.*)$")
with open("f.txt") as f, open("r.txt", "w") as w:
    for line in f:
        m = pattern.search(line)
        if m:
            # lines that contain no "$G" are not written at all
            w.write(m.group(1).rstrip("\n") + "\n")

# show the result
with open("r.txt") as result:
    print(result.read())
Output file:
$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
$GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64
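The regex searches each line for the text that starts at the first $G and runs to the end of the line. If a match is found, it is written to the new file.
The regex string ^.*?(\$G.*)$ means: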
^       start of line
.*?     as few characters as possible (lazy)
(       start of captured group
\$G     literal $ followed by G
.*      any characters, as many as possible (greedy)
)       end of captured group
$       end of line
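To see how the pattern behaves on individual strings, here is a minimal sketch. The helper name `strip_prefix` and the last two sample strings are made up for illustration; they are not from the question.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r"^.*?(\$G.*)$")

def strip_prefix(line):
    """Return the part of `line` starting at the first "$G", or None if there is none."""
    m = PATTERN.search(line)
    return m.group(1) if m else None

samples = [
    "&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71",
    "He / [A j } . D V Fd Y $GLGSV,4,1,13*64",
    "no marker here",             # hypothetical line without "$G"
    "junk $GAAA,1 more $GBBB,2",  # hypothetical line with two "$G" markers
]
results = [strip_prefix(s) for s in samples]
for s, r in zip(samples, results):
    print(repr(s), "->", repr(r))
```

Because the leading `.*?` is lazy, a line with several `$G` markers is cut at the first one, and everything after it (including later markers) is kept.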
You may need to add a CRLF after the last line, or work \Z into the pattern.
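As a rough illustration of the line-ending question: in Python, `.` matches `\r` but not `\n`, and `$` matches at the very end of the string or just before a final `\n`. So a line that still ends in `\r\n` leaves a stray `\r` in the captured group. The sketch below assumes you are handling raw strings (for example from a file opened with `newline=""`); the helper name `clean` is hypothetical.

```python
import re

PATTERN = re.compile(r"^.*?(\$G.*)$")

def clean(line):
    """Extract the "$G..." part and drop any trailing CR/LF characters."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group(1).rstrip("\r\n")

lf_line = "xx $GNDTM,W84*71\n"      # Unix line ending
crlf_line = "xx $GNDTM,W84*71\r\n"  # Windows line ending
bare_line = "xx $GNDTM,W84*71"      # last line without any line ending

# Without the extra rstrip, the carriage return ends up inside the group.
raw_group = PATTERN.search(crlf_line).group(1)

print(repr(raw_group))
print(repr(clean(lf_line)), repr(clean(crlf_line)), repr(clean(bare_line)))
```

When the file is opened in text mode with the default `newline` setting, Python already translates `\r\n` to `\n` on reading, so the stray `\r` mainly matters for strings that did not go through that translation.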
It is probably best to try the pattern against your real data, for example at http://regex101.com
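One way to check the pattern against real data locally is to record which lines it does not match, since the script above drops them silently. This is only a sketch: `split_lines` is a hypothetical helper, and `io.StringIO` stands in for your actual file.

```python
import io
import re

PATTERN = re.compile(r"^.*?(\$G.*)$")

def split_lines(lines):
    """Return (cleaned, skipped): extracted "$G..." texts and the 1-based
    numbers of the lines that contain no "$G"."""
    cleaned, skipped = [], []
    for number, line in enumerate(lines, start=1):
        m = PATTERN.search(line)
        if m:
            cleaned.append(m.group(1).rstrip("\r\n"))
        else:
            skipped.append(number)
    return cleaned, skipped

# Stand-in for a real file; an opened file object can be passed the same way.
demo = io.StringIO(
    "&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71\n"
    "garbage without a marker\n"
    "He / [A j } $GLGSV,4,1,13*64\n"
)
cleaned, skipped = split_lines(demo)
print(len(cleaned), "lines kept; skipped line numbers:", skipped)
```

Looking at the skipped line numbers in the original file shows quickly whether the pattern misses lines you expected to keep.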