How do I find a string and remove the text before it using Python?

Asked: 2018-11-07 16:07:54

Tags: python regex

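I have a situation where I need to find a specific string in a line and remove the text that precedes it.

My file is large, and I want to strip out some unwanted text.

For example, my lines look like this: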

&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
 He    /  [A j  }    .   D   V   Fd     Y       $GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64

Here I need to find the string $G and remove the unwanted characters in front of it. I need a file like this:

$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
$GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64

Can someone help me with a Python script for this?

1 Answer:

Answer 0: (score: 0)

You can use the re module for this task:

# create demo file
t = """&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
 He    /  [A j  }    .   D   V   Fd     Y       $GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64"""

with open("f.txt", "w") as f:
    f.write(t)


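# process demo file
import re

# lazily skip everything before the first "$G", then capture through the end of the line
pattern = re.compile(r"^.*?(\$G.*)$")
with open("f.txt") as src, open("r.txt", "w") as dst:
    for line in src:
        m = pattern.search(line)
        if m:
            # "." never matches "\n", so group(1) has no trailing newline; add one back
            dst.write(m.group(1) + "\n")

# show the cleaned file
with open("r.txt") as result:
    print(result.read())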

Output file:

$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71
$GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64

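The regex searches each line for a match that starts at $G and runs to the end of the line. If a match is found, it is written to the new file; lines that contain no $G are skipped.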
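If a regular expression feels heavier than necessary, the same per-line logic can probably be sketched with plain string methods. This is a different technique from the answer's (str.find instead of re); the helper name from_marker and its marker parameter are made up for illustration.

```python
def from_marker(line, marker="$G"):
    """Return the line from the first occurrence of marker onward, or None if absent."""
    idx = line.find(marker)          # -1 when the marker is not in the line
    if idx == -1:
        return None
    return line[idx:].rstrip("\n")   # drop the trailing newline, if any

# the two sample lines from the question plus one line without the marker
sample = [
    "&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71\n",
    " He    /  [A j  }    .   D   V   Fd     Y       $GLGSV,4,1,13,65,02,318,26,70,06,099,28,71,30,054,35,72,26,356,32*64\n",
    "line without the marker\n",
]

kept = [s for s in (from_marker(l) for l in sample) if s is not None]
for s in kept:
    print(s)
```

For a real file, the list comprehension would be replaced by the same read-one-line, write-one-line loop the answer uses, so the whole file never has to sit in memory.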

The regex string ^.*?(\$G.*)$ means:

^   start of line  
  .*? as few anythings as possible
    ( start of captured group
      \$G  literal $ followed by G
      .* anything greedy
    ) end of captured group
$ end of line
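The breakdown above can be written straight into the pattern with re.VERBOSE, which makes it easy to try on a few strings. This is only a sketch: the helper name strip_prefix and the second test string (a line with two $G markers) are invented here to show that the lazy .*? stops at the first $G.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so each part keeps its comment.
PATTERN = re.compile(
    r"""
    ^        # start of line
    .*?      # as few characters as possible (lazy)
    (        # start of captured group
      \$G    # literal dollar sign followed by G
      .*     # everything up to the end of the line (greedy)
    )        # end of captured group
    $        # end of line
    """,
    re.VERBOSE,
)

def strip_prefix(line):
    """Return the text from the first "$G" onward, or None if the line has no "$G"."""
    m = PATTERN.search(line)
    return m.group(1) if m else None

print(strip_prefix("&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71"))
print(strip_prefix("junk $GAAA,1*00 more $GBBB,2*11"))   # keeps everything from the first $G
print(strip_prefix("no sentence here"))                   # no $G, so None
```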

You might need to add a CRLF after the last line, or work \Z into the pattern.
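To see how the two anchors treat a final line with and without a trailing newline, a small check like the following may help. It is a sketch, not part of the original answer; the names DOLLAR, END_Z and capture are made up for illustration. In Python's re, $ (without re.MULTILINE) matches at the very end of the string or just before a trailing newline, while \Z matches only at the very end.

```python
import re

DOLLAR = re.compile(r"^.*?(\$G.*)$")    # pattern from the answer
END_Z = re.compile(r"^.*?(\$G.*)\Z")    # variant anchored with \Z instead of $

def capture(pattern, line):
    """Return group(1) of the first match, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None

with_newline = "&$GNDTM,W84,,0.0,N,0.0,E,0.0,W84*71\n"
without_newline = with_newline.rstrip("\n")

# $ copes with both forms of the line
print(capture(DOLLAR, with_newline))
print(capture(DOLLAR, without_newline))

# \Z fails while the newline is still attached, because "." cannot consume "\n"
print(capture(END_Z, with_newline))
print(capture(END_Z, without_newline))
```

So with the $ version a missing newline on the last input line should not cause a problem, whereas a \Z version would need the newline stripped from each line before matching.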

It is probably best to experiment with your real data, for example on http://regex101.com