Python regex on a txt file

Date: 2018-08-06 15:44:12

Tags: python regex

I have a text file containing data.

PAS_BEGIN_3600000
    CMD_VERS=2
    CMD_TRNS=O
    CMD_REINIT=
    CMD_OLIVIER=

I want to extract from this file the entries that have nothing after the equals sign.

So in my new text file, I want to get

CMD_REINIT
CMD_OLIVIER

How can I do this?


My code currently looks like this.

import os, os.path

DIR_DAT = "dat"
DIR_OUTPUT = "output"

print("Psst go check in the output folder ;)")
for roots, dir, files in os.walk(DIR_DAT):  
    for filename in files:
        filename_output = "/" + os.path.splitext(filename)[0]   
        with open(DIR_DAT + "/" + filename) as infile, open(DIR_OUTPUT + "/bonjour.txt", "w") as outfile:
            for line in infile:
                if not line.strip().split("=")[-1]:
                    outfile.write(line)

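I want to collect all the data in a single file. It doesn't work. Can anyone help me?

As a third step, it should go through the new file and keep each value only once. Four files are appended into one file, so some entries may appear four, three, or two times.

I need to save the result in a new file, which I will call output.txt, containing only the lines that are common to all files.

4 Answers:

Answer 0: (score: 3)

You can use a regular expression: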

import re

data = """PAS_BEGIN_3600000
    CMD_VERS=2
    CMD_TRNS=O
    CMD_REINIT=
    CMD_OLIVIER="""

found = re.findall(r"^\s+(.*)=\s*$",data,re.M)

print( found )

Output:

['CMD_REINIT', 'CMD_OLIVIER']

The expression looks for

^\s+   line start + whitespace
(.*)=  anything before a =, which is captured as a group
\s*$   followed by optional whitespace and line end

It uses the re.M (multiline) flag.
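To see why the re.M flag matters here, the following minimal sketch runs the same pattern on the sample data with and without it. The variable names `without_flag` and `with_flag` are only illustrative.

```python
import re

data = """PAS_BEGIN_3600000
    CMD_VERS=2
    CMD_TRNS=O
    CMD_REINIT=
    CMD_OLIVIER="""

pattern = r"^\s+(.*)=\s*$"

# Without re.M, ^ only anchors at the very start of the string,
# which begins with "PAS_BEGIN", so the indented lines never match.
without_flag = re.findall(pattern, data)

# With re.M, ^ and $ also anchor at every line boundary.
with_flag = re.findall(pattern, data, re.M)

print(without_flag)  # []
print(with_flag)     # ['CMD_REINIT', 'CMD_OLIVIER']
```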

Read the file's text like this:

with open("yourfile.txt","r") as f:
    data = f.read()

Write the new file like this:

with open("newfile.txt","w") as f:
    f.write("\n".join(found))

You can use http://www.regex101.com to try out your test text and regex pattern; make sure to switch to its Python mode.
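The read, match, and write steps from this answer can be combined into one function. This is only a sketch: the function name `extract_empty_names` and its two path parameters are hypothetical and do not come from the answer.

```python
import re

def extract_empty_names(src_path, dst_path):
    """Write the names that have nothing after '=' in src_path to dst_path."""
    with open(src_path, "r") as f:
        data = f.read()

    # Same pattern as in the answer: indented name, '=', then only whitespace.
    found = re.findall(r"^\s+(.*)=\s*$", data, re.M)

    with open(dst_path, "w") as f:
        f.write("\n".join(found))  # one name per line
    return found
```

Calling `extract_empty_names("yourfile.txt", "newfile.txt")` would then produce the file asked for in the question, assuming the input has the indented layout shown there.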

Answer 1: (score: 3)

I suggest the following short approach based on a generator expression:

with open('file.txt', 'r') as f, open('newfile.txt', 'w') as newf:
    for x in (line.strip()[:-1] for line in f if line.strip().endswith("=")):
        newf.write(f'{x}\n')

Answer 2: (score: 2)

Try this pattern: \w+(?==$)

Demo
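A minimal sketch of how this pattern could be used from Python. The answer does not say which flags to use; passing re.M is an assumption made here so that $ can match at the end of each line rather than only at the end of the text.

```python
import re

data = """PAS_BEGIN_3600000
    CMD_VERS=2
    CMD_TRNS=O
    CMD_REINIT=
    CMD_OLIVIER="""

# \w+     the name itself
# (?==$)  lookahead: the name must be followed by '=' and then the line end,
#         but the '=' is not part of the match
names = re.findall(r"\w+(?==$)", data, re.M)

print(names)  # ['CMD_REINIT', 'CMD_OLIVIER']
```

Because the lookahead requires the line to end right after the =, a line with trailing spaces after the = would not match this pattern.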

Answer 3: (score: 1)

Use simple iteration.

For example:

with open(filename) as infile, open(filename2, "w") as outfile:
    for line in infile:                            # Iterate over each line
        key = line.strip()
        if "=" in key and not key.split("=")[-1]:  # Nothing after the last "="
            key = key.rstrip("=")
            print(key)
            outfile.write(key + "\n")              # Write the name to the new file

Output:

CMD_REINIT
CMD_OLIVIER
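The question also asks for a single output file that keeps only the entries common to all input files. None of the answers covers that part, so the following is only a sketch under assumptions: the helper names `empty_keys` and `common_empty_keys` are hypothetical, and "common to all files" is read as a set intersection over every file found under the directory.

```python
import os

def empty_keys(lines):
    """Return the set of names whose line ends with '=' (no value)."""
    keys = set()
    for line in lines:
        stripped = line.strip()
        if stripped.endswith("="):
            keys.add(stripped[:-1].strip())
    return keys

def common_empty_keys(directory):
    """Return, sorted, the empty-valued names present in every file under directory."""
    common = None
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            # Join with root so files in subdirectories are found too.
            with open(os.path.join(root, filename)) as infile:
                keys = empty_keys(infile)
            common = keys if common is None else common & keys
    return sorted(common or [])
```

With the folder names from the question, the result of `common_empty_keys("dat")` could be written one name per line to output.txt in the output folder. Using a set also means each name is kept only once, however many times it appears.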