I have a text file containing data like this:
PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER=
I want to extract the entries from this file that have nothing after the equals sign.
So in my new text file I want to get:
CMD_REINIT
CMD_OLIVIER
How can I do this?
My code currently looks like this:
import os

DIR_DAT = "dat"
DIR_OUTPUT = "output"

print("Psst go check in the output folder ;)")
os.makedirs(DIR_OUTPUT, exist_ok=True)
# Open the output file once, so each input file does not overwrite the previous results
with open(os.path.join(DIR_OUTPUT, "bonjour.txt"), "w") as outfile:
    for root, dirs, files in os.walk(DIR_DAT):
        for filename in files:
            with open(os.path.join(root, filename)) as infile:
                for line in infile:
                    # Keep lines whose value after the last "=" is empty
                    if not line.strip().split("=")[-1]:
                        outfile.write(line)
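As a third step, it should go through the new file and keep each value only once: after appending the four files into one, some entries may appear four, three, or two times.
I need to save the result to a new file, which I'll call output.txt, containing only the lines that are common to all the files.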
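For that last step, one option is a set intersection rather than appending everything and counting duplicates. This is a minimal sketch, assuming every regular file directly inside dat is one of the input files and that "common" means the same key has an empty value in each of them; the helper names empty_keys, common_empty_keys and write_common are hypothetical.

```python
import os

def empty_keys(path):
    """Return the set of keys in `path` whose value after "=" is empty."""
    keys = set()
    with open(path) as f:
        for line in f:
            stripped = line.strip()
            if stripped.endswith("="):
                keys.add(stripped[:-1])  # drop the trailing "="
    return keys

def common_empty_keys(paths):
    """Return the keys with an empty value that occur in every file in `paths`."""
    key_sets = [empty_keys(p) for p in paths]
    if not key_sets:
        return set()
    # A set keeps each key once; the intersection keeps only keys present in all files
    return set.intersection(*key_sets)

def write_common(dir_dat="dat", output_path="output.txt"):
    """Write the keys common to all files in `dir_dat` to `output_path`, one per line."""
    paths = []
    for name in sorted(os.listdir(dir_dat)):
        path = os.path.join(dir_dat, name)
        if os.path.isfile(path):  # skip subdirectories
            paths.append(path)
    common = common_empty_keys(paths)
    with open(output_path, "w") as out:
        for key in sorted(common):
            out.write(key + "\n")
    return common
```

Calling write_common("dat", "output.txt") would then produce the file. Because each file is reduced to a set first, a key that appears twice in one file is still counted once, so the "appears four, three, or two times" problem does not arise.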
Answer 0 (score: 3)
You can use a regular expression:
import re
data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""
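found = re.findall(r"^\s*(.*)=\s*$", data, re.M)
print(found)
Output:
['CMD_REINIT', 'CMD_OLIVIER']
The expression looks for:
^\s*    start of line + optional whitespace
(.*)=   anything before an "=", captured as a group
\s*$    followed by optional whitespace and the end of the line
The re.M (multiline) flag is used so that ^ and $ match at the start and end of every line, not only at the start and end of the whole string.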
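The effect of re.M can be checked directly on the question's sample data. This sketch uses the pattern with optional leading whitespace (^\s*), since no line in the sample starts with whitespace; the variable names are only for illustration.

```python
import re

# Sample data from the question (no line starts with whitespace)
data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""

pattern = r"^\s*(.*)=\s*$"

# Without re.M, ^ only matches at the very start of the string,
# and the first line has no "=", so nothing is found
without_m = re.findall(pattern, data)
# With re.M, ^ and $ match at the start and end of every line
with_m = re.findall(pattern, data, re.M)

print(without_m)  # []
print(with_m)     # ['CMD_REINIT', 'CMD_OLIVIER']
```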
Read the file's text like this:
with open("yourfile.txt", "r") as f:
    data = f.read()
Write the new file like this:
with open("newfile.txt", "w") as f:
    f.write("\n".join(found))
You can use http://www.regex101.com to test a regex pattern against sample text; make sure to switch it to Python mode.
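Answer 1 (score: 3)
I suggest this short approach using a generator expression:
with open('file.txt', 'r') as f, open('newfile.txt', 'w') as newf:
    # Keep only stripped lines ending in "=", and drop that trailing "="
    for x in (line.strip()[:-1] for line in f if line.strip().endswith("=")):
        newf.write(f'{x}\n')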
Answer 2 (score: 2)
Try the following pattern, with the re.M flag: \w+(?==$)
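A minimal sketch of how that pattern might be used with re.findall, assuming the file's contents have already been read into a string as in the first answer. The (?==$) part is a lookahead, so the "=" must be present but is not included in the match.

```python
import re

data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""

# \w+      one or more word characters (letters, digits, underscore)
# (?==$)   lookahead: must be followed by "=" and then the end of the line
keys = re.findall(r"\w+(?==$)", data, re.M)
print(keys)  # ['CMD_REINIT', 'CMD_OLIVIER']
```

Without re.M, $ only matches at the end of the whole string, so only the last key (CMD_OLIVIER) would be found.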
Answer 3 (score: 1)
Use a simple loop.
For example:
with open(filename) as infile, open(filename2, "w") as outfile:
    for line in infile:  # Iterate over each line
        stripped = line.strip()
        if stripped and not stripped.split("=")[-1]:  # Nothing after the "="
            key = stripped.strip("=")
            print(key)
            outfile.write(key + "\n")  # Write only the key to the new file
Output:
CMD_REINIT
CMD_OLIVIER