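I'm looking for a simple way to open a file and search each line to see whether it has unclosed parens or quotes. If a line has unclosed parens/quotes, I want to print that line to a file. I know I could do this with an ugly if/for statement, but I suspect Python has a better way, with the re module (which I know nothing about) or something else. I just don't know the language well enough to do it.
Thanks!
Edit: some sample lines. They may be easier to read if you copy them into Notepad or similar and turn off word wrap (some lines can be quite long). Also, the file has over 100k lines, so something efficient would be great!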
SL ID=0X14429A0B TY=STANDARD OWN=0X429A03 EXT=22 SLTK=0X1C429A0B MP=0X684003F0 SUB=0X24400007
RT ID=0X18429A19 TY=CALONSC OWN=0X14429A0B EXLP=0X14429A0C CMDS=(N:0X8429A04,C:0X14429A0B) SGCC=2 REL=1 DESC="AURANT YD-INDSTRY LD" ATIS=T
RT ID=0X18429A1A TY=CALONSC OWN=0X14429A0B EXLP=0X14429A08 CMDS=(R:0X8429A04,N:0X8429A05,C:0X14429A0B) SGCC=2 REL=2 DESC="AURANT YD TO TRK.1" ATIS=T
RT ID=0X18429A1B TY=CALONSC OWN=0X14429A0B EXLP=0X14429A0A CMDS=(R:0X8429A04,R:0X8429A05,C:0X14429A0B) SGCC=2 REL=3 DESC="AURANT YD TO TRK.2" ATIS=T
SL ID=0X14429A0C TY=STANDARD OWN=0X429A03 EXT=24 SLTK=0X1C429A0B MP=0X684003F1 SUB=0X24400007
RT ID=0X18429A1C TY=CALONSC OWN=0X14429A0C EXLP=0X14429A0B CMDS=(N:0X8429A04,C:0X14429A0C) SGCC=2 REL=1 DESC="AURANT YD-INDSTRY LD" ATIS=T
TK ID=0X1C429A08 TY=BLKTK OWN=0X429A03 EXT=12 LRMP=0X6C40BDAF LEN=5837 FSPD=60 PSPD=65 QUAL=TRK.1 MAXGE=0 MAXGW=0 JAL=4 ALT=12 SUB=0X24400007 RULES=(CTC:B:UP:0X24400007:485.7305:486.8359:T) LLON=-118.1766772 RLON=-118.1620059 LLAT=34.06838375 RLAT=34.07811764 LELE=416.6983 RELE=425.0596 ULAD=NO URAD=NO
PT ID=0X20429A0F TY=STANDARD OWN=0X1C429A08 LTK=0X1C40006C RTK=0X1C429A0C REL=1 LEN=1 LQUAL="TRK.1" RQUAL="TRK.1"
PTK OWN=0X1C429A08 PID=0X1C429A13
Answer 0 (score: 6)
If you don't expect any backwards mismatched parens (i.e. ")("), you can do this:
with open("myFile.txt","r") as readfile, open("outFile.txt","w") as outfile:
    for line in readfile:
        if line.count("(") != line.count(")") or line.count('"') % 2 != 0:
            outfile.write(line)
Otherwise you'll have to count them one character at a time to check for mismatches, like this:
with open("myFile.txt","r") as readfile, open("outFile.txt","w") as outfile:
    for line in readfile:
        count = 0
        for char in line:
            if char == ")":
                count -= 1
            elif char == "(":
                count += 1
            if count < 0:
                # a ")" appeared before its matching "("
                break
        if count != 0 or line.count('"') % 2 != 0:
            outfile.write(line)
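I can't think of a better way to handle it. Python's re module doesn't support recursive regular expressions, so a pure regex solution is out.
One more thing: given your data, it may be better to put the check into a function and split each line into its values, which is easy to do with a regex, like this: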
import re

# captures the value that follows each KEY= on the line
splitre = re.compile(r".*?=(.*?)(?:(?=\s*?\S*?=)|(?=\s*$))")

def matchParens(text):
    # despite the name, returns True when text has unbalanced parens or quotes
    count = 0
    for char in text:
        if char == ")":
            count -= 1
        elif char == "(":
            count += 1
        if count < 0:
            break
    return count != 0 or text.count('"') % 2 != 0

with open("myFile.txt","r") as readfile, open("outFile.txt","w") as outfile:
    for line in readfile:
        if any(matchParens(text) for text in splitre.findall(line)):
            outfile.write(line)
This may be better because it checks each value separately: if one value has an open paren and a later value has a close paren, the line won't be mistaken for balanced.
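To make the difference concrete, here is a minimal sketch that compares a whole-line check with the per-value check. The sample line and the helper names `unbalanced`, `whole_line_unbalanced` and `per_value_unbalanced` are made up for illustration; only the regex is taken from the answer.

```python
import re

# Same pattern as in the answer: captures the value after each KEY=
splitre = re.compile(r".*?=(.*?)(?:(?=\s*?\S*?=)|(?=\s*$))")

def unbalanced(text):
    """Return True if parens are unbalanced or quotes are unpaired in text."""
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                break
    return depth != 0 or text.count('"') % 2 != 0

def whole_line_unbalanced(line):
    # Treats the line as one string, so parens in different values can pair up.
    return unbalanced(line)

def per_value_unbalanced(line):
    # Checks every captured value on its own.
    return any(unbalanced(value) for value in splitre.findall(line))

# Hypothetical bad line: "(" in one value, ")" in the next one.
sample = "PTK OWN=0X1C(42 PID=0X1C)42"
print(whole_line_unbalanced(sample))  # False: the two parens cancel out
print(per_value_unbalanced(sample))   # True: each value is broken on its own
```

Whether the stricter per-value rule is the right one depends on whether a value in this file format is ever allowed to contain a space followed by another `KEY=`; that is an assumption about the data, not something the thread confirms.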
Answer 1 (score: 5)
Using a parser package may seem like overkill, but it's quick:
text = """\
SL ID=0X14429A0B TY=STANDARD OWN=0X429A03 EXT=22 SLTK=0X1C429A0B MP=0X684003F0 SUB=0X24400007
RT ID=0X18429A19 TY=CALONSC OWN=0X14429A0B EXLP=0X14429A0C CMDS=(N:0X8429A04,C:0X14429A0B) SGCC=2 REL=1 DESC="AURANT YD-INDSTRY LD" ATIS=T
RT ID=0X18429A1A TY=CALONSC OWN=0X14429A0B EXLP=0X14429A08 CMDS=(R:0X8429A04,N:0X8429A05,C:0X14429A0B) SGCC=2 REL=2 DESC="AURANT YD TO TRK.1" ATIS=T
RT ID=0X18429A1B TY=CALONSC OWN=0X14429A0B EXLP=0X14429A0A CMDS=(R:0X8429A04,R:0X8429A05,C:0X14429A0B) SGCC=2 REL=3 DESC="AURANT YD TO TRK.2" ATIS=T
SL ID=0X14429A0C TY=STANDARD OWN=0X429A03 EXT=24 SLTK=0X1C429A0B MP=0X684003F1 SUB=0X24400007
RT ID=0X18429A1C TY=CALONSC OWN=0X14429A0C EXLP=0X14429A0B CMDS=(N:0X8429A04,C:0X14429A0C) SGCC=2 REL=1 DESC="AURANT YD-INDSTRY LD" ATIS=T
TK ID=0X1C429A08 TY=BLKTK OWN=0X429A03 EXT=12 LRMP=0X6C40BDAF LEN=5837 FSPD=60 PSPD=65 QUAL=TRK.1 MAXGE=0 MAXGW=0 JAL=4 ALT=12 SUB=0X24400007 RULES=(CTC:B:UP:0X24400007:485.7305:486.8359:T) LLON=-118.1766772 RLON=-118.1620059 LLAT=34.06838375 RLAT=34.07811764 LELE=416.6983 RELE=425.0596 ULAD=NO URAD=NO
PT ID=0X20429A0F TY=STANDARD OWN=0X1C429A08 LTK=0X1C40006C RTK=0X1C429A0C REL=1 LEN=1 LQUAL="TRK.1" RQUAL="TRK.1"
PTK OWN=0X1C429A08 PID=0X1C429A13 GOOD
PTK OWN=0X1C429A(08 PID=0X1C429A13 BAD
PTK OWN=0X1C429A08 )PID=0X1C429A13 BAD
PTK OWN=0X1C(42(9A))08 PID=0X1C429A13 GOOD
PTK OWN=0X1C(42(9A))08 PID=0X1C42(9A13 BAD
PTK OWN=0X1C(42(9A))08 PID=0X1C42"("9A13 GOOD
"""
from pyparsing import nestedExpr, quotedString

paired_exprs = nestedExpr('(',')') | quotedString
for i, line in enumerate(text.splitlines(), start=1):
    # use pyparsing expression to strip out properly nested quotes/parentheses
    stripped_line = paired_exprs.suppress().transformString(line)
    # if there are any quotes or parentheses left, they were not
    # properly nested
    if any(unwanted in stripped_line for unwanted in '()"\''):
        print(i, ':', line)
Prints:
10 : PTK OWN=0X1C429A(08 PID=0X1C429A13 BAD
11 : PTK OWN=0X1C429A08 )PID=0X1C429A13 BAD
13 : PTK OWN=0X1C(42(9A))08 PID=0X1C42(9A13 BAD
Answer 2 (score: 3)
Sample code for checking each line follows. I inserted a stray parenthesis in the first line.
d = """SL ID=0X14429A0B TY=STANDARD OWN=0X429A(03 EXT=22 SLTK=0X1C429A0B MP=0X684003F0 SUB=0X24400007
RT ID=0X18429A19 TY=CALONSC OWN=0X14429A0B EXLP=0X14429A0C CMDS=(N:0X8429A04,C:0X14429A0B) SGCC=2 REL=1 DESC="AURANT YD-INDSTRY LD" ATIS=T
RT ID=0X18429A1A TY=CALONSC OWN=0X14429A0B EXLP=0X14429A08 CMDS=(R:0X8429A04,N:0X8429A05,C:0X14429A0B) SGCC=2 REL=2 DESC="AURANT YD TO TRK.1" ATIS=T
RT ID=0X18429A1B TY=CALONSC OWN=0X14429A0B EXLP=0X14429A0A CMDS=(R:0X8429A04,R:0X8429A05,C:0X14429A0B) SGCC=2 REL=3 DESC="AURANT YD TO TRK.2" ATIS=T
SL ID=0X14429A0C TY=STANDARD OWN=0X429A03 EXT=24 SLTK=0X1C429A0B MP=0X684003F1 SUB=0X24400007
RT ID=0X18429A1C TY=CALONSC OWN=0X14429A0C EXLP=0X14429A0B CMDS=(N:0X8429A04,C:0X14429A0C) SGCC=2 REL=1 DESC="AURANT YD-INDSTRY LD" ATIS=T
TK ID=0X1C429A08 TY=BLKTK OWN=0X429A03 EXT=12 LRMP=0X6C40BDAF LEN=5837 FSPD=60 PSPD=65 QUAL=TRK.1 MAXGE=0 MAXGW=0 JAL=4 ALT=12 SUB=0X24400007 RULES=(CTC:B:UP:0X24400007:485.7305:486.8359:T) LLON=-118.1766772 RLON=-118.1620059 LLAT=34.06838375 RLAT=34.07811764 LELE=416.6983 RELE=425.0596 ULAD=NO URAD=NO
PT ID=0X20429A0F TY=STANDARD OWN=0X1C429A08 LTK=0X1C40006C RTK=0X1C429A0C REL=1 LEN=1 LQUAL="TRK.1" RQUAL="TRK.1"
PTK OWN=0X1C429A08 PID=0X1C429A13"""
def unbalanced(line):
    close_symbols = {'"' : '"', '(': ")", '[': ']', "'" : "'"}
    syms = [x for x in line if x in '\'"[]()']
    stack = []
    for s in syms:
        try:
            if len(stack) > 0 and s == close_symbols[stack[-1]]:
                stack.pop()
            else:
                stack.append(s)
        except KeyError:  # top of the stack is a closing symbol, e.g. "]["
            return True
    return len(stack) != 0

print(unbalanced("hello 'there' () []"))     # False
print(unbalanced("hello 'there\"' () []"))   # True
print(unbalanced("]["))                      # True
lines = d.splitlines() # in your case you can do open("file.txt").readlines()
print([line for line in lines if unbalanced(line)])
For large files you don't want to read the whole file into memory, so use a snippet like this:
with open("file.txt") as infile:
    for line in infile:
        if unbalanced(line):
            print(line, end="")  # line already ends with a newline
Answer 3 (score: 1)
Regular expressions: if your lines don't contain nested parentheses, the solution is quite simple:
import re

for line in myFile:
    if re.search(r"\([^\(\)]*($|\()", line):
        # this line contains unbalanced parentheses.
        print(line, end="")
If you have to deal with the possibility of nested parentheses, it gets a bit more complicated:
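for line in myFile:
    paren_stack = []
    unbalanced = False
    for char in line:
        if char == '(':
            paren_stack.append(char)
        elif char == ')':
            if paren_stack:
                paren_stack.pop()
            else:
                # a ')' with no matching '('
                unbalanced = True
                break
    if unbalanced or paren_stack:
        # this line contains unbalanced parentheses.
        print(line, end="")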
Answer 4 (score: 0)
I would do something like:
for line in open(file, "r"):
    if line.count('"') % 2 != 0 or line.count('(') != line.count(')'):
        print(line)
But I can't be sure this fully fits your needs.
More robust:
for line in open(file, "r"):
    paren_count = 0
    paren_count_start_quote = 0
    quote_open = False
    bad = False
    for char in line:
        if char == ')':
            paren_count -= 1
        elif char == '(':
            paren_count += 1
        elif char == '"':
            quote_open = not quote_open
            if quote_open:
                # remember the paren depth at the opening quote
                paren_count_start_quote = paren_count
            elif paren_count != paren_count_start_quote:
                # the paren depth changed inside the quoted text
                bad = True
                break
        if paren_count < 0:
            bad = True
            break
    # a single check at the end, so each bad line is printed only once
    if bad or quote_open or paren_count != 0:
        print(line)
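I haven't tested the robust version, but I think it should work. It now also catches cases like (")", where a set of parens is closed inside a quote, and prints the line.
Answer 5 (score: 0)
Check out this code: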
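from tokenize import generate_tokens

def syntaxCheck(line):
    def readline():
        yield line
        yield ''
    par,quo,dquo = 0,0,0
    count = { '(': (1,0,0),')': (-1,0,0),"'": (0,1,0),'"':(0,0,1) }
    # generate_tokens accepts a readline callable that returns str.
    # Note: tokenize may raise TokenError on unbalanced input (for example an
    # unclosed parenthesis at end of input); treat that as unbalanced too.
    for _,token,_,_,_ in generate_tokens(readline().__next__):
        countPar, countQuo, countDQuo = count.get(token,(0,0,0))
        par += countPar
        quo ^= countQuo
        dquo ^= countDQuo
    return par,quo,dquo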
Note that parentheses inside closed quotes are not counted, because the quoted text counts as a single string token.
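A small sketch of that tokenizer behaviour, assuming the data lines happen to be tokenizable as Python source (the `tokenize` module is meant for Python code, not for this file format). The helper names `paren_tokens` and `string_tokens` and the sample line are made up for illustration.

```python
import io
import tokenize

def paren_tokens(line):
    """Return the '(' and ')' tokens that tokenize reports as operators."""
    tokens = tokenize.generate_tokens(io.StringIO(line).readline)
    return [tok.string for tok in tokens
            if tok.type == tokenize.OP and tok.string in ("(", ")")]

def string_tokens(line):
    """Return the quoted strings that tokenize reports as single tokens."""
    tokens = tokenize.generate_tokens(io.StringIO(line).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]

# Hypothetical line: one "(" sits inside the quoted text, one pair is real.
sample = 'DESC="YD (TRK" CMDS=(N,C)'
print(paren_tokens(sample))   # ['(', ')']
print(string_tokens(sample))  # ['"YD (TRK"']
```

The parenthesis inside the quotes never shows up as its own token, which is why a token-based count ignores it.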
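Answer 6 (score: -1)
Should the parentheses and quotes be closed on every line? If so, you can simply count the parentheses and quotes. If the count is even, they match; if it's odd, one is missing. Put that logic in a function, dump the lines of the text file into an array, and call map to run the function on each string in the array.
My Python is rusty, but that's how I'd do it, assuming everything "should" be on the same line.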
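A minimal sketch of that counting idea, assuming the lines are already in a list. The function name `counts_look_off` and the sample lines are illustrative, not from the original post. It compares the number of opening and closing parentheses instead of testing the total for evenness, since a line such as `((` has an even total but is still unclosed.

```python
def counts_look_off(line):
    """True if '(' and ')' counts differ or the number of '"' is odd."""
    return line.count("(") != line.count(")") or line.count('"') % 2 != 0

# Hypothetical lines in the style of the question's data.
lines = [
    'RT ID=0X18429A19 CMDS=(N:0X8429A04,C:0X14429A0B) DESC="AURANT YD" ATIS=T',
    'RT ID=0X18429A1A CMDS=(R:0X8429A04 DESC="AURANT YD" ATIS=T',
    'PT ID=0X20429A0F LQUAL="TRK.1 RQUAL="TRK.1"',
]

# map runs the check on every line, as the answer suggests.
flags = list(map(counts_look_off, lines))
suspect = [line for line, bad in zip(lines, flags) if bad]
print(flags)  # [False, True, True]
```

Pure counting cannot see ordering, so a line containing `)(` would pass this check; a character-by-character scan is needed for that case.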
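Answer 7 (score: -1)
My solution may not be as fancy, but I'd say you just count the number of parentheses and quotes. If it doesn't come out even, you know you're missing something!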