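I'd like a quick, Pythonic way to get a count out of a loop. Honestly, I'm too embarrassed to post my current, non-working solution.
Given a sample from a text file like the following: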
script7
BLANK INTERRUPTION
script2
launch4.VBS
script3
script8
launch3.VBS
script5
launch1.VBS
script6
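I want to count every time a script[Y] is followed by a launch[X]. The launch values range from 1-5 and the script values range from 1-15.
Taking script3 as an example, I need a count for each of the following in a given file: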
script3
launch1
#count this
script3
launch2
#count this
script3
launch3
#count this
script3
launch4
#count this
script3
launch4
#count this
script3
launch5
#count this
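I think the number of loops involved here is beyond my knowledge of Python. Any help is greatly appreciated.
Answer 0 (score: 1)
Here is an approach using nested dictionaries. Let me know if you'd like the output in a different format: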
#!/usr/bin/env python3
import re
script_dict={}
with open('infile.txt','r') as infile:
    # first pass: create an empty counter dict for every script line
    scriptre = re.compile(r"^script\d+$")
    for line in infile:
        line = line.rstrip()
        if scriptre.match(line) is not None:
            script_dict[line] = {}
    infile.seek(0) # go to beginning
    # second pass: count each launch line under the most recent script line
    launchre = re.compile(r"^launch\d+\.[vV][bB][sS]$")
    current=None
    for line in infile:
        line = line.rstrip()
        if line in script_dict:
            current=line
        elif launchre.match(line) is not None and current is not None:
            if line not in script_dict[current]:
                script_dict[current][line] = 1
            else:
                script_dict[current][line] += 1
print(script_dict)
Answer 1 (score: 1)
Here is my solution using a defaultdict of Counters and a regex with lookahead.
import re
from collections import Counter, defaultdict
with open('in.txt', 'r') as f:
    # make sure we have only \n as line ending and no leading or trailing whitespace;
    # this makes the regex less complex
    alltext = '\n'.join(line.strip() for line in f)
# find keyword script\d+ and capture it, then lazily expand and capture everything,
# with a lookahead so that we stop as soon as and only if the next word is 'script'
# or we reach the end of the string
scriptPattern = re.compile(r'(script\d+)(.*?)(?=script|\n?$)', re.DOTALL)
# just find everything that matches launch\d+
launchPattern = re.compile(r'launch\d+')
# create a defaultdict with a Counter for every entry
scriptDict = defaultdict(Counter)
# go through all matches
for match in scriptPattern.finditer(alltext):
    script, body = match.groups()
    # update the counter of this script
    scriptDict[script].update(launchPattern.findall(body))
# print the results
for script in sorted(scriptDict):
    counter = scriptDict[script]
    if len(counter):
        print('{} launches:'.format(script))
        for launch in sorted(counter):
            count = counter[launch]
            print('\t{} {} time(s)'.format(launch, count))
    else:
        print('{} launches nothing'.format(script))
Using the string on regex101 (see the link above), I get the following result:
script2 launches:
	launch4 1 time(s)
script3 launches nothing
script5 launches:
	launch1 1 time(s)
script6 launches nothing
script7 launches nothing
script8 launches:
	launch3 1 time(s)
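To see how the lazy capture and the lookahead carve the text into per-script blocks, here is a minimal sketch run on an inline string instead of a file. The string `text` and the name `blocks` are assumptions made for illustration; the pattern is the one from the answer.

```python
import re

# Hypothetical inline sample mirroring part of the question's data (no file needed).
text = "script7\nBLANK INTERRUPTION\nscript2\nlaunch4.VBS\nscript3\nscript8\nlaunch3.VBS"

# Same pattern as in the answer: capture the script name, then lazily capture
# everything up to (but not including) the next 'script' or the end of the text.
pattern = re.compile(r'(script\d+)(.*?)(?=script|\n?$)', re.DOTALL)

# Pair each script name with the stripped text that follows it.
blocks = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]
for name, body in blocks:
    print(repr(name), '->', repr(body))
```

Note that each body holds everything up to the next script line, so a launch line is attributed to the most recent script even when another line such as BLANK INTERRUPTION sits in between.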
Answer 2 (score: 1)
Why not use a multiline regular expression? The script then becomes:
import re
# read all the text of the file, and clean it up
with open('counts.txt', 'rt') as f:
    alltext = '\n'.join(line.strip() for line in f)
# find all occurrences of a script line immediately followed by a launch line;
# the inline flags (?mi) lead the pattern (elsewhere they are an error since Python 3.11),
# \d+ covers two-digit script numbers, and $ also matches on the last line of the file
cont = re.findall(r'(?mi)^script(\d+)\nlaunch(\d+)\.VBS$', alltext)
# accumulate the counts of each launch number for each script number
# into nested dictionaries
scriptcounts = {}
for scriptnum, launchnum in cont:
    # if we haven't seen this script number before, create the dictionary for it
    if scriptnum not in scriptcounts:
        scriptcounts[scriptnum] = {}
    # if we haven't seen this launch number with this script number before,
    # initialize count to 0
    if launchnum not in scriptcounts[scriptnum]:
        scriptcounts[scriptnum][launchnum] = 0
    # increment the count for this combination of script and launch number
    scriptcounts[scriptnum][launchnum] += 1
# produce the output in order of increasing scriptnum/launchnum
# (key=int sorts numerically, so script10 comes after script2)
for scriptnum in sorted(scriptcounts, key=int):
    for launchnum in sorted(scriptcounts[scriptnum], key=int):
        print("script%s\nlaunch%s.VBS\n# count %d\n" % (scriptnum, launchnum, scriptcounts[scriptnum][launchnum]))
The output (in the format you requested) is, for example:
script2
launch1.VBS
# count 1
script2
launch4.VBS
# count 1
script5
launch1.VBS
# count 1
script8
launch3.VBS
# count 3
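re.findall() returns a list of all matches; each match is a tuple of the parenthesized () parts of the pattern. The (?mi) is not one of those groups: it is a directive telling the regex matcher to let ^ and $ match at each line ending (\n) and to match case-insensitively. As the pattern shows, the groups after 'script' and 'launch' pull just the numbers into each match. The match could just as easily include the word 'script' by using '(script\d+)', and similarly '(launch\d+\.VBS)'; only the printing would need to change to handle that.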
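As a rough illustration of what re.findall() hands back for a pattern with two groups, here is a small sketch on an inline string. The string `text` and the name `pairs` are assumptions for the example, and the inline flags are placed at the start of the pattern, which recent Python versions require.

```python
import re

# Hypothetical inline text; the lowercase '.vbs' is there to exercise the 'i' flag.
text = "script2\nlaunch4.VBS\nscript3\nscript8\nlaunch3.vbs\nscript12\nlaunch1.VBS"

# (?mi) at the start: 'm' lets ^ and $ match at each line boundary,
# 'i' makes the match case-insensitive.
pairs = re.findall(r'(?mi)^script(\d+)\nlaunch(\d+)\.VBS$', text)

# Each element is a (script number, launch number) tuple of strings.
for scriptnum, launchnum in pairs:
    print(scriptnum, launchnum)
```

script3 produces no tuple here because the line after it is another script line rather than a launch line.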
HTH, Barney
Answer 3 (score: 0)
You can use the setdefault method.
Code:
dic={}
with open("a.txt") as inp:
    check=0        # 1 when the previous line was a script line
    key_string=""
    for line in inp:
        if check:
            # count only launch1..launch5 directly after a script line
            if line.strip().startswith("launch") and int(line.strip()[6])<6:
                print("yes")
                dic[key_string]=dic.setdefault(key_string,0)+1
            check=0
        if line.strip().startswith("script"):
            key_string=line.strip()
            check=1
print(dic)
For your given input, the output will be:
Output:
{"script3":6}
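The setdefault counting idiom used above can be sketched in isolation as follows. The list `followed` is a hypothetical stand-in for the script names that were each found directly before a launch line; it is not data from the question.

```python
# Hypothetical list of script names that were each followed by a launch line.
followed = ["script3", "script3", "script2", "script3"]

counts = {}
for name in followed:
    # setdefault returns the stored value, inserting 0 first if the key is missing,
    # so the first occurrence of a name needs no special case.
    counts[name] = counts.setdefault(name, 0) + 1

print(counts)
```

A collections.Counter would do the same job with less code; setdefault is shown here only because it is the method this answer relies on.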