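I am writing a program that reads a large .txt file of lottery numbers. Each draw is a list of 7 ints (six numbers drawn from 49; the last one is the super number).
For example: [[1, 11, 25, 37, 39, 47, 0], [3, 13, 15, 18, 37, 46, 0], ...]
The .txt file groups the draws by month, so it looks like this: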
January:
[1, 11, 25, 37, 39, 47, 0]
[3, 13, 15, 18, 37, 46, 2]
[3, 6, 9, 12, 37, 46, 6]
February:
[3, 13, 15, 18, 37, 46, 0]
[1, 23, 17, 18, 37, 46, 8]
...
and so on.
How can I build an array that contains only the numbers of a given month?
I have a solution, but the coding style is very bad:
jan_tipps = []
feb_tipps = []
mar_tipps = []
# flags that record which month is currently being read
jan = False
feb = False
mar = False
for line in wholefile:
    if line == '\n':
        pass
    elif line == 'January:\n':
        jan = True
    elif line == 'February:\n':
        jan = False
        feb = True
    elif line == 'March:\n':
        feb = False
        mar = True
    elif jan == True:
        jan_tipps.append(line.split())
    elif feb == True:
        feb_tipps.append(line.split())
    elif mar == True:
        mar_tipps.append(line.split())
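I think I need something like generics or self-generating variables. I don't know what to search for on the internet.
Answer 0: (score: 1)
You can use a regular expression: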
import re
lotto = """
January:
[1, 11, 25, 37, 39, 47, 0]
[3, 13, 15, 18, 37, 46, 2]
[3, 6, 9, 12, 37, 46, 6]
February:
[3, 13, 15, 18, 37, 46, 0]
[1, 23, 17, 18, 37, 46, 8]
"""
def getMonthlyNumbers(month=None):
    rx = re.compile(r'''
        ^{}:[\n\r]
        (?P<numbers>(?:^\[.+\][\n\r]?)+)'''.format(month), re.M | re.X)
    for match in rx.finditer(lotto):
        # print it or do sth. else here
        print(match.group('numbers'))
getMonthlyNumbers('January')
getMonthlyNumbers('February')
Or, for all months at once, using a dict comprehension:
rx = re.compile(r'^(?P<month>\w+):[\n\r](?P<numbers>(?:^\[.+\][\n\r]?)+)', re.MULTILINE)
result = {m.group('month'): m.group('numbers') for m in rx.finditer(lotto)}
print(result)
Which yields
{'January': '[1, 11, 25, 37, 39, 47, 0]\n[3, 13, 15, 18, 37, 46, 2]\n[3, 6, 9, 12, 37, 46, 6]\n', 'February': '[3, 13, 15, 18, 37, 46, 0]\n[1, 23, 17, 18, 37, 46, 8]\n'}
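The idea here is to look for a month name at the start of a line and then capture any [...] groups that follow it. See a demo on regex101.com.
To get actual lists of integers rather than strings, each captured line can be passed to ast.literal_eval: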
import re
from ast import literal_eval
lotto = """
January:
[1, 11, 25, 37, 39, 47, 0]
[3, 13, 15, 18, 37, 46, 2]
[3, 6, 9, 12, 37, 46, 6]
February:
[3, 13, 15, 18, 37, 46, 0]
[1, 23, 17, 18, 37, 46, 8]
"""
rx = re.compile(r'^(?P<month>\w+):[\n\r](?P<numbers>(?:^\[.+\][\n\r]?)+)', re.MULTILINE)
result = {m.group('month'):
          [literal_eval(numbers)
           for numbers in m.group('numbers').split("\n") if numbers]
          for m in rx.finditer(lotto)}
print(result)
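As a rough sketch of how the same pattern could answer the original question for a single month, the helper below returns the parsed draws for one month name. The function name `numbers_for_month` and the `sample` string are hypothetical, and `re.escape` is an addition that is not in the answer above; it only guards against regex metacharacters in the month name.

```python
import re
from ast import literal_eval

def numbers_for_month(text, month):
    """Return the draws listed under `month` as a list of lists of ints."""
    # Same pattern as above; re.escape is an added safeguard (assumption).
    rx = re.compile(
        r'^' + re.escape(month) + r':[\n\r](?P<numbers>(?:^\[.+\][\n\r]?)+)',
        re.MULTILINE,
    )
    match = rx.search(text)
    if match is None:
        return []  # month header not found in the text
    # Each non-empty captured line is a list literal such as "[1, 2, 3]".
    return [literal_eval(row)
            for row in match.group('numbers').splitlines() if row]

sample = """January:
[1, 11, 25, 37, 39, 47, 0]
[3, 13, 15, 18, 37, 46, 2]
February:
[3, 13, 15, 18, 37, 46, 0]
[1, 23, 17, 18, 37, 46, 8]
"""

print(numbers_for_month(sample, 'February'))
```

For a real file, `text` would simply be the result of reading the whole file into one string.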
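Answer 1: (score: 1)
As Klaus D. commented, you need a dictionary. But I suspect that is not enough of a hint, so here is a more extensive answer.
One problem: your code is inconsistent with the input data you provided. Your code splits the numbers on whitespace, but the input data uses square brackets and commas. This code works with the input you provided.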
# Parser states:
# 0: waiting for a month name
# 1: expecting numbers in the format [1, 11, 25, 37, 39, 47, 0]
from collections import defaultdict
state = 0
tipps = defaultdict(list)
monthname = None
with open("numbers.txt", "r") as f:
    for line in f:
        if state == 0:
            if line.strip().endswith(":"):
                monthname = line.split(":")[0]
                state = 1
            continue
        if state == 1:
            if line.startswith("["):
                line = line.strip().strip("[]")
                numbers = line.split(",")
                tipps[monthname].append([int(n) for n in numbers])
            elif not line.strip():
                # a blank line ends the current month's block
                state = 0
            else:
                print(f"Unexpected data, parser stuck: {line}")
                break
for k, v in tipps.items():
    print(f"{k}: {v}")
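The mismatch mentioned at the start of this answer is easy to see on a single line of input. The short sketch below (the variable names are illustrative only) compares splitting on whitespace, as the question's code does, with stripping the brackets and splitting on commas, as the parser above does.

```python
line = "[1, 11, 25, 37, 39, 47, 0]\n"

# Splitting on whitespace leaves brackets and commas attached to the tokens.
by_whitespace = line.split()
print(by_whitespace)

# Stripping the brackets and splitting on commas gives tokens that int() accepts
# (int() tolerates the leading space in " 11").
numbers = [int(n) for n in line.strip().strip("[]").split(",")]
print(numbers)
```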
Answer 2: (score: 0)
Create a dictionary of months, with the month name as the key and a list of lists as the value
import ast

month = {
    m: []
    for m in ['January', 'February']
}
with open('file.txt') as file:
    latest = None
    for line in file:
        line = line.strip()
        if line == '':  # skip empty lines
            continue
        name = line.rstrip(':')  # header lines look like "January:"
        if name in month:
            latest = name
        else:
            # a data line looks like "[1, 2, ...]", so parse it as a list literal
            month[latest].append(ast.literal_eval(line))
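The dictionary above only lists two months. One possible way to avoid typing all twelve names is the standard library's `calendar.month_name`, sketched below. This is an assumption about what the file contains: the names are locale-dependent, and they match the English headers only when the program runs with the default (C/English) locale.

```python
import calendar

# calendar.month_name[0] is an empty string, so skip it.
# Assumption: the default locale is in effect, so the names are English.
month = {name: [] for name in calendar.month_name[1:]}

print(list(month))
```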
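Answer 3: (score: 0)
You can use a regular expression to extract the month name, ast.literal_eval() to parse each month's lists of numbers, and a defaultdict to store them, so that there is no need to check whether a month exists before appending a list to it: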
from collections import defaultdict
import ast
import re
with open('file.txt') as file:
    months = defaultdict(list)
    month = None
    for line in file:
        line = line.strip()
        # a header line such as "January:" starts a new month
        m = re.match('([A-Z][a-z]+):', line)
        if m is not None:
            month = m.group(1)
        elif line.startswith('['):
            months[month].append(ast.literal_eval(line))
for month, numbers in months.items():
    print('{}: {}'.format(month, numbers))
Output:
January: [[1, 11, 25, 37, 39, 47, 0], [3, 13, 15, 18, 37, 46, 2], [3, 6, 9, 12, 37, 46, 6]]
February: [[3, 13, 15, 18, 37, 46, 0], [1, 23, 17, 18, 37, 46, 8]]
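To come back to the original question of getting only one month's numbers, the same loop can be wrapped in a function and the result indexed by month name. The sketch below is one way to do that; `parse_months` and the in-memory `sample` are hypothetical names, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import ast
import re
from collections import defaultdict
from io import StringIO

def parse_months(lines):
    """Group the bracketed draws in `lines` under the preceding month header."""
    months = defaultdict(list)
    month = None
    for line in lines:
        line = line.strip()
        m = re.match(r'([A-Z][a-z]+):', line)
        if m is not None:
            month = m.group(1)
        elif line.startswith('[') and month is not None:
            # Data lines before any header are ignored (a choice made here).
            months[month].append(ast.literal_eval(line))
    return dict(months)

# StringIO behaves like an open text file when iterated line by line.
sample = StringIO(
    "January:\n"
    "[1, 11, 25, 37, 39, 47, 0]\n"
    "[3, 13, 15, 18, 37, 46, 2]\n"
    "\n"
    "February:\n"
    "[3, 13, 15, 18, 37, 46, 0]\n"
)

result = parse_months(sample)
print(result['February'])
```

With a real file, `parse_months(open('file.txt'))` inside a `with` block would take the place of the `StringIO` object.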