Question

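I am writing a program that reads a large .txt file of lottery numbers. Each array always contains 7 ints (drawn from 49; the last one is the super number).

For example: [[1,11,25,37,39,47,0], [3,13,15,18,37,46,0], ...]

I have such a .txt for every month, so it looks like this: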

January:
[1, 11, 25, 37, 39, 47, 0]
[3, 13, 15, 18, 37, 46, 2]
[3,  6,  9, 12, 37, 46, 6]

February:
[3, 13, 15, 18, 37, 46, 0]
[1, 23, 17, 18, 37, 46, 8]

...

and so on.

How can I build arrays that hold only the numbers of one month?

I have a solution, but the coding style is very bad:

jan_tipps = []
feb_tipps = []
mar_tipps = []

# flags marking which month is currently being read
jan = False
feb = False
mar = False

for line in wholefile:
    if line == '\n':
        pass
    elif line == 'January:\n':
        jan = True
    elif line == 'February:\n':
        jan = False
        feb = True
    elif line == 'March:\n':
        feb = False
        mar = True
    elif jan == True:
        jan_tipps.append(line.split())

    elif feb == True:
        feb_tipps.append(line.split())

    elif mar == True:
        mar_tipps.append(line.split())

I think I need something like generics or self-generating variables. I don't know what to search for on the internet.

Answer 1

You can use a regular expression:

import re

lotto = """
January:
[1, 11, 25, 37, 39, 47, 0]
[3, 13, 15, 18, 37, 46, 2]
[3,  6,  9, 12, 37, 46, 6]

February:
[3, 13, 15, 18, 37, 46, 0]
[1, 23, 17, 18, 37, 46, 8]
"""

def getMonthlyNumbers(month=None):
    rx = re.compile(r'''
        ^{}:[\n\r]
        (?P<numbers>(?:^\[.+\][\n\r]?)+)'''.format(month), re.M | re.X)

    for match in rx.finditer(lotto):
        # print it or do sth. else here
        print(match.group('numbers'))

getMonthlyNumbers('January')
getMonthlyNumbers('February')

Or, for all months at once, using a dict comprehension:

rx = re.compile(r'^(?P<month>\w+):[\n\r](?P<numbers>(?:^\[.+\][\n\r]?)+)', re.MULTILINE)

result = {m.group('month'): m.group('numbers') for m in rx.finditer(lotto)}

print(result)

Which yields

{'January': '[1, 11, 25, 37, 39, 47, 0]\n[3, 13, 15, 18, 37, 46, 2]\n[3,  6,  9, 12, 37, 46, 6]\n', 'February': '[3, 13, 15, 18, 37, 46, 0]\n[1, 23, 17, 18, 37, 46, 8]\n'}

The idea here is to look for a month name at the start of a line and then capture any [...] lines that follow it. See a demo on regex101.com.
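To see what the pattern captures, here is a minimal sketch that runs it against a shortened sample (the `sample` string is made up for illustration) and inspects the two named groups.

```python
import re

# Shortened, made-up sample in the same layout as the question's file.
sample = "January:\n[1, 2, 3]\n[4, 5, 6]\n\nFebruary:\n[7, 8, 9]\n"

rx = re.compile(r'^(?P<month>\w+):[\n\r](?P<numbers>(?:^\[.+\][\n\r]?)+)', re.MULTILINE)

matches = list(rx.finditer(sample))

# 'month' holds the name in front of the colon.
months = [m.group('month') for m in matches]

# 'numbers' is one string holding all bracketed lines of that month,
# so it is split into separate lines here.
blocks = [m.group('numbers').splitlines() for m in matches]

print(months)
print(blocks)
```

The blank line between the months ends the first match, because an empty line does not start with `[`.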

Perhaps you want each line as a list (rather than a string), in which case you could go for:

import re
from ast import literal_eval

lotto = """
January:
[1, 11, 25, 37, 39, 47, 0]
[3, 13, 15, 18, 37, 46, 2]
[3,  6,  9, 12, 37, 46, 6]

February:
[3, 13, 15, 18, 37, 46, 0]
[1, 23, 17, 18, 37, 46, 8]
"""

rx = re.compile(r'^(?P<month>\w+):[\n\r](?P<numbers>(?:^\[.+\][\n\r]?)+)', re.MULTILINE)

result = {m.group('month'): 
    [literal_eval(numbers) 
    for numbers in m.group('numbers').split("\n") if numbers] 
    for m in rx.finditer(lotto)}

print(result)
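The answer does not show the result of this last variant. As a sketch, the same comprehension can be wrapped in a function so that it works on any text, not only the inline string. The name `parse_lotto` and the shortened `sample` are made up for this example.

```python
import re
from ast import literal_eval

RX = re.compile(r'^(?P<month>\w+):[\n\r](?P<numbers>(?:^\[.+\][\n\r]?)+)', re.MULTILINE)

def parse_lotto(text):
    """Map each month name to a list of integer lists (hypothetical helper)."""
    return {
        m.group('month'): [
            literal_eval(row)  # turn "[1, 2]" into a real list
            for row in m.group('numbers').splitlines() if row
        ]
        for m in RX.finditer(text)
    }

sample = (
    "January:\n[1, 11, 25, 37, 39, 47, 0]\n\n"
    "February:\n[3, 13, 15, 18, 37, 46, 0]\n[1, 23, 17, 18, 37, 46, 8]\n"
)
result = parse_lotto(sample)
print(result['February'])
```

For a real file, the text could come from `pathlib.Path('numbers.txt').read_text()`; the filename is only a placeholder.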

Answer 2

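As Klaus D. commented, you need a dictionary. But I suspect that is not enough of a hint, so here is a more extensive answer.

One issue: your code is inconsistent with the input data you provided. Your code splits the numbers on whitespace, but the input data uses square brackets and commas. This code works with the input you provided.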

# Parser states:
# 0: waiting for a month name
# 1: expecting numbers in the format [1, 11, 25, 37, 39, 47, 0]

from collections import defaultdict

state = 0
tipps = defaultdict(list)
monthname = None

with open("numbers.txt","r") as f:
    for line in f:
        if state == 0:
            if line.strip().endswith(":"):
                monthname = line.split(":")[0]
                state = 1
            continue
        if state == 1:
            if line.startswith("["):
                line = line.strip().strip("[]")
                numbers = line.split(",")
                tipps[monthname].append([int(n) for n in numbers])
            elif not line.strip():
                state = 0
            else:
                print (f"Unexpected data, parser stuck: {line}")
                break

for k,v in tipps.items():
    print (f"{k}: {v}")
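The mismatch mentioned at the start of this answer is easy to reproduce. A small sketch, using one line from the question's sample data:

```python
line = "[1, 11, 25, 37, 39, 47, 0]\n"

# Splitting on whitespace keeps the brackets and commas attached to the tokens.
tokens = line.split()
print(tokens)

# Stripping the brackets and splitting on commas yields clean integers;
# int() tolerates the leading space left in front of each number.
numbers = [int(n) for n in line.strip().strip("[]").split(",")]
print(numbers)
```

The first list contains strings such as `'[1,'` and `'0]'`, which is why the parser above strips the brackets before splitting.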

Answer 3

Create a dictionary of months, with the month names as keys and a list of lists as values:

import ast

month = {
    m: []
    for m in ['January', 'February']
}
with open('file.txt') as file:
    latest = None
    for line in file:
        line = line.strip()
        if line == '':  # skip empty lines
            continue
        name = line.rstrip(':')  # header lines look like "January:"
        if name in month:
            latest = name
        elif latest is not None and line.startswith('['):
            # a data line looks like "[1, 2]", so parse it instead of using split()
            month[latest].append(ast.literal_eval(line))
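For lines that look like `[1, 2]`, `ast.literal_eval` from the standard library is a safer alternative to `eval`: it only accepts Python literals and does not execute arbitrary code. A minimal sketch on a single line from the question's data:

```python
import ast

line = "[3,  6,  9, 12, 37, 46, 6]"

# Parses the text into a real list of ints; extra spaces are harmless.
numbers = ast.literal_eval(line)
print(numbers)

# Anything that is not a plain literal is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```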

Answer 4

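You can use a regular expression to extract the month name, ast.literal_eval() to parse each month's lists of numbers, and a defaultdict to store them, so that you do not have to check whether a month already exists before appending a list to it: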

from collections import defaultdict
import ast
import re

with open('file.txt') as file:
    months = defaultdict(list)
    month = None
    for line in file:
        line = line.strip()
        m = re.match('([A-Z][a-z]+):', line)
        if m is not None:
            month = m.group(1)
        elif line.startswith('['):
            months[month].append(ast.literal_eval(line))
    for month, numbers in months.items():
        print('{}: {}'.format(month, numbers))

Output:

January: [[1, 11, 25, 37, 39, 47, 0], [3, 13, 15, 18, 37, 46, 2], [3, 6, 9, 12, 37, 46, 6]]
February: [[3, 13, 15, 18, 37, 46, 0], [1, 23, 17, 18, 37, 46, 8]]
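The point about `defaultdict` can be illustrated in isolation. In this sketch (the `rows` data is made up), a plain dict needs an explicit existence check before appending, while `defaultdict(list)` creates the empty list on first access:

```python
from collections import defaultdict

rows = [("January", [1, 2, 3]), ("January", [4, 5, 6]), ("February", [7, 8, 9])]

# Plain dict: the key has to exist before append() can be called.
plain = {}
for month, numbers in rows:
    if month not in plain:
        plain[month] = []
    plain[month].append(numbers)

# defaultdict: a missing key is filled with list() automatically.
auto = defaultdict(list)
for month, numbers in rows:
    auto[month].append(numbers)

print(dict(auto) == plain)
```

Both loops build the same mapping; the second one simply has no branch for the first occurrence of a month.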

Python self-generating variables?
