Question

# returns the same result, i.e. only the first line, as many times as 'draws'
import re
from itertools import islice

mains = []
stars = []
# draws: the number of draws to extract, set elsewhere in the script

infile = open("results_from_url.txt", 'r')

file = infile.read()                                       # essential to get correct formatting
for line in islice(file, 0, draws):                        # allows you to limit number of draws
    for line in re.split(r"Wins", file)[1].split('\n'):
        mains.append(line[23:38])                          # slices first five numbers from line
        stars.append(line[39:44])                          # slices last two numbers from line

infile.close()

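I'm trying to use the code above to iterate through a list of numbers and extract the bits of interest. While learning how to use regular expressions in Python 3, I'm working with lottery results opened from the internet. All it does is read one line and return it as many times as the value I give for 'draws'. Can someone tell me what I'm doing wrong? Does it 'terminate' somehow? The strange thing is that if I copy the file into a string and run this routine, it works. I'm at a loss: is the problem in 'reading' a file, or in my use of regex?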

Answer 1

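I can't tell you why your code isn't working, because I can't reproduce the results you're getting. I'm also not sure what the purpose of

for line in islice(file, 0, draws):

is, because you never use the line variable afterwards; you immediately overwrite it with

for line in re.split(r"Wins",file)[1].split('\n'):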
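One likely source of confusion: file holds the str returned by infile.read(), and iterating over a str yields single characters. So islice(file, 0, draws) produces the first draws characters, not lines, and the outer loop just re-runs the inner loop that many times. A minimal sketch with a made-up string (the real layout of results_from_url.txt isn't shown, so the text here is an assumption):

```python
from itertools import islice

# Hypothetical stand-in for the file contents returned by read().
text = "line one\nline two\nline three\n"

# Iterating over a str yields single characters, so this takes
# the first three characters, not the first three lines.
first_chars = list(islice(text, 0, 3))
print(first_chars)  # ['l', 'i', 'n']

# To limit by lines instead, split the text into lines first.
first_lines = list(islice(text.splitlines(), 0, 2))
print(first_lines)  # ['line one', 'line two']
```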

Also, you can use file.split('Wins') instead of re.split(r"Wins",file), so you don't use regex at all.
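For a literal separator like "Wins", the two calls return the same list. A quick check with a hypothetical string:

```python
import re

# Hypothetical sample text containing the "Wins" marker.
text = "header Wins body"

# str.split treats "Wins" literally; re.split treats it as a pattern
# with no special characters, so the results coincide here.
by_str = text.split("Wins")
by_re = re.split(r"Wins", text)
print(by_str)  # ['header ', ' body']
print(by_re)   # ['header ', ' body']
```

They only coincide when the separator contains no regex metacharacters and no capturing groups (re.split includes captured text in its result).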

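Regex is a tool for finding data in a particular format. Why use it to split the input text when you can use it to find exactly the data you're looking for?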

What are you looking for? A sequence of seven numbers, each followed by a comma. Translated into a regex:

(?:\d+,){7}
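Here is that pattern applied to a hypothetical line (the real file's layout is an assumption). Every number, including the seventh, must be followed by a comma, so a line without a trailing comma does not match:

```python
import re

# Hypothetical line in the assumed format.
line = "Draw 1: 01,03,31,42,46,04,11,"

match = re.search(r"(?:\d+,){7}", line)
print(match.group())  # 01,03,31,42,46,04,11,

# Only six commas here, so seven comma-terminated numbers can't be found.
no_trailing = re.search(r"(?:\d+,){7}", "01,03,31,42,46,04,11")
print(no_trailing)  # None
```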

However, we want to group the first five numbers (the "mains") and the last two numbers (the "stars"). So we'll add two named capture groups, called "mains" and "stars":

(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})
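With named groups, each part can be pulled out by name via match.group(), or all at once via match.groupdict(). A sketch on a hypothetical line:

```python
import re

pattern = r"(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})"

# Hypothetical line in the assumed comma-terminated format.
m = re.search(pattern, "01,03,31,42,46,04,11,")
print(m.group("mains"))  # 01,03,31,42,46,
print(m.group("stars"))  # 04,11,
print(m.groupdict())     # {'mains': '01,03,31,42,46,', 'stars': '04,11,'}
```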

This pattern will find all the numbers you're looking for.

import re

# Read the whole file into one string; "with" closes the file afterwards
with open("infile.txt", 'r') as infile:
    data = infile.read()

mains = []
stars = []

pattern = r'(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})'
iterator = re.finditer(pattern, data)
for _ in range(int(input('Enter number of draws to examine: '))):
    try:
        match = next(iterator)
    except StopIteration:
        # The file has fewer draws than requested
        print('no more matches')
        break

    mains.append(match.group('mains'))
    stars.append(match.group('stars'))

print(mains, stars)
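If you'd rather avoid the explicit next()/StopIteration handling, itertools.islice can cap how many matches are taken from finditer. This is an alternative sketch, with a hypothetical sample string standing in for the file and a fixed draws value in place of input():

```python
import re
from itertools import islice

pattern = re.compile(r"(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})")

# Hypothetical stand-in for the file contents (normally read via open(...).read()).
data = (
    "Draw 1: 01,03,31,42,46,04,11,\n"
    "Draw 2: 05,12,19,27,33,02,09,\n"
    "Draw 3: 07,14,21,28,35,01,06,\n"
)

draws = 2  # hypothetical limit; the answer above reads it with input()

mains = []
stars = []
# islice stops after `draws` matches, or earlier if finditer runs out,
# so no StopIteration handling is needed.
for match in islice(pattern.finditer(data), draws):
    mains.append(match.group("mains"))
    stars.append(match.group("stars"))

print(mains, stars)
```

The trade-off is that it no longer prints 'no more matches'; if you need that, check whether len(mains) < draws after the loop.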

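This will print something like ['01,03,31,42,46,'] ['04,11,']. You'll probably want to remove the commas and convert the numbers to integers, but in essence, this is how you use regex for this.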
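One way to do that is to split each captured group on commas and convert the non-empty pieces with int(). A minimal sketch, where to_ints is a hypothetical helper name and the input lists mirror the sample output above:

```python
# Hypothetical match results in the shape printed above.
mains = ["01,03,31,42,46,"]
stars = ["04,11,"]

def to_ints(group_text):
    # Hypothetical helper: split on commas and skip the empty
    # string left over by the trailing comma.
    return [int(n) for n in group_text.split(",") if n]

main_numbers = [to_ints(g) for g in mains]
star_numbers = [to_ints(g) for g in stars]
print(main_numbers, star_numbers)  # [[1, 3, 31, 42, 46]] [[4, 11]]
```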

Applying regex to a text file in Python 3
