Question

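I'm new to programming and I'm having some trouble with nested loops. I have a list of data items that I want to extract from a larger file. I can successfully extract one item from the larger file, but I need to extract 100 different trials from this larger file of several thousand trials. Each trial is one line of data in the larger file. This is the program I've used to successfully extract one line at a time. In this example, it extracts the data for trial 1. It's based on examples I've seen in previous questions and tutorials. The problem is that I don't need trials 1-100, or any ordered pattern. I need trials 134, 274, 388, etc. It skips around. So, without a range I can enter, I don't know how to write the nested loop with a for statement. Any help is appreciated. Thanks.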

completedataset = open('completedataset.txt', 'r')

smallerdataset = open('smallerdataset.txt', 'w')


for line in completedataset:
    if 'trial1' in line:
        smallerdataset.write(line)


completedataset.close()
smallerdataset.close()

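What I'd really like to do is this:

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset: for trial in trials: if trial in line: smallerdataset.write(line)

But this doesn't work. Can anyone help me modify this program so that it works properly?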

Answer 1

It seems to me that you need a list containing all the trial numbers you're interested in. So maybe you could try something like this:

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

trials = [134, 274, 388]
completedata = completedataset.readlines()

for t in trials:
    for line in completedata:
        if "trial"+str(t) in line:
            smallerdataset.write(line)
completedataset.close()
smallerdataset.close()

Answer 2

You could do this:

trials = ['trial1', 'trial134', 'trial274']

for line in completedataset:
    for trial in trials:
        if trial in line:
            smallerdataset.write(line)

To do this more efficiently, you could match each line against a 'trial[0-9]+' regex and check whether the matched token is in a set.
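A minimal sketch of the regex-plus-set idea, assuming each line contains a single token of the form trial followed by digits. The names WANTED, TRIAL_RE and select_lines are made up for illustration and are not from the question.

```python
import re

# Hypothetical names, introduced only for this sketch.
WANTED = {'trial134', 'trial274', 'trial388'}
TRIAL_RE = re.compile(r'trial[0-9]+')

def select_lines(lines, wanted=WANTED):
    """Yield each line whose first trial<digits> token is in wanted."""
    for line in lines:
        match = TRIAL_RE.search(line)
        # The regex grabs all the digits, so 'trial1340' is not
        # mistaken for 'trial134'.
        if match and match.group(0) in wanted:
            yield line

sample = ['trial134 0.5\n', 'trial1340 0.7\n', 'trial2 0.1\n', 'trial388 0.9\n']
selected = list(select_lines(sample))
```

The set lookup takes roughly constant time per line, so the inner loop over the trial names goes away. A file object can be passed in place of the sample list, since it also iterates line by line.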

Answer 3

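If each trial in the complete set has a known size in bytes, you can use file.seek(n), where n is the byte offset to start reading from. For example, if every line in the file is 3 bytes long, you could do the following: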

myfile = open('file.txt', 'r')
myfile.seek(lineToStartAt * 3)

myfile.readline()#etc

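If the number of bytes per line is variable or unknown, you'll just have to read the lines in and discard the ones you don't care about (as in KLee1's answer).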
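Here is a small self-contained sketch of the fixed-width case, assuming every line is exactly the same number of bytes, newline included. The helper name read_fixed_width_line and the demo file are hypothetical. The file is opened in binary mode so that the offsets are plain byte counts.

```python
import os
import tempfile

def read_fixed_width_line(path, line_index, line_width):
    """Return line number line_index (0-based) from a file whose lines
    are all exactly line_width bytes long, newline included."""
    with open(path, 'rb') as f:
        # Jump straight to the start of the requested line.
        f.seek(line_index * line_width)
        return f.read(line_width)

# Build a tiny demo file in which every line is 3 bytes long.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'aa\nbb\ncc\ndd\n')

third_line = read_fixed_width_line(demo_path, 2, 3)
os.remove(demo_path)
```

This only pays off when the width really is constant. A single line of a different length shifts every offset after it.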

Answer 4

Assuming you know the trials ahead of time, you could do this:

trials = ('trial12', 'trial23', 'trial34')

for line in completedataset:
    for trial in trials:
        if trial in line:
            smallerdataset.write(line)

Answer 5

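You're going to run into some problems with the way you're specifying trials. If you look for lines containing 'trial1', you'll also get lines containing 'trial123'. If your larger dataset is structured in some way, you could look for the trial number in a specific field instead. For example, if the data is comma-separated, you could use the csv package. Finally, using a generator expression rather than a loop will make things a bit cleaner. Assuming the trial number is in the first column of the dataset, you could do the following: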

import csv

trials = ['trial134', 'trial1', 'trial56']
data = csv.reader(open('completedataset.txt'))

with open('smalldataset.txt','w') as outf:
    csv.writer(outf).writerows(l for l in data if l[0] in trials)
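To make the 'trial1' versus 'trial123' problem concrete, here is a short sketch comparing a substring test with a whole-field test. The sample lines are invented, and comma-separated data with the trial name in the first field is an assumption.

```python
# Invented sample data: trial name first, then a value.
lines = ['trial1,0.2\n', 'trial123,0.4\n', 'trial12,0.9\n']

# Substring test: 'trial1' is also contained in 'trial123' and 'trial12'.
substring_hits = [ln for ln in lines if 'trial1' in ln]

# Field test: compare the whole first comma-separated field instead.
wanted = {'trial1'}
field_hits = [ln for ln in lines if ln.split(',', 1)[0] in wanted]
```

The substring test keeps all three lines, while the field test keeps only the line for trial1.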

Answer 6

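Assuming you have a function that looks at a line and can tell you whether that line is "wanted", the right structure for your code is very simple: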

with open('completedataset.txt', 'r') as completedataset:
    with open('smallerdataset.txt', 'w') as smallerdataset:
        for line in completedataset:
            if iwantthisone(line):
                smallerdataset.write(line)

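The with statement does the closing for you. In Python 2.7 you can merge the two withs into one; in Python 2.5 you need to start the module with from __future__ import with_statement; in Python 2.6, currently the most widespread version, the code above is the correct form.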
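For reference, a sketch of the merged form that Python 2.7 and later accept, wrapped in a hypothetical helper named copy_matching so that it can be tried on a throwaway file. The keep argument plays the role of iwantthisone.

```python
import os
import shutil
import tempfile

def copy_matching(src_path, dst_path, keep):
    """Copy the lines for which keep(line) is true from src_path to dst_path."""
    # One with statement manages both files; both are closed on exit.
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            if keep(line):
                dst.write(line)

# Small demo in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'completedataset.txt')
dst_path = os.path.join(workdir, 'smallerdataset.txt')
with open(src_path, 'w') as f:
    f.write('trial1 a\ntrial2 b\ntrial3 c\n')

copy_matching(src_path, dst_path, lambda line: line.startswith('trial2 '))

with open(dst_path) as f:
    result = f.read()
shutil.rmtree(workdir)
```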

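So, absolutely everything comes down to the iwantthisone function. You haven't told us anything about the format of your lines, so we can't help you much more. But suppose, for example, that the first word in each line identifies the test, e.g. test432 ..., and that you have the numbers of the tests you want in a set called want_these, e.g. set([113, 432, 251, ...]). Then a very simple way to write iwantthisone might be: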

def iwantthisone(line):
    firstword = line.split(None, 1)[0]
    testnumber = int(firstword[4:])
    return testnumber in want_these

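The right contents for iwantthisone depend entirely on the format of your lines and, of course, on how you determine which lines you actually do want to keep. But I hope this general structure is still helpful.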

Note that there really is no nested loop in this general structure I'm recommending! ;-)
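If the file might contain blank lines or header lines, the simple version of iwantthisone would raise an exception on them (IndexError for an empty line, ValueError when the rest of the first word is not a number). A slightly more defensive sketch, still assuming the hypothetical test<number> first-word format and using example numbers in want_these:

```python
want_these = set([113, 432, 251])  # example numbers only

def iwantthisone(line):
    """Return True if the line's first word is 'test<number>' and the
    number is in want_these."""
    words = line.split(None, 1)
    if not words:
        return False              # blank line: nothing to parse
    firstword = words[0]
    if not firstword.startswith('test'):
        return False              # e.g. a header line
    try:
        testnumber = int(firstword[4:])
    except ValueError:
        return False              # e.g. 'testABC' or a bare 'test'
    return testnumber in want_these

sample = ['test432 1.5 2.5\n', '\n', 'test7 0.1\n', 'header line\n', 'test113\n']
kept = [line for line in sample if iwantthisone(line)]
```

Whether silently skipping malformed lines is the right choice depends on your data. Raising an error may be preferable if such lines indicate a corrupt file.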

Answer 7

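Regarding the error message you showed in the comments: the line continuation character is the backslash, so it's telling you that there's a stray backslash character somewhere on that line.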

Answer 8

Assuming lines always begin with the trial identifier, you can use the startswith function and filter to pull out the lines you want.

completedataset = open('completedataset.txt', 'r')
smallerdataset = open('smallerdataset.txt', 'w')

wantedtrials = ('trial134', 'trial274', 'trial388')

for line in completedataset:
    if filter(line.startswith, wantedtrials):
        smallerdataset.write(line)
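One caveat if you run this on Python 3: there filter() returns a lazy iterator object, which is truthy even when it would yield nothing, so the test above would keep every line. Since str.startswith also accepts a tuple of prefixes, a sketch that behaves the same on Python 2 and 3 could look like this (the sample lines are invented):

```python
wantedtrials = ('trial134', 'trial274', 'trial388')

lines = ['trial134 0.5\n', 'trial2 0.1\n', 'trial388 0.9\n']

# str.startswith accepts a tuple and returns True if any prefix matches.
kept = [line for line in lines if line.startswith(wantedtrials)]

# Wrapping filter() in any() also works on both versions, because any()
# consumes the iterator instead of testing the iterator object itself.
no_match = 'trial2 0.1\n'
real_answer = any(filter(no_match.startswith, wantedtrials))
```

Note that a prefix test still lets 'trial1340' through when 'trial134' is wanted. If the identifier is followed by a known delimiter such as a space, including it in each prefix ('trial134 ') avoids that.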

Question about nested loops

8 answers: