Question

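So I have a file containing the combined instances (lists of numbers) for a program I'm writing. I then put all the lines starting with '@' into both the training and testing files. Now I want to put 28,709 instances into my training file and the rest of the instances into the testing file.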

When I do this using the following code:

import itertools

# Splits the training and testing instances
# with the newly reduced attributes

training = open('training.txt', 'w')
testing = open('testing.txt', 'w')

linecount = 0

with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
            linecount += 1

with open('combined.txt', 'r') as f:
    newcount = 0
    for l in f:
        while(newcount < linecount):
            f.next()
            newcount += 1

        if linecount > (linecount + 28709):
            testing.write(l)
        else:
            training.write(l)
        linecount += 1
    '''# Write 28,709 instances to training set
    for l in itertools.islice(f, linecount, linecount + 28709):
        training.write(l)
    # Write rest of instances to testing set
    for i in xrange(linecount + 28710):
        f.next()
    for l in f:
        testing.write(l)'''

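...it doesn't write all of the instances to the training set, and it outputs nothing to the testing set. The original combined file (too large to paste) can be found here: https://gist.githubusercontent.com/ryankshah/618fde939a54c5eb8642135ab1f4514c/raw/a5a11c0fc301a6724b9af4c413d76b96ffa9859c/combined.txt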

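Edit: all of the '@' lines should be in both files. Then the first 28,709 lines after the last '@' line should go in the training file, and the rest in the testing file.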

Thanks!

Answer 1

This should do what you need. I added comments to the code to explain what I changed.

# Splits the training and testing instances
# with the newly reduced attributes

training = open('training.txt', 'w')
testing = open('testing.txt', 'w')

linecount = 0

with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
        # increment every time to get position of last '@' symbol
        # can't skip lines in between '@' symbols
        linecount += 1

val = 28709

with open('combined.txt', 'r') as f:
    # skip first n lines up to last '@' symbol
    for _ in range(linecount):
        next(f)  # built-in next() works on Python 2.6+ and 3; f.next() is Python 2 only

    # write first 28709 lines after last '@' symbol to training file
    new_linecount = 0
    for l in f:
        if new_linecount >= val:
            testing.write(l)
        else:
            training.write(l)
        new_linecount += 1
# close the output files so the buffered writes are flushed to disk
training.close()
testing.close()
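
For what it's worth, the same split can probably be done in a single pass over the file, which avoids opening combined.txt twice and counting lines by hand. The following is only a sketch under a few assumptions: it targets Python 3, it assumes the '@' lines form one contiguous block at the top of the file, and the function name split_instances and its parameters are made up for illustration. It uses itertools.islice to take exactly the first n_training data lines and sends whatever remains to the testing file.

```python
import itertools


def split_instances(combined_path, training_path, testing_path, n_training=28709):
    """Copy the leading '@' lines to both outputs, then split the remaining lines.

    Hypothetical helper: assumes the '@' header lines are contiguous at the top.
    """
    with open(combined_path) as src, \
            open(training_path, "w") as training, \
            open(testing_path, "w") as testing:
        first_data_line = None
        # Header: every leading line that starts with '@' goes to both files.
        for line in src:
            if not line.startswith("@"):
                first_data_line = line
                break
            training.write(line)
            testing.write(line)

        if first_data_line is None:
            return  # the file held only header lines (or was empty)

        # Put the line that ended the header loop back in front of the rest.
        data = itertools.chain([first_data_line], src)
        # islice consumes exactly n_training lines from the shared iterator...
        training.writelines(itertools.islice(data, n_training))
        # ...so whatever is left over belongs to the testing set.
        testing.writelines(data)
```

Calling split_instances('combined.txt', 'training.txt', 'testing.txt') should then reproduce the split described in the question. The with block closes all three files on exit, so nothing is left sitting in a write buffer.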

Writing parts of a file to two other files
