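So I have a file containing the combined instances (lists of numbers) for a program I'm writing. I then write all of the lines that start with '@' to both the training and the testing files. Now I want to put 28,709 instances into my training file and the rest of the file's instances into the testing file.
When I do this, using the following code: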
import itertools
# Splits the training and testing instances
# with the newly reduced attributes
training = open('training.txt', 'w')
testing = open('testing.txt', 'w')
linecount = 0
with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
            linecount += 1
with open('combined.txt', 'r') as f:
    newcount = 0
    for l in f:
        while(newcount < linecount):
            f.next()
            newcount += 1
        if linecount > (linecount + 28709):
            testing.write(l)
        else:
            training.write(l)
        linecount += 1
'''# Write 28,709 instances to training set
for l in itertools.islice(f, linecount, linecount + 28709):
    training.write(l)
# Write rest of instances to testing set
for i in xrange(linecount + 28710):
    f.next()
for l in f:
    testing.write(l)'''
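...it doesn't write all of the instances to the training set, and it doesn't output any instances to the testing set. The original combined file can be found here (too large to paste): https://gist.githubusercontent.com/ryankshah/618fde939a54c5eb8642135ab1f4514c/raw/a5a11c0fc301a6724b9af4c413d76b96ffa9859c/combined.txt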
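EDIT: All of the '@' lines should be in both files. Then the first 28,709 lines after the last '@' line should go in the training file, and the rest in the testing file.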
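In the question's main loop the test `linecount > (linecount + 28709)` can never be true, so every line goes to training.txt and nothing except the '@' lines ever reaches testing.txt. The commented-out `islice` attempt has a different problem: if it were run on one open file, it would skip another `linecount + 28710` lines after `islice` had already advanced past the training lines, so those lines would be lost. The rule from the edit can be sketched with one shared iterator, where `islice` takes exactly the training lines and whatever it leaves behind is the testing set. `split_instances` is a hypothetical helper, and it assumes all '@' lines sit in one block before the first instance line.

```python
from itertools import chain, islice


def split_instances(lines, n_train=28709):
    """Return (header, train, test) lists from an iterable of lines.

    Hypothetical helper; assumes every '@' line comes before the first
    instance line, as in the combined file described above.
    """
    it = iter(lines)  # one shared iterator: no line is read or skipped twice
    header = []
    rest = iter(())   # stays empty if the input has no instance lines
    for line in it:
        if line.startswith('@'):
            header.append(line)
        else:
            # the first instance line was already consumed; put it back in front
            rest = chain([line], it)
            break
    train = list(islice(rest, n_train))  # the next n_train instance lines
    test = list(rest)                    # everything islice left behind
    return header, train, test
```

For example, `split_instances(['@data\n', '1\n', '2\n'], n_train=1)` returns `(['@data\n'], ['1\n'], ['2\n'])`.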
Thanks!
Answer 0 (score: 1)
This should do what you need. I've added comments in the code to explain what I changed.
# Splits the training and testing instances
# with the newly reduced attributes
training = open('training.txt', 'w')
testing = open('testing.txt', 'w')
linecount = 0
with open('combined.txt', 'r') as f:
    for l in f:
        if not l.startswith('@'):
            break
        else:
            training.write(l)
            testing.write(l)
            # increment every time to get position of last '@' symbol
            # can't skip lines in between '@' symbols
            linecount += 1
val = 28709
with open('combined.txt', 'r') as f:
    # skip first n lines up to last '@' symbol
    # (next(f) works in Python 2.6+ and 3; f.next() is Python 2 only)
    for _ in range(linecount):
        next(f)
    # write first 28709 lines after last '@' symbol to training file,
    # and every line after those to the testing file
    new_linecount = 0
    for l in f:
        if new_linecount >= val:
            testing.write(l)
        else:
            training.write(l)
        new_linecount += 1
# close the output files so all buffered writes are flushed to disk
training.close()
testing.close()
'''# Write 28,709 instances to training set
for l in itertools.islice(f, linecount, linecount + 28709):
    training.write(l)
# Write rest of instances to testing set
for i in xrange(linecount + 28710):
    f.next()
for l in f:
    testing.write(l)'''
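The answer opens combined.txt twice: once to count the '@' lines and once to skip them. A single-pass alternative is sketched below; it is not the answerer's code, and `write_split` and `split_files` are hypothetical names. Like the answer, it assumes all '@' lines form one block at the top of combined.txt, and it uses `with` so the output files are closed even if an error occurs.

```python
from itertools import chain, islice


def write_split(src, training, testing, n_train=28709):
    """Copy the leading '@' lines to both outputs, the next n_train lines
    to `training`, and everything after that to `testing`.

    src is any iterable of lines (e.g. an open file); training and testing
    are writable text files. Returns (header, training, testing) line counts.
    """
    it = iter(src)
    rest = iter(())  # stays empty if src contains only '@' lines
    n_header = 0
    for line in it:
        if not line.startswith('@'):
            rest = chain([line], it)  # keep the first instance line
            break
        training.write(line)
        testing.write(line)
        n_header += 1
    n_tr = 0
    for line in islice(rest, n_train):  # exactly n_train instance lines
        training.write(line)
        n_tr += 1
    n_te = 0
    for line in rest:  # whatever islice did not consume
        testing.write(line)
        n_te += 1
    return n_header, n_tr, n_te


def split_files(src_path='combined.txt', train_path='training.txt',
                test_path='testing.txt', n_train=28709):
    # 'with' closes all three files (flushing the outputs), even on error
    with open(src_path) as src, \
            open(train_path, 'w') as training, \
            open(test_path, 'w') as testing:
        return write_split(src, training, testing, n_train)
```

Calling `split_files()` with no arguments would read combined.txt from the current directory and write training.txt and testing.txt, with no explicit `close()` calls needed.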