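I'm working with a dataset stored in a large text file. For the analysis I'm doing, I open the file, extract parts of the dataset, and compare the extracted subsets. My code looks like this: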
```python
from math import ceil

with open("seqs.txt") as f:
    f = f.readlines()
assert type(f) == list, "ERROR: file object not converted to list"

fives = int(ceil(0.05 * len(f)))
thirds = int(ceil(len(f) / 3.0))  # 3.0 so ceil also works under Python 2

## top/bottom 5% of dataset
low_5 = f[0:fives]
top_5 = f[-fives:]
## top/bottom 1/3 of dataset
low_33 = f[0:thirds]
top_33 = f[-thirds:]

## Write lists to file
# top-5
with open("high-5.out", "w") as outfile1:
    for i in top_5:
        outfile1.write("%s" % i)
# low-5
with open("low-5.out", "w") as outfile2:
    for i in low_5:
        outfile2.write("%s" % i)
# top-33
with open("high-33.out", "w") as outfile3:
    for i in top_33:
        outfile3.write("%s" % i)
# low-33
with open("low-33.out", "w") as outfile4:
    for i in low_33:
        outfile4.write("%s" % i)
```
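I'm trying to find a smarter way to automate writing the lists to files. In this case there are only four, but in future cases I may end up with as many as 15-25 lists, so I'd like a function to handle this. I wrote the following: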
```python
def write_to_file(*args):
    for i in args:
        with open(".out", "w") as outfile:
            outfile.write("%s" % i)
```
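But when I call the function, the resulting file contains only the last list:
```python
write_to_file(low_33, low_5, top_33, top_5)
```
I know I have to define an output file for each list (which I don't do in the function above); I'm just not sure how to implement that. Any ideas?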
Answer 0 (score: 1)
You can have one output file per argument by incrementing a counter for each argument. For example:
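```python
def write_to_file(*args):
    # One output file per argument: 1.out, 2.out, ...
    for index, i in enumerate(args):
        with open("{}.out".format(index + 1), "w") as outfile:
            outfile.writelines(i)  # write each line, not the list's repr
```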
The example above creates the output files "1.out", "2.out", "3.out" and "4.out".
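A minimal sketch of the same counter idea, assuming Python 3: `enumerate(..., start=1)` replaces the `index + 1` arithmetic, and `writelines` writes the lines of each list rather than the list's `repr`. The `out_dir` parameter and the returned list of paths are hypothetical additions (not part of the answer) so the example can run in a throwaway directory.

```python
import os
import tempfile

def write_to_file(*args, out_dir="."):
    # One output file per argument: 1.out, 2.out, ...
    paths = []
    for index, lines in enumerate(args, start=1):  # start=1 replaces index + 1
        path = os.path.join(out_dir, "{}.out".format(index))
        with open(path, "w") as outfile:
            outfile.writelines(lines)  # the lines themselves, not the list's repr
        paths.append(path)
    return paths

# Usage with made-up data, written into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    written = write_to_file(["a\n", "b\n"], ["c\n"], out_dir=tmp)
    print([os.path.basename(p) for p in written])  # ['1.out', '2.out']
```

Making `out_dir` keyword-only (it follows `*args`) keeps it from being swallowed as one more list to write.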
Alternatively, if you have specific names you want to use (as in your original code), you can do the following:
```python
def write_to_file(args):
    # args is a list of (name, data) pairs; each name becomes a file name
    for name, data in args:
        with open("{}.out".format(name), "w") as outfile:
            outfile.writelines(data)

args = [('low-33', low_33), ('low-5', low_5), ('high-33', top_33), ('high-5', top_5)]
write_to_file(args)
```
This creates the output files "low-33.out", "low-5.out", "high-33.out" and "high-5.out".
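With 15-25 lists, typing the pairs by hand gets error-prone. One option, sketched here under the assumption that you keep the names and the slices in two parallel lists of equal length, is to let `zip` build the `(name, data)` pairs. The `out_dir` parameter and the sample data are hypothetical.

```python
import os
import tempfile

def write_to_file(args, out_dir="."):
    # args is any iterable of (name, data) pairs; each name becomes <name>.out
    for name, data in args:
        with open(os.path.join(out_dir, "{}.out".format(name)), "w") as outfile:
            outfile.writelines(data)

# Hypothetical stand-ins for the slices from the question.
lines = ["line {}\n".format(i) for i in range(10)]
names = ["low-5", "high-5"]
slices = [lines[:1], lines[-1:]]

with tempfile.TemporaryDirectory() as tmp:
    write_to_file(zip(names, slices), out_dir=tmp)  # pairs names[k] with slices[k]
    print(sorted(os.listdir(tmp)))  # ['high-5.out', 'low-5.out']
```

`zip` stops at the shorter input, so a missing name silently drops a list; checking `len(names) == len(slices)` first is a cheap safeguard.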
Answer 1 (score: 1)
Make your variable names match your file names, then keep the lists in a dictionary instead of in the global namespace:
```python
# Slices as defined in the question (f, fives, thirds)
data = {'high_5': f[-fives:],
        'low_5': f[:fives],
        'high_33': f[-thirds:],
        'low_33': f[:thirds]}

for key in data:
    with open('{}.out'.format(key), 'w') as output:
        for i in data[key]:
            output.write(i)
```
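This keeps your data in one easy-to-use place, and as long as you want to apply the same operation to every list, you can keep using the same pattern.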
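As PM2Ring points out below, it is better to use underscores (as in variable names) rather than dashes (as in the file names), because then the dictionary keys can be passed as keyword arguments to the writing function:
```python
write_to_file(**data)
```
This is equivalent to:
```python
write_to_file(low_5=f[:fives], high_5=f[-fives:],...) # and the rest of the data
```
From there you can adapt one of the functions defined in the other answers so that it accepts keyword arguments (`**kwargs`); as written, they take positional arguments only.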
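The functions in the other answers take positional arguments, so `write_to_file(**data)` raises a `TypeError` ("unexpected keyword argument") with them. A minimal sketch of a keyword-argument version, assuming Python 3; the `out_dir` parameter and the sample data are hypothetical.

```python
import os
import tempfile

def write_to_file(out_dir=".", **named_lists):
    # Each keyword argument becomes a file: low_5=[...] -> low_5.out
    for name, lines in named_lists.items():
        with open(os.path.join(out_dir, name + ".out"), "w") as output:
            output.writelines(lines)

# Hypothetical stand-ins for the slices from the question.
data = {"low_5": ["a\n"], "high_5": ["z\n"]}

with tempfile.TemporaryDirectory() as tmp:
    write_to_file(out_dir=tmp, **data)
    print(sorted(os.listdir(tmp)))  # ['high_5.out', 'low_5.out']
```

One trade-off of this sketch: a dictionary key named `out_dir` would collide with the parameter. Identifier-style keys matter when you write the keyword arguments out by hand, as in `low_5=...`.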
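Answer 2 (score: 1)
Don't try to be clever. Aim instead for code that is readable and easy to understand. You can group the repeated code into a function, for example: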
```python
from math import ceil

def save_to_file(data, filename):
    with open(filename, 'w') as f:
        for item in data:
            f.write(item)

with open('data.txt') as f:
    numbers = list(f)

five_percent = int(len(numbers) * 0.05)
thirty_three_percent = int(ceil(len(numbers) / 3.0))
# Why not: thirty_three_percent = int(len(numbers) * 0.33)

save_to_file(numbers[:five_percent], 'low-5.out')
# numbers[len(numbers) - k:] rather than numbers[-k:], which is the whole list when k == 0
save_to_file(numbers[len(numbers) - five_percent:], 'high-5.out')
save_to_file(numbers[:thirty_three_percent], 'low-33.out')
save_to_file(numbers[len(numbers) - thirty_three_percent:], 'high-33.out')
```
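A caution about the top slices: `five_percent` is computed with `int()` and no `ceil`, so it is 0 for any file with fewer than 20 lines, and a negative-index slice with a count of 0 is not empty: `items[-0:]` is `items[0:]`, the whole list. The hypothetical helper `top_n` below sketches a slice that stays correct for a count of 0 (and for counts larger than the list).

```python
def top_n(items, n):
    """Return the last n items of a list, including when n == 0."""
    # items[-n:] returns the WHOLE list when n == 0, because -0 == 0.
    # max() guards against n > len(items), which would otherwise wrap around.
    return items[max(len(items) - n, 0):]

numbers = list(range(10))                  # stand-in for 10 lines of data.txt
five_percent = int(len(numbers) * 0.05)    # int(0.5) -> 0
print(five_percent)                        # 0
print(numbers[-five_percent:] == numbers)  # True: the whole list
print(top_n(numbers, five_percent))        # []
print(top_n(numbers, 3))                   # [7, 8, 9]
```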
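If you have many lists to write, a loop makes sense. I'd suggest two functions, save_top_n_percent and save_low_n_percent, to help with this. They contain some duplicated code, but splitting the work into two functions makes it clearer and easier to understand.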
```python
def save_to_file(data, filename):
    with open(filename, 'w') as f:
        for item in data:
            f.write(item)

def save_top_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    # len(data) - n_percent avoids data[-0:], which would be the whole list
    save_to_file(data[len(data) - n_percent:], 'top-{}.out'.format(n))

def save_low_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[:n_percent], 'low-{}.out'.format(n))

with open('data.txt') as f:
    numbers = list(f)

for n_percent in [5, 33]:
    save_top_n_percent(n_percent, numbers)
    save_low_n_percent(n_percent, numbers)
```
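If the list of percentages grows toward the 15-25 outputs mentioned in the question, one alternative is a single function that loops over the percentages and records what it wrote. A minimal sketch, assuming Python 3; `save_percent_slices`, `out_dir` and the sample data are hypothetical, and the counts use `math.ceil` as in the question, so any non-empty input yields at least one line per file.

```python
import math
import os
import tempfile

def save_percent_slices(lines, percents, out_dir="."):
    # For each n in percents, write the first and last n% of lines.
    written = []
    for n in percents:
        count = int(math.ceil(len(lines) * n / 100.0))
        for label, part in (("low", lines[:count]),
                            ("top", lines[len(lines) - count:])):
            name = "{}-{}.out".format(label, n)
            with open(os.path.join(out_dir, name), "w") as f:
                f.writelines(part)
            written.append(name)
    return written

lines = ["line {}\n".format(i) for i in range(40)]
with tempfile.TemporaryDirectory() as tmp:
    print(save_percent_slices(lines, [5, 33], out_dir=tmp))
    # ['low-5.out', 'top-5.out', 'low-33.out', 'top-33.out']
```

Here 33 means 33%, not exactly one third as in the question; for 40 lines both round up to 14.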
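Answer 3 (score: 0)
On this line you open the same file, named ".out", every time and write to it:
```python
with open(".out", "w") as outfile:
```
You need to make the ".out" filename unique for each i in args. You can do this by passing lists as args, where each list contains a filename and the data.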
```python
def write_to_file(*args):
    # each i is a [name, data] pair
    for i in args:
        with open("%s.out" % i[0], "w") as outfile:
            outfile.writelines(i[1])
```
Pass the arguments like this:
```python
write_to_file(["low_33", low_33], ["low_5", low_5], ["top_33", top_33], ["top_5", top_5])
```
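Answer 4 (score: 0)
You are creating a file named ".out" and overwriting it every time. Instead, build the file name from each variable name and look up the list by that name: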
```python
def write_to_file(*args):
    for i in args:
        filename = i + ".out"
        contents = globals()[i]  # look up the module-level variable named i
        with open(filename, "w") as outfile:  # was ".out", which ignored filename
            outfile.writelines(contents)

write_to_file("low_33", "low_5", "top_33", "top_5")
```
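See https://stackoverflow.com/a/6504497/3583980 (getting a variable from its name as a string).
This creates low_33.out, low_5.out, top_33.out and top_5.out, each containing the lines of the list stored in the variable of the same name.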
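A caveat worth knowing about `globals()`: it only sees module-level names. If the slices are created inside a function (for example, if the analysis is later wrapped in a `main()`), the lookup raises `KeyError`. A minimal sketch with hypothetical names (`lookup_globals`, `analysis`); a dictionary, as in Answer 1, avoids the problem.

```python
def lookup_globals(*names):
    # Same lookup as the answer: find each variable by name in globals().
    return [globals()[name] for name in names]

low_5 = ["a"]  # module-level variable: visible to globals()

def analysis():
    top_5 = ["z"]  # local variable (deliberately unused): NOT in globals()
    try:
        return lookup_globals("low_5", "top_5")
    except KeyError as exc:
        return "missing: {}".format(exc)

print(lookup_globals("low_5"))  # [['a']]
print(analysis())               # missing: 'top_5'
```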