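I need to build a matrix from a list of text files that contain frequency distributions of expressions. To do so, I created a list of all text files (lof) in a directory and use it to build the matrix (thanks to gboffy). Every file name in the list has the structure CompanyName-SerialNumber_IssueDate_IFRS.txt (e.g. GoldmanSachs-123456_31.12.2014_IFRS.txt). The content of every file also has exactly the same structure: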
CompanyABC-123456_31.12.2012_IFRS.txt
Company ABC-123456_31.12.2012
financial statement:4
corporate-taxes:8
assets:2
available-for-sale property:0
auditors:213
Company123-789102_31.12.2012_IFRS.txt
Company123-789102_31.12.2012
financial statement:15
corporate-taxes:3
assets:8
available-for-sale property:2
auditors:23
My desired output is a single matrix file written to a txt file, with one row per company file, containing (CompanyName, SerialNumber, IssueDate, Frequency1, Frequency2, ..., FrequencyN):
'CompanyABC','123456','31.12.2012','4','8','2','0','213' \n
'Company123','789102','31.12.2012','15','3','8','2','23' \n
This is my code so far:
import os

def list_textfiles(directory, min_file_size):
    # Creates a list of all files stored in DIRECTORY ending on '.txt' with minimum file size
    textfiles = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            filename = os.path.join(root, name)
            if os.stat(filename).st_size > min_file_size:
                textfiles.append(filename)
    return textfiles

directory = 'C:/CompanyFiles'
minimum_size = 30000
lof = list_textfiles(directory, minimum_size)
res = []
for f in lof:
    res += [[entry.split(':')[1] for entry in cdata]
            for cdata in [data.splitlines() for data in open(f).read().split('\n\n')]]
with open('C:/CompanyFiles/Matrix.txt', 'wt') as outfile:
    outfile.write(str(res))
How can I modify my code to achieve the output above?
Answer 0 (score: 0)
This should do the trick:
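import os

outFile = 'C:/CompanyFiles/Matrix.txt'
folder = 'C:/CompanyFiles'
with open(outFile, 'w') as wfp:
    for f in os.listdir(folder):
        # Skip the output file itself, which lives in the same folder
        if f == os.path.basename(outFile):
            continue
        # Read the file and strip trailing whitespace from every line
        with open(os.path.join(folder, f), 'r') as rfp:
            tmp = [line.rstrip() for line in rfp]
        # First line: 'CompanyName-SerialNumber_IssueDate'
        arr = tmp[0].split('-')
        arr = [arr[0]] + arr[1].split('_')
        # Remaining lines: 'expression:frequency'
        arr += [t.split(':')[1].strip() for t in tmp[1:]]
        wfp.write(','.join(["'" + e + "'" for e in arr]) + '\n')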
Note: I have not tested this thoroughly.
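The parsing step in the loop above can also be pulled out of the file handling so it can be checked against the sample data from the question. The following is only a minimal sketch: `parse_company_block` and `format_row` are hypothetical helper names, and it assumes the first line has the form Name-Serial_Date and every following non-blank line has the form expression:frequency.

```python
def parse_company_block(lines):
    """Turn the lines of one company file into a flat row of strings."""
    # Strip surrounding whitespace and skip blank lines.
    lines = [ln.strip() for ln in lines if ln.strip()]
    header, entries = lines[0], lines[1:]
    # Split on the LAST '-' so a hyphen inside the company name survives.
    name, rest = header.rsplit('-', 1)
    # Keep only serial number and date, ignoring any further '_' parts.
    serial, date = rest.split('_')[:2]
    # Split on the LAST ':' so only the trailing count is taken.
    freqs = [e.rsplit(':', 1)[1].strip() for e in entries]
    return [name, serial, date] + freqs


def format_row(row):
    """Wrap every field in single quotes and join the fields with commas."""
    return ','.join("'" + field + "'" for field in row)


sample = [
    'Company ABC-123456_31.12.2012',
    'financial statement:4',
    'corporate-taxes:8',
    'assets:2',
    'available-for-sale property:0',
    'auditors:213',
]
print(format_row(parse_company_block(sample)))
```

Splitting from the right is a design choice made here, not something the answer requires: it keeps expressions such as corporate-taxes intact even though they contain a hyphen.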
Answer 1 (score: 0)
Iterate over the list of files:
# your code
lof = list_textfiles(directory, minimum_size)
for i in lof:
    with open(i) as f:
        for j in f:
            out_list = []
            # First line: 'CompanyName-SerialNumber_IssueDate'
            split_to_out = j.split("-")
            out_list.append(split_to_out[0])
            out_list.append(split_to_out[1].split("_")[0])
            out_list.append(split_to_out[1].split("_")[1])
            # Remaining lines: take the count after the last ':'
            temp = next(f, None)
            while temp:
                out_list.append(temp.split(":")[-1])
                temp = next(f, None)
            # Use a separate loop variable so the outer 'i' is not shadowed
            out_list = [item.strip() for item in out_list]
            to_write = ",".join(out_list) + "\n"
            with open('/home/quadloops/Matrix.txt', 'a') as outfile:
                outfile.write(str(to_write))
>>>cat Matrix.txt
Company ABC,123456,31.12.2012,4,8,2,0,213
Company123,789102,31.12.2012,15,3,8,2,23
Changing that line to to_write = ",".join("'" + x + "'" for x in out_list) + "\n" so that every field is wrapped in single quotes gives
>>>cat Matrix.txt
'Company ABC','123456','31.12.2012','4','8','2','0','213'
'Company123','789102','31.12.2012','15','3','8','2','23'
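As an alternative to building the quoted line by hand, the standard-library csv module can do the quoting. This is only a sketch under the assumption of Python 3; `write_matrix` is a hypothetical helper name, and the rows are the two sample rows from the question rather than data read from disk.

```python
import csv
import io


def write_matrix(rows, out):
    """Write one line per row, with every field wrapped in single quotes."""
    writer = csv.writer(
        out,
        quotechar="'",          # quote with ' instead of the default "
        quoting=csv.QUOTE_ALL,  # quote every field, not only special ones
        lineterminator='\n',    # plain newline instead of the default \r\n
    )
    writer.writerows(rows)


rows = [
    ['Company ABC', '123456', '31.12.2012', '4', '8', '2', '0', '213'],
    ['Company123', '789102', '31.12.2012', '15', '3', '8', '2', '23'],
]

# An in-memory buffer stands in for the output file in this sketch.
buf = io.StringIO()
write_matrix(rows, buf)
print(buf.getvalue(), end='')
```

To write to a real file, pass a file object opened with open(path, 'w', newline='') in place of the buffer, and open it once before the loop over lof rather than reopening it in append mode for every row.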