Question

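I am looking for the Python equivalent of an awk script that splits a file into 26 parts based on a flag in each record. The file holds 26 different record types, a hangover from a hierarchical database used by Burroughs in the 1970s. I would like to open 26 file handles named f_A through f_Z, rather than the traditional f1, and then stream records out as I read them, without holding the whole data set in a buffer.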

# Gawk original - split new valuation roll format into record types A-Z
# run gawk -F\| -f split.awk input_file
# creates A.raw, B.raw, .... Z.raw
# Oct 1995 
{ident = $8; 
file = ident".raw";
print $0 >> file}

So I thought I could make up a file handle name and then call it with eval() or something similar to direct each record to the correct output.

for line in fileinput.input(src):
    parts = line.split('|')
    recType = parts[7]
    recFile = 'f_'+recType
    if not recType in openFiles:
        eval(recFile) = open(recType+".raw",'w') # how should this line be written?
    eval(recFile).write(line)
    # ....

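I can get the system file name from f1.name, and I can evaluate a variable to get the handle, e.g. eval("f_A"), but I can't see how to open a file with a handle that is not hard-coded.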

Answer 1

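eval is best avoided, and fortunately it is almost never needed. In this case, open(recType+".raw",'w') creates a file handle. All you need to do is associate it with recType. That is what dictionaries are for.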

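In the code below, openFiles is a dictionary. Whenever we meet a new recType, we open a file for it and store the file handle in openFiles under the key recType. Whenever we want to write to that file again, we just look the handle up in the dictionary. Thus: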

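import fileinput

openFiles = {}  # maps each record type to its open file handle
for line in fileinput.input(src):  # src: path of the input file, as in the question
    parts = line.split('|')
    recType = parts[7]  # the eighth field holds the record type
    if recType not in openFiles:
        openFiles[recType] = open('f_' + recType, 'w')
    openFiles[recType].write(line)
    # ....

# close every output file once the input is exhausted
for f in openFiles.values():
    f.close()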
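
For a version that can be run on its own, the same dictionary idea can be wrapped in a function. This is only a sketch under some assumptions: the function name split_by_record_type and its parameters (field_index, sep, suffix, out_dir) are made up for illustration, the output names follow the A.raw pattern of the gawk script, and contextlib.ExitStack is used so that every output file is closed even if an error occurs partway through. Unlike the gawk script, which appends with >>, it opens each output with 'w' and so overwrites files left over from an earlier run.

```python
import os
from contextlib import ExitStack


def split_by_record_type(src, out_dir='.', field_index=7, sep='|', suffix='.raw'):
    """Stream each line of src into out_dir/<record type><suffix>.

    Returns the sorted list of record types that were seen.
    """
    open_files = {}  # record type -> open file handle
    with ExitStack() as stack, open(src) as infile:
        for line in infile:
            # strip the newline first so a type in the last field stays clean
            rec_type = line.rstrip('\n').split(sep)[field_index]
            if rec_type not in open_files:
                path = os.path.join(out_dir, rec_type + suffix)
                # enter_context registers the file; ExitStack closes it on exit
                open_files[rec_type] = stack.enter_context(open(path, 'w'))
            open_files[rec_type].write(line)
    return sorted(open_files)
```

Calling split_by_record_type('input_file') would then create one .raw file in the current directory for each record type found in the eighth field. Only one line is held in memory at a time, so the input is never buffered as a whole.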
