Question

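I'm a beginner in programming and would like some tips on how to tackle a challenge. I currently have ~10,000 .dat files, each containing a single line with this structure:

Attribute1=Value&Attribute2=Value&Attribute3=Value...AttributeN=Value

I have been trying to use Python and the csv library to convert these .dat files into a single .csv file.

So far I have been able to write something that reads all the files, stores the contents of each file on a new line, and replaces "&" with ",". But since Attribute1, Attribute2...AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from every other line.

Any tips on how to go about that?

Thank you!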

Answer 1

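Since you are a beginner, I prepared some code that works and is at the same time very easy to understand.

I assume you have all the files in a folder called 'input'. The code below should be in a script file next to that folder.

Keep in mind that this code is meant to show how a problem like this can be solved. Optimisations and sanity checks have been left out intentionally.

You might additionally want to check what happens when a value is missing in some line, when an attribute is missing, when the input is corrupted, and so on. :)

Good luck!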

import os

# this function splits the attribute=value pairs into two lists
# the first list holds all the attributes
# the second list holds all the values
def getAttributesAndValues(line):
    attributes = []
    values = []

    # first we remove the trailing newline and split the input over the &
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split the attribute=value over the first '=' sign
        # the left part goes to split[0], the value goes to split[1]
        split = attrVal.split('=', 1)
        attributes.append(split[0])
        values.append(split[1])

    # return the attributes list and values list
    return attributes, values

# test the function using the lines beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends the contents of a single input file to the output file
def writeToCsv(inFile='', wfile="outFile.csv", delim=","):
    # the header is needed only if the output file is missing or still empty
    needHeader = not os.path.exists(wfile) or os.path.getsize(wfile) == 0

    # 'with' closes both files, even if an error occurs
    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            line = line.strip()
            if not line:
                continue  # skip empty lines
            header, values = getAttributesAndValues(line)

            # we write the header only once, at the top of the output file
            if needHeader:
                f_out.write(delim.join(header) + "\n")
                needHeader = False

            # we write the values
            f_out.write(delim.join(values) + "\n")

# Read all the .dat files in the path (this skips hidden files such as .DS_Store
# instead of blindly dropping the first directory entry)
allInputFiles = sorted(f for f in os.listdir('input/') if f.endswith('.dat'))

# loop through all the files and write values to the csv file
for singleFile in allInputFiles:
    writeToCsv('input/'+singleFile)
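
The checks mentioned above (missing values, missing attributes, corrupted input) could be handled roughly as in the following sketch. It is not part of the answer's code: the names parse_line and merge_dat_files are made up for illustration, and it assumes that a pair without an '=' sign should simply be skipped and that a missing attribute should become an empty cell. It uses csv.DictWriter from the standard library, which also quotes values that contain the delimiter.

```python
import csv
import glob
import os


def parse_line(line):
    """Turn 'A=1&B=2' into {'A': '1', 'B': '2'}, skipping malformed pairs."""
    record = {}
    for pair in line.strip().split('&'):
        key, sep, value = pair.partition('=')
        key = key.strip()
        if not sep or not key:
            continue  # no '=' sign or empty attribute name: treated as corrupted
        record[key] = value.strip()
    return record


def merge_dat_files(in_dir, out_path):
    """Write one CSV row per non-empty line found in the .dat files of in_dir."""
    records = []
    fieldnames = []  # attribute names, in order of first appearance
    for path in sorted(glob.glob(os.path.join(in_dir, '*.dat'))):
        with open(path, 'r') as f_in:
            for line in f_in:
                record = parse_line(line)
                if not record:
                    continue  # empty or fully corrupted line
                records.append(record)
                for key in record:
                    if key not in fieldnames:
                        fieldnames.append(key)

    # newline='' lets the csv module control line endings itself
    with open(out_path, 'w', newline='') as f_out:
        # restval fills the cell when a record lacks one of the attributes
        writer = csv.DictWriter(f_out, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Calling merge_dat_files('input', 'outFile.csv') would then write the header once, followed by one row per input line. Collecting all records in memory first is a deliberate simplification; for ~10,000 one-line files that should be unproblematic.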

Answer 2

Put the .dat files in a folder called myDats. Put this script next to the myDats folder, along with a file called temp.txt. You will also need output.csv. [That is, output.csv, temp.txt, and the myDats folder will sit in the same folder as the script.]

mergeDats.py

Answer 3

"But since Attribute1, Attribute2...AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from every other line."

input = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'

Once, for the first file:

','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))

For the content of each file:

','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))

You may need to trim the strings additionally; I don't know how clean your input is.
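
As a rough sketch of that trimming, the two expressions can share one helper that strips whitespace around every attribute and value. The names split_pairs, header_row and value_row are hypothetical, and the sample line with stray spaces is invented for illustration.

```python
def split_pairs(line):
    """Return a list of (attribute, value) tuples with whitespace trimmed."""
    pairs = []
    for item in line.strip().split('&'):
        key, _, value = item.partition('=')
        pairs.append((key.strip(), value.strip()))
    return pairs


def header_row(line):
    """Build the CSV header; needed once, from the first file."""
    return ','.join(k for k, v in split_pairs(line))


def value_row(line):
    """Build the CSV data row; needed for every file."""
    return ','.join(v for k, v in split_pairs(line))


sample = ' Attribute1 = Value1 & Attribute2 =Value2&Attribute3=Value3 \n'
print(header_row(sample))  # Attribute1,Attribute2,Attribute3
print(value_row(sample))   # Value1,Value2,Value3
```

Using partition instead of split keeps a pair intact even if the value itself contains an '=' sign.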

Merging single-line .dat files into one .csv file using Python

3 answers: