Converting multiple specifically formatted text files to CSV in Python

Time: 2019-01-04 12:54:46

Tags: python

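I have a number of text files in a very specific format that I need to read into a CSV. I can't figure out how to get all of the data into the CSV layout I want. I can get the file names and the headers into the sheet, but none of the data from the files ends up in it. The text files look like this: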

"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"

My code:

import os
import csv

dirpath = 'C:\Usersnput\\'
output = 'C:\Users\gputew Microsoft Excel Worksheet.csv'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)

for filename in files:
    with open(dirpath + '/' + filename) as afile:
        csvout.writerow([filename, afile.read()])
        afile.close()

outfile.close()

I need a spreadsheet with the headers market, mileStoneUpdates, woName, buildStage, designType, woOverwrite, and one row per text file with the cells filled in from that file (billing, MI, etc.).
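For reference, the desired result can be sketched with the standard csv module. This is a minimal sketch, assuming each file has already been parsed into a dict keyed by the header names; records_to_csv and sample are hypothetical names, and the output goes to an in-memory buffer instead of a file so it is easy to inspect:

```python
import csv
import io

HEADERS = ['market', 'mileStoneUpdates', 'woName', 'buildStage', 'designType', 'woOverwrite']

def records_to_csv(records):
    """Hypothetical helper: write a header plus one CSV row per record dict, return the text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=HEADERS, lineterminator='\n')
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buf.getvalue()

# Hypothetical record, as it would look after parsing the sample file above
sample = {'market': 'billing, MI', 'mileStoneUpdates': 'N', 'woName': 'Dsca_55354_55as0',
          'buildStage': 'CPD', 'designType': 'Core', 'woOverwrite': 'Y'}
print(records_to_csv([sample]), end='')
# market,mileStoneUpdates,woName,buildStage,designType,woOverwrite
# "billing, MI",N,Dsca_55354_55as0,CPD,Core,Y
```

The default quoting (csv.QUOTE_MINIMAL) wraps "billing, MI" in quotes because it contains the delimiter, so the comma inside the value does not split it into two columns.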

4 answers:

Answer 0 (score: 2)

As a general tip: the pandas library is very useful for this kind of thing. If I understand your question correctly, this should basically do it:

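import os
import pandas as pd

dirpath = r'C:\Users\gputman\Desktop\Control_File_Tracker\Input'
output = r'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
frames = []

for filename in os.listdir(dirpath):
    # read the "key":"value" lines as a two-column table, then transpose it into one row
    data = pd.read_csv(os.path.join(dirpath, filename), sep=':', index_col=0, header=None).T
    frames.append(data)

# DataFrame.append was removed in pandas 2.0, so concatenate all rows once at the end
csvout = pd.concat(frames, ignore_index=True)
csvout.to_csv(output, index=False)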

For an explanation of the code, see this question/answer, which explains how to read a transposed text file with pandas.
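The transpose is the core idea: each input file stores one record vertically (one key per line), while the CSV needs it horizontally (one row per file). A stdlib-only sketch of that step, assuming the lines have already been split into (key, value) pairs; transpose_pairs is a hypothetical helper, not part of pandas:

```python
def transpose_pairs(pairs):
    """Hypothetical helper: turn vertical (key, value) pairs into a header row and a value row."""
    # zip(*pairs) regroups the pairs column-wise: all keys first, then all values
    keys, values = zip(*pairs)
    return list(keys), list(values)

pairs = [('market', 'billing, MI'), ('mileStoneUpdates', 'N'), ('woName', 'Dsca_55354_55as0')]
header, row = transpose_pairs(pairs)
print(header)  # ['market', 'mileStoneUpdates', 'woName']
print(row)     # ['billing, MI', 'N', 'Dsca_55354_55as0']
```

In the pandas version, .T does the same job on the two-column DataFrame that read_csv builds with index_col=0.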

Answer 1 (score: 0)

You can use the csv module to parse each input file into a dictionary, and then write the dictionaries back out with DictWriter:

import os
import csv

dirpath = r'C:\Users\gputman\Desktop\Control_File_Tracker\Input'
output = r'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
with open(output, 'w', newline='') as outfile:
    csvout = csv.DictWriter(outfile, fieldnames=
                ['market','mileStoneUpdates','woName',
                 'buildStage','designType','woOverwrite'])
    csvout.writeheader()
    files = os.listdir(dirpath)

    for filename in files:
        with open(os.path.join(dirpath, filename), newline='') as afile:
            # ':' separates key and value; the reader also strips the surrounding quotes
            csvin = csv.reader(afile, delimiter=':')
            # skip blank lines, which csv.reader returns as empty lists
            csvout.writerow({row[0]: row[1] for row in csvin if row})
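A side benefit of DictWriter is that it places values by key, so a file with its fields in a different order still lines up with the header. A small sketch of two real DictWriter parameters that control the edge cases, restval (value for a missing field) and extrasaction (what to do with an unknown key); write_rows and FIELDS are hypothetical names for the demo:

```python
import csv
import io

FIELDS = ['market', 'mileStoneUpdates', 'woName']

def write_rows(records, ignore_extra=False):
    """Hypothetical helper: write dict records with DictWriter and return the CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=FIELDS,
        restval='',  # written for any field a record does not contain
        extrasaction='ignore' if ignore_extra else 'raise',  # handling of unknown keys
        lineterminator='\n',
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# A record missing 'woName' gets an empty cell instead of an error
print(write_rows([{'market': 'billing, MI', 'mileStoneUpdates': 'N'}]), end='')

# A record with an unexpected key (e.g. a typo in a file) raises ValueError by default
try:
    write_rows([{'market': 'x', 'typo': 'y'}])
except ValueError as exc:
    print('ValueError:', exc)
```

Keeping the default extrasaction='raise' is usually the safer choice here, because a misspelled key in one input file then fails loudly instead of silently leaving a blank cell.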

Answer 2 (score: 0)

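First, a note on the "with ... as" syntax: it is meant to do all the work of opening and closing the file for you, so when you leave the "with ... as" block, the file is closed automatically. That makes your afile.close() line unnecessary. It also means you can no longer write to your output file once its with block has ended, because the file is already closed, so keep that in mind.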
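A small sketch of the behavior described above, using a throwaway temporary file (the path and data are made up for the demo). It reproduces the mistake from the question, calling writerow after the with block has ended:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')  # throwaway file for the demo

with open(path, 'w', newline='') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['market', 'woName'])  # fine: the file is still open here

print(outfile.closed)  # True: leaving the with block closed the file

write_failed = False
try:
    csvout.writerow(['billing, MI', 'Dsca_55354_55as0'])  # same mistake as in the question
except ValueError as exc:
    write_failed = True  # writing to a closed file raises ValueError
    print('ValueError:', exc)
```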

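If you are looking for a solution that doesn't need any additional libraries (which may matter depending on how often you do this), the following works, provided all of your files have exactly the same format: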

import os
import csv

dirpath = r'C:\Users\gputman\Desktop\Control_File_Tracker\Input'
output = r'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
outfile = open(output, 'w', newline='')  # newline='' avoids blank rows on Windows
csvout = csv.writer(outfile)
csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)

for filename in files:
    with open(os.path.join(dirpath, filename)) as afile:
        row = [] # list of values we will be constructing
        for line in afile: # loops through the lines in the file one by one
            if not line.strip(): # skip blank lines, which have no ':' to split on
                continue
            value = line.split(':')[1].strip('" \n') # I will be explaining this later
            row.append(value) # adds the retrieved value to our row
        csvout.writerow(row)

outfile.close()

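Now let's look at what happens in the value = ... line: line.split(':') makes a list of the pieces of the string separated by ':', so '"market":"billing, MI"\n' becomes ['"market"', '"billing, MI"\n']. [1] takes the second item of that list (remember, Python is zero-indexed), since we already know the first item, which is the field name. .strip('" \n') removes the given characters (double quotes, spaces, and newlines) from the beginning and end of the string. In other words, it "cleans up" the string so that only the actual value is left.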
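The same steps, run on the sample line. The last line is an extra caveat rather than part of the answer: if a value can itself contain ':', splitting only at the first colon with split(':', 1) keeps the value intact:

```python
line = '"market":"billing, MI"\n'

parts = line.split(':')
print(parts)  # ['"market"', '"billing, MI"\n']

value = parts[1].strip('" \n')
print(value)  # billing, MI

# Caveat: split(':') would cut a value such as "12:30" in half;
# split(':', 1) splits only at the first colon and keeps the rest together
print('"time":"12:30"\n'.split(':', 1)[1].strip('" \n'))  # 12:30
```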

Answer 3 (score: 0)

Only a few changes are needed:

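  • All operations on a file must happen inside its with block, and there is no need to close the file there.
  • Then you need to actually collect the data from each file.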

The simplest solution is:

import os
import csv
from collections import OrderedDict

HEADERS = ['market', 'mileStoneUpdates', 'woName', 'buildStage', 'designType', 'woOverwrite']

dirpath = '/tmp/input'
output = '/tmp/output'
with open(output, 'w', newline='') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(HEADERS)
    files = os.listdir(dirpath)

    for filename in files:
        with open(os.path.join(dirpath, filename)) as afile:
            data = OrderedDict.fromkeys(HEADERS, "")  # one empty row per file
            for line in afile:
                for header in HEADERS:
                    if line.startswith('"{}"'.format(header)):
                        value = line.split('"{}":"'.format(header)).pop()
                        # drop the trailing newline (if any) and the closing quote
                        value = value.rstrip('\n').rstrip('"')
                        data[header] = value
            csvout.writerow(data.values())
    # both files are closed automatically when their with blocks end

For the given input files (one record per file, shown here separated by a blank line):

"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"

"market":"billing, MI2"
"mileStoneUpdates":"N2"
"woName":"Dsca_55354_55as02"
"buildStage":"CPD2"
"designType":"Cor2e"
"woOverwrite":"Y2"

the code produces:

market,mileStoneUpdates,woName,buildStage,designType,woOverwrite
"billing, MI",N,Dsca_55354_55as0,CPD,Core,Y
"billing, MI2",N2,Dsca_55354_55as02,CPD2,Cor2e,Y2

Note: if the data in your files is more complex, use a regular expression instead of simple string splitting.
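As a hedged sketch of that suggestion, one possible pattern for a single "key":"value" line is below. LINE_RE and parse_line are hypothetical names, and the pattern assumes keys never contain a double quote:

```python
import re

# Hypothetical pattern: group 1 is the key (no double quotes allowed),
# group 2 is everything between ':"' and the final quote on the line
LINE_RE = re.compile(r'^"([^"]*)":"(.*)"\s*$')

def parse_line(line):
    """Return (key, value) for a well-formed line, or None for anything else."""
    match = LINE_RE.match(line)
    return match.groups() if match else None

print(parse_line('"market":"billing, MI"\n'))  # ('market', 'billing, MI')
print(parse_line('"time":"12:30"\n'))          # ('time', '12:30')
print(parse_line('not a key/value line\n'))    # None
```

Unlike a plain split, this keeps colons inside the value and makes malformed lines easy to detect, since they return None instead of raising an IndexError.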