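I have a lot of text files in a very specific format that I need to read into a CSV. I can't figure out how to get all of the data into the CSV in the format I want. I can get the file name and the header row into the sheet, but none of the data ends up in the right cells. The text files look like this: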
"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"
My code:
import os
import csv

dirpath = 'C:\Usersnput\\'
output = 'C:\Users\gputew Microsoft Excel Worksheet.csv'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
    files = os.listdir(dirpath)
    for filename in files:
        with open(dirpath + '/' + filename) as afile:
            csvout.writerow([filename, afile.read()])
            afile.close()
    outfile.close()
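I need a spreadsheet with the headers market, mileStoneUpdates, woName, buildStage, designType, woOverwrite, with one row per text file and the cells filled in with that file's values (billing, MI, etc.).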
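Answer 0 (score: 2)
As general advice: the pandas library is very useful for this kind of thing. If I understood your problem correctly, this should basically do it: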
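import os
import pandas as pd

# Raw strings keep the backslashes in Windows paths literal
dirpath = r'C:\Users\gputman\Desktop\Control_File_Tracker\Input'
output = r'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'

frames = []
for filename in os.listdir(dirpath):
    # Each file holds "key":"value" lines; transposing turns the keys into columns
    data = pd.read_csv(os.path.join(dirpath, filename), sep=':', index_col=0, header=None).T
    frames.append(data)
# DataFrame.append was removed in pandas 2.0; concat builds the table in one step
csvout = pd.concat(frames, ignore_index=True)
# index=False leaves out the row-number column
csvout.to_csv(output, index=False)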
For an explanation of the code, see this question/answer, which explains how to read a transposed text file with pandas.
Answer 1 (score: 0)
You can use the csv module to parse each input file into a dictionary, and then write it back out with DictWriter:
import os
import csv

# Raw strings keep the backslashes in Windows paths literal
dirpath = r'C:\Users\gputman\Desktop\Control_File_Tracker\Input'
output = r'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'

with open(output, 'w', newline='') as outfile:
    csvout = csv.DictWriter(outfile, fieldnames=[
        'market', 'mileStoneUpdates', 'woName',
        'buildStage', 'designType', 'woOverwrite'])
    csvout.writeheader()
    for filename in os.listdir(dirpath):
        with open(os.path.join(dirpath, filename)) as afile:
            # With ':' as the delimiter, each line parses into [key, value]
            csvin = csv.reader(afile, delimiter=':')
            # Skip blank lines so row[1] cannot raise IndexError
            csvout.writerow({row[0]: row[1] for row in csvin if row})
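To see why this round trip works, here is a minimal sketch that runs the same reader and writer against in-memory text instead of real files. The `sample` and `out` buffers are stand-ins I made up for the demonstration; only the three field names come from the question.

```python
import csv
import io

# Hypothetical in-memory stand-in for one input file
sample = io.StringIO(
    '"market":"billing, MI"\n'
    '"mileStoneUpdates":"N"\n'
    '"woName":"Dsca_55354_55as0"\n'
)

# With delimiter=':' each line parses into [key, value]; the reader consumes
# the surrounding quotes, so the comma inside "billing, MI" stays in the value
record = {row[0]: row[1] for row in csv.reader(sample, delimiter=':')}
print(record)

# DictWriter re-quotes any value that contains the output delimiter (a comma)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['market', 'mileStoneUpdates', 'woName'])
writer.writeheader()
writer.writerow(record)
print(out.getvalue())
```

Because the writer quotes "billing, MI" on the way out, the value stays in a single cell when the CSV is opened in a spreadsheet.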
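Answer 2 (score: 0)
First, a note on the "with ... as" syntax: it is designed to do all of the work of opening and closing the file for you, so the file is closed automatically when you leave the "with ... as" block. Your "afile.close()" line is therefore unnecessary. Also, you cannot write to the output file after its block ends, because it has already been closed. So keep that in mind.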
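A quick way to convince yourself of this, as a small sketch: the scratch file path below is made up for the demonstration and has nothing to do with the question's directories.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:
    f.write('"market":"billing, MI"\n')
    inside = f.closed      # the file is still open inside the block

after = f.closed           # it is closed as soon as the block exits

try:
    f.write('more')        # writing to a closed file raises ValueError
    write_failed = False
except ValueError:
    write_failed = True

print(inside, after, write_failed)
```

So any writes to the output file have to happen while its block is still open.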
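If you are looking for a solution that doesn't require any additional libraries (which may depend on how often you do this), and if all of your files have exactly the same format, this will work: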
import os
import csv

# Raw strings keep the backslashes in Windows paths literal
dirpath = r'C:\Users\gputman\Desktop\Control_File_Tracker\Input'
output = r'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'

# newline='' stops the csv module from writing blank rows on Windows
outfile = open(output, 'w', newline='')
csvout = csv.writer(outfile)
csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
for filename in os.listdir(dirpath):
    with open(os.path.join(dirpath, filename)) as afile:
        row = []  # list of values we will be constructing
        for line in afile:  # loops through the lines in the file one by one
            value = line.split(':')[1].strip('" \n')  # explained below
            row.append(value)  # adds the retrieved value to our row
        csvout.writerow(row)
outfile.close()
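Now let's look at what happens in the value = ... line. line.split(':') produces a list of the strings separated by ':', so '"market":"billing, MI"\n' becomes ['"market"', '"billing, MI"\n']. [1] takes the second item of that list (remember, Python is zero-indexed), since we already know the first item (it is the name of the field). .strip('" \n') removes the specified characters (double quote, space, or newline) from the beginning and end of the string. In effect it "cleans up" the string so that only the actual value remains.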
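The same steps, spelled out one at a time on a single sample line. The last part is my own addition rather than something from the answer: it assumes a value might itself contain a ':' and shows that limiting the split to one cut keeps such a value whole. The "a:b" value is invented for the example.

```python
line = '"market":"billing, MI"\n'

parts = line.split(':')          # ['"market"', '"billing, MI"\n']
raw_value = parts[1]             # '"billing, MI"\n'
value = raw_value.strip('" \n')  # 'billing, MI'
print(parts, repr(raw_value), repr(value))

# Assumption beyond the answer above: if a value could itself contain ':',
# split(':', 1) cuts only at the first colon and keeps the value in one piece
tricky = '"woName":"a:b"\n'
tricky_value = tricky.split(':', 1)[1].strip('" \n')
print(repr(tricky_value))
```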
Answer 3 (score: 0)
Very few changes are needed.
The simplest solution is:
import os
import csv
from collections import OrderedDict

HEADERS = ['market', 'mileStoneUpdates', 'woName', 'buildStage', 'designType', 'woOverwrite']
dirpath = '/tmp/input'
output = '/tmp/output'

with open(output, 'w', newline='') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(HEADERS)
    for filename in os.listdir(dirpath):
        with open(os.path.join(dirpath, filename)) as afile:
            # Start every field empty so a missing key still yields a column
            data = OrderedDict.fromkeys(HEADERS, "")
            for line in afile:
                for header in HEADERS:
                    if line.startswith('"{}"'.format(header)):
                        # Keep everything after '"header":"'
                        value = line.split('"{}":"'.format(header)).pop()
                        # Drop the line ending, then the closing quote
                        data[header] = value.rstrip('\r\n').rstrip('"')
            csvout.writerow(data.values())
# No close() calls are needed: each with block closes its file on exit
For the given input files:
"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"
"market":"billing, MI2"
"mileStoneUpdates":"N2"
"woName":"Dsca_55354_55as02"
"buildStage":"CPD2"
"designType":"Cor2e"
"woOverwrite":"Y2"
it produces:
market,mileStoneUpdates,woName,buildStage,designType,woOverwrite
"billing, MI",N,Dsca_55354_55as0,CPD,Core,Y
"billing, MI2",N2,Dsca_55354_55as02,CPD2,Cor2e,Y2
Note: if the data in the files is more complex, use a regular expression instead of simple string splitting.
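As a rough sketch of what the regular-expression route could look like: the pattern and the parse_line helper below are hypothetical, written for the "key":"value" layout shown in the question, and would need adjusting for anything more elaborate (escaped quotes, for instance).

```python
import re

# Hypothetical pattern: a quoted key, a colon, then a quoted value.
# Optional whitespace is tolerated around the colon and at the end of the line.
LINE_RE = re.compile(r'^"(?P<key>[^"]+)"\s*:\s*"(?P<value>.*)"\s*$')

def parse_line(line):
    """Return (key, value) for a '"key":"value"' line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group('key'), match.group('value')

print(parse_line('"market":"billing, MI"\n'))
print(parse_line('"woName" : "a:b"\n'))
print(parse_line('not a record'))
```

Unlike plain splitting, this rejects lines that do not look like a record at all, so malformed input can be skipped or reported rather than silently producing a wrong cell.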