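I have a bunch of data in a .txt file, and I need it in a format I can use in Fusion Tables/a spreadsheet. I assume that format is CSV, which I can write to another file and then import into the spreadsheet.
The data is in this format, with multiple entries separated by blank lines: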
Start Time
8/18/14, 11:59 AM
Duration
15 min
Start Side
Left
Fed on Both Sides
No

Start Time
8/18/14, 8:59 AM
Duration
13 min
Start Side
Right
Fed on Both Sides
No
(etc.)
But I ultimately need it in this format (or something I can use to get it into a spreadsheet):
StartDate, StartTime, Duration, StartSide, FedOnBothSides
8/18/14, 11:59 AM, 15, Left, No
- , -, -, -, -
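Because each entry is a run of label/value line pairs separated by blank lines, one way to pull the fields apart is to split the text on blank lines and then zip every label line with the line after it. The following is a minimal Python 3 sketch, assuming labels and values always alternate; parse_entries and SAMPLE are hypothetical names, and the sample text is copied from above.

```python
import re

# Sample input copied from the question: two entries separated by a blank line.
SAMPLE = """Start Time
8/18/14, 11:59 AM
Duration
15 min
Start Side
Left
Fed on Both Sides
No

Start Time
8/18/14, 8:59 AM
Duration
13 min
Start Side
Right
Fed on Both Sides
No
"""


def parse_entries(text):
    """Return one {label: value} dict per blank-line-separated entry."""
    entries = []
    # A blank line (possibly containing spaces or a \r) separates entries.
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # skip empty chunks, e.g. from trailing blank lines
        # Labels sit at even positions, each followed by its value.
        entries.append(dict(zip(lines[0::2], lines[1::2])))
    return entries


entries = parse_entries(SAMPLE)
print(len(entries))              # 2
print(entries[1]["Start Side"])  # Right
```

Each dict can then be mapped onto the desired columns, so lines you don't need are simply never looked up.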
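The problems I'm running into:
- I don't need all of the information or every line, but I'm not sure how to separate them automatically. I'm not even sure whether my approach of sorting through each line is a smart one.
- I sometimes get the error "argument 1 must be string or read-only character buffer, not list" when I use .read() or .readlines() (although it does sort of work). Both of my arguments are .txt files.
- The dates and times aren't in a fixed-width format (the file has 8/4/14, 5:14 AM instead of 08/04/14, 05:14 AM), and I'm not sure how to handle that.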
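For the variable-width dates and times, one option (a swapped-in technique, not the string padding used in the answer) is to parse the value with datetime.strptime, which accepts month, day and hour with or without a leading zero, and then format it back with strftime, which always zero-pads. A minimal Python 3 sketch; normalize_timestamp is a hypothetical name, and %p assumes an English/C locale where the indicator is AM/PM.

```python
from datetime import datetime


def normalize_timestamp(raw):
    """Split '8/4/14, 5:14 AM' into zero-padded ('08/04/14', '05:14 AM')."""
    # strptime accepts month, day and hour with or without a leading zero.
    parsed = datetime.strptime(raw.strip(), "%m/%d/%y, %I:%M %p")
    # strftime always zero-pads %m, %d and %I.
    return parsed.strftime("%m/%d/%y"), parsed.strftime("%I:%M %p")


date_part, time_part = normalize_timestamp("8/4/14, 5:14 AM")
print(date_part, time_part)  # 08/04/14 05:14 AM
```

A side effect of parsing is that a malformed value raises ValueError instead of being copied silently into the output.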
Here's what I've tried so far:
from sys import argv
from os.path import exists

def filework():
    script, from_file, to_file = argv
    print "copying from %s to %s" % (from_file, to_file)
    in_file = open(from_file)
    indata = in_file.readlines() #.read() .readline .readlines .read().splitline .xreadlines
    print "the input file is %d bytes long" % len(indata)
    print "does the output file exist? %r" % exists(to_file)
    print "ready, hit RETURN to continue, CTRL-C to abort."
    raw_input()
    #do stuff section----------------BEGIN
    for i in indata:
        if i == "Start Time":
            pass #do something
        elif i == '{date format}':
            pass #do something
        else:
            pass #do something
    #do stuff section----------------END
    out_file = open(to_file, 'w')
    out_file.write(indata)
    print "alright, all done."
    out_file.close()
    in_file.close()

filework()
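The error quoted above comes from out_file.write(indata): readlines() returns a list of strings, while write() accepts only a single string. The sketch below reproduces this in Python 3, using io.StringIO as an in-memory stand-in for the output file (an assumption for illustration; Python 3 words the TypeError differently from Python 2). writelines(), or write("".join(lines)), accepts the list.

```python
import io

# readlines() returns a list of strings like this one.
lines = ["Start Time\n", "8/18/14, 11:59 AM\n"]
out = io.StringIO()  # in-memory stand-in for the output file

raised = False
try:
    out.write(lines)  # write() accepts a single string, not a list
except TypeError as err:
    raised = True
    print("TypeError:", err)

out.writelines(lines)  # writelines() accepts an iterable of strings
print(repr(out.getvalue()))
```

The same mismatch explains the "bytes long" message: len() of a list counts lines, not bytes.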
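I'm fairly inexperienced with the more complex parts of a script like this. Any help and advice would be greatly appreciated. Sorry if this is a mess.
Thanks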
Answer (score: 0)
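This code should work. It isn't optimal, but I'm sure you'll figure out how to make it better! Basically, the code:
- reads the input file and lowercases and strips each line,
- when it finds a label such as "start time" or "duration", takes the next line as that field's value (zero-padding dates and times, and keeping only the number of minutes),
- starts a new entry at every blank line,
- writes a header row and then one comma-separated row per entry to the output file.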
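If you haven't seen string formatting before:
"{0:} {1:}".format(arg0, arg1)
Here {0:} is just a way of defining a placeholder for a variable (here: arg0), and the 0 specifies which argument to use.
You can read more about it here: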
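A short illustration of the positional placeholders, using made-up time values; it runs unchanged on Python 2.7 and Python 3. The last line shows that a format spec after the colon can also do zero padding, which is an alternative to the manual padding in pad_time below.

```python
# {0:} is replaced by the first argument of format(), {1:} by the second.
joined = "{0:} {1:}".format("05:14", "AM")
# The index picks the argument, so the order in the template can change.
swapped = "{1:} {0:}".format("05:14", "AM")
# A spec after the colon can pad integers: width 2, filled with zeros.
padded = "{0:02d}:{1:02d}".format(5, 14)
print(joined)   # 05:14 AM
print(swapped)  # AM 05:14
print(padded)   # 05:14
```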
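If you're using a Python version < 2.7, you may have to install the backported OrderedDict with pip install ordereddict. If that doesn't work, just change both occurrences of data = OrderedDict() to data = {}. The column order of the output may then differ, but it will still be correct.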
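Why the ordering matters: the script writes the keys of the first entry as the header and the values of every entry as rows, so both must come out in the same order. A small sketch with illustrative values follows; note that since Python 3.7 a plain dict also preserves insertion order as a language guarantee, so on modern Python data = {} behaves the same here.

```python
from collections import OrderedDict

row = OrderedDict()
row["StartDate"] = "08/18/14"
row["StartTime"] = "11:59 AM"
row["Duration"] = "15"

# OrderedDict iterates in insertion order, so keys and values line up as columns.
header_line = ", ".join(row.keys())
value_line = ", ".join(row.values())
print(header_line)  # StartDate, StartTime, Duration
print(value_line)   # 08/18/14, 11:59 AM, 15
```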
from sys import argv
from os.path import exists
# since we want to have a somewhat standardized format
# and dicts are unordered by default
try:
    from collections import OrderedDict
except ImportError:
    # python 2.6 or earlier, use backport
    from ordereddict import OrderedDict


def get_time_and_date(time):
    # split "8/18/14, 11:59 AM" into the date and the time part
    date, time = time.split(",")
    # separate "11:59" from the AM/PM indicator
    time, time_indic = time.split()
    date = pad_time(date.strip())
    time = "{0:} {1:}".format(pad_time(time), time_indic)
    return time, date


def pad_time(time):
    """
    Make all time and date values look the same by zero-padding each part,
    e.g. turn 5:30 into 05:30 and 8/4/14 into 08/04/14
    """
    # if it's a time
    if ":" in time:
        separator = ":"
    # if it's a date
    else:
        separator = "/"
    time = time.split(separator)
    for index, num in enumerate(time):
        if len(num) < 2:
            time[index] = "0" + time[index]
    return separator.join(time)
def filework():
    from_file, to_file = argv[1:]
    data = OrderedDict()
    print "copying from %s to %s" % (from_file, to_file)
    # using "with open(...)" closes the file automatically
    with open(from_file, "r") as inputfile:
        indata = inputfile.readlines()
    entries = []
    # readlines() returns a list, so this counts lines, not bytes
    print "the input file is %d lines long" % len(indata)
    print "does the output file exist? %r" % exists(to_file)
    print "ready, hit RETURN to continue, CTRL-C to abort."
    raw_input()
    for line_num in xrange(len(indata)):
        # make the entire string lowercase to be more flexible,
        # and then remove whitespace
        line_lowered = indata[line_num].lower().strip()
        if "start time" == line_lowered:
            time, date = get_time_and_date(indata[line_num+1].strip())
            # store the date first so the columns match the desired header
            data["StartDate"] = date
            data["StartTime"] = time
        elif "duration" == line_lowered:
            duration = indata[line_num+1].strip().split()
            # only keep the amount of minutes
            data["Duration"] = duration[0]
        elif "start side" == line_lowered:
            data["StartSide"] = indata[line_num+1].strip()
        elif "fed on both sides" == line_lowered:
            data["FedOnBothSides"] = indata[line_num+1].strip()
        elif line_lowered == "" and data:
            # a blank line ends the current entry; checking "data" skips
            # consecutive blank lines so no empty rows are written
            entries.append(data)
            data = OrderedDict()
    # keep the last entry if the file does not end with a blank line
    if data:
        entries.append(data)
    # "w" creates the output file if it does not exist, or overwrites it
    with open(to_file, "w") as outfile:
        headers = entries[0].keys()
        outfile.write(", ".join(headers) + "\n")
        for entry in entries:
            outfile.write(", ".join(entry.values()) + "\n")


filework()
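An alternative for the output step (not part of the answer above) is the standard csv module: csv.DictWriter writes columns in a fixed fieldnames order no matter how each dict is ordered, and quotes any value that contains a comma. The following is a minimal Python 3 sketch; FIELDNAMES and write_csv are hypothetical names, and io.StringIO stands in for the output file. For a real file on Python 3, open it with open(to_file, "w", newline="") as the csv documentation recommends.

```python
import csv
import io

FIELDNAMES = ["StartDate", "StartTime", "Duration", "StartSide", "FedOnBothSides"]


def write_csv(entries, outfile):
    # Columns always follow FIELDNAMES, regardless of each dict's key order.
    writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
    writer.writeheader()
    for entry in entries:
        if entry:  # skip empty dicts left over from extra blank lines
            writer.writerow(entry)


entries = [
    {"StartTime": "11:59 AM", "StartDate": "08/18/14", "Duration": "15",
     "StartSide": "Left", "FedOnBothSides": "No"},
    {},
]
out = io.StringIO()
write_csv(entries, out)
print(out.getvalue())
```

The header row comes out as StartDate,StartTime,Duration,StartSide,FedOnBothSides even though the sample dict lists StartTime first.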