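Copy pieces of data from a .txt file into another file for a spreadsheet

Time: 2014-08-27 13:55:14

Tags: python export-to-csv

I have a bunch of data in a .txt file that I need to get into a format usable in Fusion Tables or a spreadsheet. I assume that format would be CSV, which I can write to another file and then import into a spreadsheet.

The data is in this format, with multiple entries separated by blank lines.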

Start Time
8/18/14, 11:59 AM
Duration
15 min
Start Side
Left
Fed on Both Sides
No

Start Time
8/18/14, 8:59 AM
Duration
13 min
Start Side
Right
Fed on Both Sides
No

(etc.)

But in the end I need it in this format (or something I can use to get it into a spreadsheet):

StartDate, StartTime, Duration, StartSide, FedOnBothSides
8/18/14, 11:59 AM, 15, Left, No
- ,      -,        -,  -,    -

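The problems I am running into are:
- I don't need all of the information or every line, but I'm not sure how to separate them automatically. I don't even know whether the way I'm going through each line is a smart one.
- I get the error "argument 1 must be string or read-only character buffer, not list" when I use .read() or .readlines() (although it does work sometimes). Both of my arguments are .txt files.
- The date and time are not in a fixed-width format (the file has 8/4/14, 5:14 AM rather than 08/04/14, 05:14 AM), and I am not sure how to handle that.

This is what I have tried so far: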


from sys import argv
from os.path import exists

def filework():
    script, from_file, to_file = argv

    print "copying from %s to %s" % (from_file, to_file)

    in_file = open(from_file)
    indata = in_file.readlines() #.read() .readline .readlines .read().splitline .xreadlines

    print "the input file is %d bytes long" % len(indata)

    print "does the output file exist? %r" % exists(to_file)
    print "ready, hit RETURN to continue, CTRL-C to abort."
    raw_input()

    #do stuff section----------------BEGIN
    for i in indata:
        if i == "Start Time":
            pass #do something
        elif i== '{date format}':
            pass #do something
        else:
            pass #do something
        #do stuff section----------------END

    out_file = open(to_file, 'w')
    out_file.write(indata)

    print "alright, all done."

    out_file.close()
    in_file.close()



filework()

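So I'm relatively inexperienced with scripts of this complexity. Any help and suggestions would be greatly appreciated. Sorry if this is a mess.
Thanks

1 Answer:

Answer 0 (score: 0)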

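This code should work. It is not optimal, but I'm sure you will figure out how to make it better! What the code basically does:

  1. Read all lines from the input data
  2. Iterate over all lines and try to recognize the different keys (Start Time, etc.)
  3. If a key is recognized, take the line below it and apply the appropriate function to it
    • If a blank line is found, add the current entry to a list so that further entries can be read
  4. Write the data to a file

If you haven't seen this kind of string formatting before: in "{0:} {1:}".format(arg0, arg1), {0:} is just a way of defining a placeholder for a variable (here: arg0); the 0 only defines which argument to use.

The Python documentation on format string syntax has more information.

If you are using a Python version < 2.7, you may have to install a backport of OrderedDict with pip install ordereddict. If that doesn't work, just change data = OrderedDict() to data = {}. The column order may then differ between runs, but the output will still be correct.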
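As a small illustration of the placeholders described above, here is a minimal sketch. It is written for Python 3 (the answer's code targets Python 2), and the sample values are made up for the example. It also shows a format spec as a possible alternative to padding strings by hand.

```python
# Positional placeholders: {0} is filled by the first argument, {1} by the second.
# "{0:}" with an empty format spec behaves the same as "{0}".
label = "{0:} {1:}".format("05:30", "AM")
print(label)

# A format spec after the colon can zero-pad integers, e.g. 5 becomes "05".
padded = "{0:02d}:{1:02d}".format(5, 30)
print(padded)
```

The second form only applies once the pieces have been converted to integers; the answer's pad_time function works on strings instead.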

    from sys import argv
    from os.path import exists
    # since we want to have a somewhat standardized format
    # and dicts are unordered by default
    try:
        from collections import OrderedDict
    except ImportError:
        # python 2.6 or earlier, use backport
        from ordereddict import OrderedDict
    
    def get_time_and_date(time):
        date, time = time.split(",")
        time, time_indic = time.split()
    
        date = pad_time(date)
        time = "{0:} {1:}".format(pad_time(time), time_indic)
    
        return time, date
    
    def pad_time(time):
        """Make all the time values look the same, e.g. turn 5:30 AM into 05:30 AM."""
        # if it's a time
        if ":" in time:
            separator = ":"
        # if it's a date
        else:
            separator = "/"
    
        time = time.split(separator)
        for index, num in enumerate(time):
            if len(num) < 2:
                time[index] = "0" + time[index]
    
        return separator.join(time)
    
    def filework():
        from_file, to_file = argv[1:]
        data = OrderedDict() 
    
        print "copying from %s to %s" % (from_file, to_file)
        # by using "with open(...)" the file is closed automatically
        with open(from_file, "r") as inputfile:
            indata = inputfile.readlines()
            entries = []
    
            # readlines() returns a list of lines, so len() counts lines, not bytes
            print "the input file is %d lines long" % len(indata)
            print "does the output file exist? %r" % exists(to_file)
            print "ready, hit RETURN to continue, CTRL-C to abort."
            raw_input()
    
            for line_num in xrange(len(indata)):
                # make the entire string lowercase to be more flexible,
                # and then remove whitespace
                line_lowered = indata[line_num].lower().strip()
    
                if "start time" == line_lowered:
                    time, date = get_time_and_date(indata[line_num+1].strip())
                    # insert the date first so the columns come out as StartDate, StartTime
                    data["StartDate"] = date
                    data["StartTime"] = time
                elif "duration" == line_lowered:
                    duration = indata[line_num+1].strip().split()
                    # only keep the amount of minutes
                    data["Duration"] = duration[0]
                elif "start side" == line_lowered:
                    data["StartSide"] = indata[line_num+1].strip()
                elif "fed on both sides" == line_lowered:
                    data["FedOnBothSides"] = indata[line_num+1].strip()
                elif line_lowered == "" and data:
                    # a blank line ends the current entry; repeated blank lines are skipped
                    entries.append(data)
                    data = OrderedDict()
    
            # keep the last entry unless the file ended with a blank line
            if data:
                entries.append(data)
    
        # create the outfile if it does not exist
        with open(to_file, "w+") as outfile:
            headers = entries[0].keys()
            outfile.write(", ".join(headers) + "\n")
            for entry in entries:
                outfile.write(", ".join(entry.values()) + "\n")
    
    filework()
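Regarding the error from the question: readlines() returns a list of strings, while write() expects a single string, so out_file.write(indata) fails whenever indata is a list. Joining the lines first, or using writelines(), avoids it. The following is an alternative sketch, not the code above. It assumes Python 3, an English/C locale for AM/PM parsing, and the key/value layout shown in the question. The names KEYS, COLUMNS, parse_entries and to_csv are made up for this example. It uses the standard csv and datetime modules, so the variable-width dates need no manual padding.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical mapping from the labels in the text file to CSV column names.
KEYS = {
    "start time": "StartTime",
    "duration": "Duration",
    "start side": "StartSide",
    "fed on both sides": "FedOnBothSides",
}
COLUMNS = ["StartDate", "StartTime", "Duration", "StartSide", "FedOnBothSides"]


def parse_entries(text):
    """Turn blank-line-separated key/value blocks into a list of dicts."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        entry = {}
        # Lines alternate: key, value, key, value, ...
        for key, value in zip(lines[0::2], lines[1::2]):
            column = KEYS.get(key.lower())
            if column == "StartTime":
                # strptime accepts both "8/4/14" and "08/04/14".
                stamp = datetime.strptime(value, "%m/%d/%y, %I:%M %p")
                entry["StartDate"] = stamp.strftime("%m/%d/%y")
                entry["StartTime"] = stamp.strftime("%I:%M %p")
            elif column == "Duration":
                entry[column] = value.split()[0]  # keep only the number of minutes
            elif column:
                entry[column] = value
        if entry:
            entries.append(entry)
    return entries


def to_csv(entries):
    """Render the entries as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


sample = """Start Time
8/4/14, 5:14 AM
Duration
15 min
Start Side
Left
Fed on Both Sides
No

Start Time
8/18/14, 8:59 AM
Duration
13 min
Start Side
Right
Fed on Both Sides
No
"""
csv_text = to_csv(parse_entries(sample))
print(csv_text)
```

To write to a real file instead of a string, the same DictWriter can be given a file opened with open(to_file, "w", newline=""). Because the column order comes from COLUMNS, no OrderedDict is needed in this variant.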