Processing and formatting a CSV

Time: 2015-11-12 01:23:52

Tags: python csv

I have a series of data in a CSV file, as shown below.

   Hour L   Dr  Tag   L 0   1   2   3   4   5   6   7   8   9  10
    0   L5  XI  PS    4R    6   3   6   6   5   6   1   9   11  2       
    0   L5  XI  PS    4R    5   8   10  7   7   8   3   9   5   8   
    1   L0  St  v2T   4R    1   0   0   0   0   0   0   0   0   6
    1   L2  TI  sst   4R    8   8   8   8   8   8   8   8   8   8   

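The first row holds the column headings. The data to the right of the column titled L is numbered sequentially from 0 to 59; only the values up to 9 are shown here.

As you can see, the data is sorted by the Hour column, i.e. hour 0 followed by hour 1.

I want to change this so that the rows for hour 1 and above are appended as columns at the end of the matching hour 0 row. The code should look up the Tag field among the hour 0 rows and add the 60 values as new columns at the end. The column headers should be updated to denote hour.minute (e.g. 0.0, 0.1, ..., 1.0, 1.1, ...).

If a new tag is encountered that does not exist for hour 0, the tag should be added and only the 60 values for that hour should be filled in. All other values should be set to 0.

I am trying to do the above in Python. As a first step, I am trying to detect whether the hour has changed. Once that works, I plan to write code that reads all records for hour n and merges the minute values to the right, based on Tag. Is my approach correct? Or can someone suggest a better one?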
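
To make the target layout concrete, here is a minimal sketch of the header labels and the zero-filled row described above. The helper names `minute_labels` and `blank_row` are hypothetical, and the `.` separator follows the description (the expected output further down uses `:` instead, which can be passed as `sep`).

```python
def minute_labels(max_hour, minutes=60, sep="."):
    """Build labels such as '0.0', '0.1', ..., '1.59' for hours 0..max_hour."""
    return [
        "{}{}{}".format(hour, sep, minute)
        for hour in range(max_hour + 1)
        for minute in range(minutes)
    ]


def blank_row(labels):
    """Return a row with every hour.minute value preset to 0."""
    return {label: 0 for label in labels}


labels = minute_labels(max_hour=1)
row = blank_row(labels)
# A tag first seen in hour 1 only fills the hour-1 columns; hour 0 stays 0.
for minute, value in enumerate([8] * 60):
    row["1.{}".format(minute)] = value
```

Starting every row from zeros means a tag that is missing for some hour needs no special handling later.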

import csv
import os
import sys
from glob import glob

hour = 0
p_hour = -1
c_hour = -1
rownum = 0
row_header = []
file_list = []


def format_minute_field(row_header,hour):
    hdr_len = len(row_header)
    for i in range((hdr_len-60),hdr_len):
        row_header[i] = '{}:{}'.format(hour,row_header[i]) 
    return row_header

if __name__ == '__main__':
    previous_row = None
    with open('test.csv', 'rt', newline='') as fd:
        reader = csv.reader(fd)
        for rownum, row in enumerate(reader):
            if rownum == 0:
                # First row is the header: prefix the minute columns with hour 0
                row_header = format_minute_field(row, 0)
                print('row_header {}'.format(row_header))
                continue
            # Compare the Hour cell with the previous data row only,
            # so the header is never treated as an hour change
            if previous_row is not None and row[0] != previous_row[0]:
                print('hour changed from {} to {}'.format(previous_row[0], row[0]))
            previous_row = row
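
The hour-change detection attempted above could also be sketched with `itertools.groupby`, which starts a new group whenever the key changes. This is only an illustration under the assumption that the rows are already sorted by hour; the in-memory sample and its shortened columns are made up for the example.

```python
import csv
import io
from itertools import groupby

# Hypothetical sample; real input would come from open('test.csv', newline='').
sample = io.StringIO(
    "Hour,Tag,0,1\n"
    "0,PS,6,3\n"
    "0,PS,5,8\n"
    "1,v2T,1,0\n"
    "1,sst,8,8\n"
)

reader = csv.reader(sample)
header = next(reader)  # consume the header row before looking at data

# groupby starts a new group every time the key (the Hour cell) changes,
# which relies on the rows already being sorted by hour.
rows_by_hour = {}
for hour, rows in groupby(reader, key=lambda r: r[0]):
    rows_by_hour[int(hour)] = list(rows)

for hour, rows in rows_by_hour.items():
    print("hour {} has {} rows".format(hour, len(rows)))
```

If the same hour could reappear later in an unsorted file, the dict assignment would overwrite the earlier group, so this only fits sorted input.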

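The expected output is shown below. If a particular tag exists in both the hour 0 list and the hour 1 list, the values should be recorded as follows. If a new tag is encountered while processing the hour 1 records, it should be appended to the list.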

 Hour L   Dr  Tag   L 0   0:1   0:2   0:3   0:4   0:5   0:6   0:7   0:8   0:9  0:10 ..........0:59 1:0 1:1 1:2 1:3 

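1 answer:

Answer 0 (score: 1)

I found this interesting, so I wrote some code. However, your test data does not exactly match what you describe. In particular, the first two data rows have the same tag, and it is unclear what should happen in that case. Still, here is code that should address your need. Hope it helps.

The idea is to use DictReader and DictWriter to handle cells by column name, without caring about the order in which they are read or written until it really matters. I then use the data dictionary to merge rows together based on an arbitrary key, defined by the values of certain cells, relying on the fact that tuples can be used as keys in dicts.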
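
As a small standalone illustration of those two building blocks (tuple keys for merging, and `DictWriter` filling missing cells via `restval`), here is a minimal sketch. The records and the reduced column set are invented for the example and are not the real data.

```python
import csv
import io

# Tuples are hashable, so a combination of cell values can serve as a dict key.
merged = {}
records = [
    {"L": "L5", "Tag": "PS", "0:1": "6"},
    {"L": "L5", "Tag": "PS", "1:1": "8"},   # same key: merged into one row
    {"L": "L0", "Tag": "v2T", "1:1": "1"},  # new key: gets its own row
]
for record in records:
    key = (record["L"], record["Tag"])
    merged.setdefault(key, {}).update(record)

# DictWriter orders cells by fieldnames and fills missing ones with restval.
out = io.StringIO()
writer = csv.DictWriter(out, ["L", "Tag", "0:1", "1:1"], restval="0",
                        lineterminator="\n")
writer.writeheader()
for row in merged.values():
    writer.writerow(row)
print(out.getvalue())
```

The full script below follows the same pattern, with the key built from the L, Dr and Tag cells.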

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import csv
import sys

# This will store the lines by tag, in order to join them
data = {}
# This will tell us how to order the columns, will be extended later
columns = ['L', 'Dr', 'Tag']
# Controls extension of the columns
maxhour = 0
# Controls the order the keys are found in the original CSV. Maybe not necessary
keys = []

with open("in.csv", 'r', newline='') as fin:
    reader = csv.DictReader(fin)
    for row in reader:
        hour = int(row['Hour'])
        # Form a unique key to match lines. Adjust to your needs
        key = (row['L'], row['Dr'], row['Tag'])
        if key not in data:
            # This is a future row, a dict with column as key, cell as value
            data[key] = {'L': row['L'], 'Dr': row['Dr'], 'Tag': row['Tag']}
            # Remember the order we've seen the keys
            keys.append(key)
        # Now, add data to the row for each minute
        # 1 to 59
        for minute in range(1,60):
            # Copy data from column 'minute' to column 'Hour:minute'
            src_colname = str(minute)
            dest_colname = row['Hour'] + ':' + src_colname
            data[key][dest_colname] = row[src_colname]
        # There seems to be a special treatment for minute 0, at column "L 0"
        if hour == 0:
            data[key]['L 0'] = row['L 0']
        else:
            data[key][row['Hour'] + ':0'] = row['L 0']
        # Plan to generate enough columns when writing resulting file
        maxhour = max(maxhour, hour)

with open("out.csv", 'w', newline='') as fout:
    # Okay, now everything was merged into data
    # We need to tell DictWriter how to order columns
    # Treat special first column
    columns.append('L 0')
    # Then add the rest
    for hour in range(0, maxhour+1):
        # Do not include "0:0"
        for minute in range(0 if hour > 0 else 1,60):
            columns.append('{:d}:{:d}'.format(hour, minute))
    # Let's write that
    writer = csv.DictWriter(fout, columns, restval = "0")
    writer.writeheader()
    for key in keys: # or for key in data.keys(): if you don't mind the order
        writer.writerow(data[key])
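
Note that with the script above, a second row with the same key and hour overwrites the first. If such duplicates should be added up instead (an assumption, since the question does not say), the merge step could be sketched like this. The `add_minutes` helper and the shortened three-minute rows are hypothetical, and only numeric minute cells are handled.

```python
from collections import defaultdict


def add_minutes(totals, key, hour, minute_values):
    """Add one input row's minute values into the merged row for `key`."""
    for minute, value in enumerate(minute_values, start=1):
        totals[key]["{}:{}".format(hour, minute)] += int(value)


# key -> {"hour:minute": running total}; missing cells start at 0
totals = defaultdict(lambda: defaultdict(int))

key = ("L5", "XI", "PS")
add_minutes(totals, key, 0, ["6", "3", "6"])
add_minutes(totals, key, 0, ["5", "8", "10"])  # duplicate row for hour 0
add_minutes(totals, key, 1, ["8", "8", "8"])
```

Non-numeric cells such as the `4R` value in the `L 0` column would need to be kept separately, because `int()` cannot convert them.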

Test data, in.csv:

Hour,L,Dr,Tag,L 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
0,L5,XI,PS,4R,6,3,6,6,5,6,1,9,11,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L0,St,v2T,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L2,TI,sst,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,L5,XI,PS,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Output:

L,Dr,Tag,L 0,0:1,0:2,0:3,0:4,0:5,0:6,0:7,0:8,0:9,0:10,0:11,0:12,0:13,0:14,0:15,0:16,0:17,0:18,0:19,0:20,0:21,0:22,0:23,0:24,0:25,0:26,0:27,0:28,0:29,0:30,0:31,0:32,0:33,0:34,0:35,0:36,0:37,0:38,0:39,0:40,0:41,0:42,0:43,0:44,0:45,0:46,0:47,0:48,0:49,0:50,0:51,0:52,0:53,0:54,0:55,0:56,0:57,0:58,0:59,1:0,1:1,1:2,1:3,1:4,1:5,1:6,1:7,1:8,1:9,1:10,1:11,1:12,1:13,1:14,1:15,1:16,1:17,1:18,1:19,1:20,1:21,1:22,1:23,1:24,1:25,1:26,1:27,1:28,1:29,1:30,1:31,1:32,1:33,1:34,1:35,1:36,1:37,1:38,1:39,1:40,1:41,1:42,1:43,1:44,1:45,1:46,1:47,1:48,1:49,1:50,1:51,1:52,1:53,1:54,1:55,1:56,1:57,1:58,1:59
L5,XI,PS,4R,5,8,10,7,7,8,3,9,5,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L0,St,v2T,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,1,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L2,TI,sst,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4R,8,8,8,8,8,8,8,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0