CSV parsing and transformation

Time: 2014-08-12 14:01:13

Tags: python parsing csv

I'm looking to parse a CSV file that is organized as follows:

<data1>,<data2>
asdf,<data3>
asdf,<data4>
asdf,<data5>
<data6>,<data7>
asdf,<data8>

<data1>,<data2>
asdf,<data3>
asdf,<data4>
asdf,<data5>
<data6>,<data7>
asdf,<data8>

<data1>,<data2>
asdf,<data3>
asdf,<data4>
asdf,<data5>
<data6>,<data7>
asdf,<data8>

etc.

I'm trying to output a .csv that looks like this:

<data1>,<data2>,<data3>,<data4>,<data6>,<data7>,<data8>
<data1>,<data2>,<data3>,<data4>,<data6>,<data7>,<data8>
etc.

Can anyone help me with this?

EDIT: Figured it out, in case anyone is interested:

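import csv

# Python 3: open CSV files in text mode with newline=''; raw strings keep the backslashes literal
with open(r'C:\Temp\eqtest.csv', newline='') as inf, open(r'C:\Temp\output.csv', 'w', newline='') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf)
    i = -1      # position (0-5) of the current row within its 6-row block
    line = []   # output row being assembled for the current block
    for row in reader:
        # skip blank separator rows: [] for an empty line, ['', ''] for a bare comma
        if not any(row):
            continue
        i += 1
        if i == 0 or i == 4:
            # first and fifth rows: keep both columns
            line.append(row[0])
            line.append(row[1])
        elif i == 2 or i == 3:
            # third and fourth rows: keep the second column (position 1 is not copied)
            line.append(row[1])
        elif i == 5:
            # last row of the block: keep the second column, write the row, reset
            line.append(row[1])
            i = -1
            writer.writerow(line)
            line = []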
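As a counter-free variant of the same idea, the sketch below drops blank rows first and then uses the "grouper" idiom (zipping one iterator with itself six times) to get consecutive 6-row blocks. The function name `flatten_blocks` and the `PICKS` mapping are hypothetical; `PICKS` is set to reproduce exactly the columns the counter-based code above keeps, so change it if you want a different set of fields.

```python
import csv
import io
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical mapping: position within a 6-row block -> column indices to keep.
# These picks mirror the counter-based code above (position 1 is skipped).
PICKS: Dict[int, Tuple[int, ...]] = {0: (0, 1), 2: (1,), 3: (1,), 4: (0, 1), 5: (1,)}


def flatten_blocks(rows: Iterable[Sequence[str]],
                   picks: Dict[int, Tuple[int, ...]] = PICKS,
                   block_size: int = 6) -> List[List[str]]:
    """Group non-blank rows into fixed-size blocks and flatten each block."""
    # Drop blank separator rows: [] for an empty line, ['', ''] for a bare comma.
    data_rows = (row for row in rows if any(row))
    # Grouper idiom: zip the same iterator block_size times to get consecutive blocks.
    # Note that zip() silently drops an incomplete trailing block.
    blocks = zip(*[iter(data_rows)] * block_size)
    output = []
    for block in blocks:
        line = []
        for pos, row in enumerate(block):
            for col in picks.get(pos, ()):
                line.append(row[col])
        output.append(line)
    return output


# Usage with placeholder values; one separator line is a bare comma.
sample = """a1,a2
asdf,a3
asdf,a4
asdf,a5
a6,a7
asdf,a8
,
b1,b2
asdf,b3
asdf,b4
asdf,b5
b6,b7
asdf,b8
"""
for line in flatten_blocks(csv.reader(io.StringIO(sample))):
    print(",".join(line))
```

Filtering blank rows before grouping means the block boundaries no longer depend on exactly where the separators appear.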

1 Answer:

Answer 0 (score: 1)

You can use csv.reader() as an iterable and use next() and itertools.islice() to fetch the additional rows:

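import csv
from itertools import islice

# Python 3: open CSV files in text mode with newline=''
with open('input.csv', newline='') as inf, open('output.csv', 'w', newline='') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf)
    for row in reader:
        if not any(row):
            # skip empty rows ([] or ['', '']); `while not row: continue` would loop forever
            continue

        result = row                          # <data1>,<data2>
        for extra_row in islice(reader, 3):   # <data3>, <data4>, <data5>
            result.append(extra_row[1])
        result.extend(next(reader))           # <data6>,<data7>
        result.append(next(reader)[1])        # <data8>

        writer.writerow(result)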

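This takes one row from the reader and uses all of its columns as the start of the output row. It then pulls the next 3 rows from the same reader and appends each one's second column to the output row. Finally, two next() calls read the remaining two rows of the block: the whole of the first one and the second column of the last one are added to the output.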
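This works because the for loop, islice() and next() all pull from one and the same reader iterator, so each row is consumed exactly once. The small sketch below (placeholder values, read from an in-memory io.StringIO instead of a file) makes that shared state visible.

```python
import csv
import io
from itertools import islice

# One 6-row block with placeholder values, read through a single csv.reader.
reader = csv.reader(io.StringIO("d1,d2\nasdf,d3\nasdf,d4\nasdf,d5\nd6,d7\nasdf,d8\n"))

first = next(reader)                          # whole first row
middle = [r[1] for r in islice(reader, 3)]    # second column of the next 3 rows
fifth = next(reader)                          # whole fifth row
last = next(reader)[1]                        # second column of the sixth row

print(first + middle + fifth + [last])
# ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8']

# Every row has been consumed exactly once, so the reader is now exhausted.
print(next(reader, None))  # None
```

islice(reader, 3) stops after three rows without reading a fourth, which is why the following next() call still sees the fifth row.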

Any empty rows before each 6-row block are skipped.
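What counts as an "empty row" depends on the file. A truly blank line comes out of csv.reader as [], while a separator line containing only a comma (which the asker's `row == ['','']` check suggests) comes out as ['', '']. The sketch below shows why testing `any(row)` covers both cases while `not row` catches only the first.

```python
import csv
import io

# A truly blank line and a comma-only line, between two data rows.
text = "a,b\n\n,\nc,d\n"
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['a', 'b'], [], ['', ''], ['c', 'd']]

# `if row` keeps the comma-only row; `if any(row)` drops both kinds of blank,
# because any() is False for an empty list and for a list of empty strings.
print([row for row in rows if row])       # [['a', 'b'], ['', ''], ['c', 'd']]
print([row for row in rows if any(row)])  # [['a', 'b'], ['c', 'd']]
```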

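The output row is then written, and the next iteration of the for loop can begin. By this point 6 actual rows have been read, so the loop fetches the 7th row from the input file; if that row is empty, it is skipped, and the loop keeps advancing the reader until it finds a non-empty row.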

Demo:

>>> import csv
>>> import sys
>>> from itertools import islice
>>> sample = '''\
... <data1>,<data2>
... asdf,<data3>
... asdf,<data4>
... asdf,<data5>
... <data6>,<data7>
... asdf,<data8>
... 
... <data1>,<data2>
... asdf,<data3>
... asdf,<data4>
... asdf,<data5>
... <data6>,<data7>
... asdf,<data8>
... 
... <data1>,<data2>
... asdf,<data3>
... asdf,<data4>
... asdf,<data5>
... <data6>,<data7>
... asdf,<data8>
... '''.splitlines()
>>> reader = csv.reader(sample)
>>> writer = csv.writer(sys.stdout)
>>> for row in reader:
...     if not any(row):
...         # skip empty rows
...         continue
...     result = row
...     for extra_row in islice(reader, 3):
...         result.append(extra_row[1])
...     result.extend(next(reader))
...     result.append(next(reader)[1])
...     writer.writerow(result)
... 
<data1>,<data2>,<data3>,<data4>,<data5>,<data6>,<data7>,<data8>
<data1>,<data2>,<data3>,<data4>,<data5>,<data6>,<data7>,<data8>
<data1>,<data2>,<data3>,<data4>,<data5>,<data6>,<data7>,<data8>
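One thing the loop above does not handle is a file that ends in the middle of a block: next(reader) then raises StopIteration. The sketch below packages the same row-consumption logic as a generator (the name `merge_blocks` is hypothetical) that reads each block's remaining 5 rows with islice() and raises a ValueError for an incomplete block. It uses islice() rather than bare next() because a StopIteration escaping inside a generator is turned into a RuntimeError (PEP 479). Like the answer, it assumes there are no blank rows inside a block.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def merge_blocks(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield one flattened output row per 6-row block, skipping blank rows.

    Same column layout as the answer's loop: all of row 1, column 2 of
    rows 2-4, all of row 5, column 2 of row 6.
    """
    reader = iter(rows)
    for row in reader:
        if not any(row):
            continue  # blank separator row
        # islice() returns fewer rows at end of input instead of raising.
        rest = list(islice(reader, 5))
        if len(rest) < 5:
            raise ValueError(f"incomplete block starting with {row!r}")
        result = list(row)
        result.extend(r[1] for r in rest[:3])
        result.extend(rest[3])
        result.append(rest[4][1])
        yield result


# Usage: write to an in-memory buffer instead of a file.
sample = "d1,d2\nasdf,d3\nasdf,d4\nasdf,d5\nd6,d7\nasdf,d8\n\n"
out = io.StringIO()
csv.writer(out).writerows(merge_blocks(csv.reader(io.StringIO(sample))))
print(out.getvalue())
```

csv.writer terminates rows with "\r\n" by default, which is why real output files should be opened with newline=''.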