Question

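I'm new to this site. I hope I'm asking my question correctly; if not, any suggestions are welcome.

I just need some guidance on a problem.

I have a csv file like this: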

| Column1|
----------
abc
def
ghi
12,34
32,67
jkl
mno
pqr
28,34
98,67

(a really bad file)

I want to transform it into this kind of csv:

Something1 | Something2 | Something3 | Something4 | Something5
---------------------------------------------------------------
   abc     |    def     |     ghi    |    12,34   |    32,67
   jkl     |    mno     |     pqr    |    28,34   |    98,67

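There are 15 types of data in total. They come in blocks of 15 consecutive rows, and each block repeats a variable number of times. I need to split them into 15 columns in the new csv file.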
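At its core, the transformation cuts the data lines into consecutive groups of N, one group per output row. Here is a minimal sketch in plain Python. It assumes the data lines have already been read into a list with the header removed. The function name `chunk_rows` is hypothetical and does not come from any library. The sample uses 5 columns to match the example above; the real file would use 15.

```python
def chunk_rows(values, n):
    """Split a flat list into consecutive rows of n items.

    The last row is shorter if len(values) is not a multiple of n.
    """
    return [values[i:i + n] for i in range(0, len(values), n)]


# Sample data lines from the question (header already removed)
values = ["abc", "def", "ghi", "12,34", "32,67",
          "jkl", "mno", "pqr", "28,34", "98,67"]

for row in chunk_rows(values, 5):
    print(row)
```

Slicing keeps an incomplete final block instead of silently dropping it, which makes a truncated file easy to spot.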

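My first attempt was a bash script with a function that counts the rows and splits the data into a new csv file based on that count. Then I realized it might be better to take another approach: something Pythonic (using pandas and numpy) or a PHP web service (fopen and explode the data, or something like that), since this won't be the last time I get that kind of garbage csv file...

But I need some guidance to get started.

Any help would be appreciated.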

Answer 1

How about this:

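numCol = 15

# One list per column, each starting with its header ("col0", "col1", ...)
columns = [["col" + str(i)] for i in range(numCol)]

with open("...") as f:
    next(f)  # skip the input header line
    for i, line in enumerate(f):
        # Data line i belongs to column i mod numCol
        columns[i % numCol].append(line.rstrip())

# Transpose: each tuple is one output row, the first one being the header
rows = list(zip(*columns))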
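The code above stops once the columns have been transposed into row tuples. Below is a sketch of writing those rows out with the standard `csv` module. It uses the question's sample data in place of the real file and 5 columns instead of 15, and `io.StringIO` stands in for the output file (both are assumptions for illustration). `csv.writer` quotes any field containing the delimiter, so values like `12,34` survive in a comma-delimited file:

```python
import csv
import io

numCol = 5  # 15 for the real file; 5 matches the sample in the question

lines = ["abc", "def", "ghi", "12,34", "32,67",
         "jkl", "mno", "pqr", "28,34", "98,67"]

# One list per column, each starting with its header
columns = [["col" + str(i)] for i in range(numCol)]
for i, line in enumerate(lines):
    columns[i % numCol].append(line)

out = io.StringIO()
writer = csv.writer(out)
writer.writerows(zip(*columns))  # the first row written is the header
print(out.getvalue())
```

For a real output file, open it with `newline=""` as the `csv` module documentation recommends, and pass the file object to `csv.writer` in place of `out`.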

Answer 2

Pandas is usually a good way to handle csv data. Here is an example of converting the file to a pandas DataFrame:

from collections import defaultdict
import itertools
import pandas as pd

with open("yourfile", "r") as fh:  # Your file
    f = fh.readlines()

cols = itertools.cycle(range(5))  # Use appropriate names for columns here

# Add your data to your column names in a cycle
d = defaultdict(list)
for i in f[2:]:  # skip the two header lines
    d[next(cols)].append(i.strip())  # strip the trailing newline

print(pd.DataFrame.from_dict(d))



>>>      0    1    2      3      4
0  abc  def  ghi  12,34  32,67
1  jkl  mno  pqr  28,34  98,67

Answer 3

Assuming the input and output files contain only the data shown in the question:

try:
    from itertools import izip
except ImportError:  # Python 3
    izip = zip

def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

with open('trash.csv', 'r') as infile, open('pretty.csv', 'w') as outfile:
    next(infile)  # skip input header
    outfile.write('Something1|Something2|Something3|Something4|Something5\n') # new header
    for group in grouper(5, (line.strip() for line in infile)):
        #print('|'.join(group))
        outfile.write('|'.join(group)+'\n')
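One caveat with this `grouper`: `zip` stops at the shortest input. If the number of data lines is not a multiple of 5, the incomplete last group is silently dropped. Below is a sketch of a variant built on `itertools.zip_longest` (Python 3) that pads the last group instead. The `fillvalue=""` default is an assumption, so choose whatever suits your output:

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=""):
    "Collect data into groups of n, padding the last group with fillvalue."
    return zip_longest(*[iter(iterable)] * n, fillvalue=fillvalue)


lines = ["abc", "def", "ghi", "12,34", "32,67", "jkl", "mno"]

print(list(zip(*[iter(lines)] * 5)))  # the incomplete group is dropped
print(list(grouper(5, lines)))        # the incomplete group is padded
```

Padding keeps every output row at the same width, so the damaged record still appears in the output where it can be inspected.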

Answer 4

This solution uses only the standard library:

from csv import writer

COLUMNS = 15

with open("input_file.csv", "r") as infile:
    with open("output_file.csv", "w", newline="") as f:
        output = writer(f, delimiter=";")
        output.writerow(["Col {}".format(i + 1) for i in range(COLUMNS)])
        next(infile)  # skip the input header line
        buffer = []
        for row in infile:
            buffer.append(row.rstrip("\n"))  # drop the trailing newline
            if len(buffer) == COLUMNS:
                output.writerow(buffer)
                del buffer[:]
        # You may want to check whether anything is left in buffer at the end:
        # with 23 data rows, for example, buffer still holds 8 elements here,
        # and you can append them with: output.writerow(buffer)
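The same buffering idea can be wrapped in a generator that also flushes the leftover lines, so the caller never has to remember the final check. This is a sketch only. The name `batches` is hypothetical, and `io.StringIO` stands in for the real input and output files:

```python
import csv
import io


def batches(lines, n):
    """Yield lists of n lines (newline stripped); a final shorter batch is yielded too."""
    buffer = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == n:
            yield buffer
            buffer = []  # rebind rather than clear: the caller may still hold the old list
    if buffer:  # leftover lines, e.g. 8 of them with 23 rows and n = 15
        yield buffer


infile = io.StringIO("abc\ndef\nghi\n12,34\n32,67\njkl\nmno\n")
out = io.StringIO()
output = csv.writer(out, delimiter=";")
output.writerows(batches(infile, 5))
print(out.getvalue())
```

Because it is a generator, the input is still streamed line by line, as in the loop above. With `;` as the delimiter, values such as `12,34` are written without quotes.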

