Split a CSV file into multiple files (of unequal size), keeping the header

Time: 2019-06-18 02:17:22

Tags: python csv

I have a large CSV file (test.csv) with the header columns id; type; name

and the following values:

1; A; ASW23
2; C; SDF92
3; D; SDI22
4; D; ASD00
5; C; WPE03
6; D; PPO30
7; A; WER34
8; C; FHH88
9; C; FGE45
10; A; DFQ12
11; G; WWQ89
12; C; YDT63
13; D; QTT21

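The file is not sorted. I want to start a new CSV file every time a row of type A is found, and each output file should keep the same header. For example: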

test_1.csv

id; type; name
1; A; ASW23
2; C; SDF92
3; D; SDI22
4; D; ASD00
5; C; WPE03
6; D; PPO30

test_2.csv

id; type; name
7; A; WER34
8; C; FHH88
9; C; FGE45

test_3.csv

id; type; name
10; A; DFQ12
11; G; WWQ89
12; C; YDT63
13; D; QTT21

I am writing a Python script for this, but it fails.

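2 answers:

Answer 0 (score: 2)

You can use itertools.groupby to split the rows into alternating runs (type A rows, then the rows that follow them) and join each pair of runs: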

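import csv
import itertools

# Read all rows, skipping the header line.
with open('test.csv', newline='') as f:
    data = list(csv.reader(f, delimiter=';'))[1:]
# Alternating runs: type A rows, then the rows after them (fields keep a leading space, so strip).
new_d = [[a, list(b)] for a, b in itertools.groupby(data, key=lambda x: x[1].strip() == 'A')]
# Join each A run with the run that follows; assumes the data starts with a type A row.
new_groups = [sum((g[-1] for g in new_d[i:i + 2]), []) for i in range(0, len(new_d), 2)]
for i, a in enumerate(new_groups, 1):
    with open('test_{}.csv'.format(i), 'w', newline='') as f:
        write = csv.writer(f, delimiter=';')
        write.writerows([['id', 'type', 'name']] + a)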

test_1.csv

id;type;name
1; A; ASW23
2; C; SDF92
3; D; SDI22
4; D; ASD00
5; C; WPE03
6; D; PPO30

test_2.csv

id;type;name
7; A; WER34
8; C; FHH88
9; C; FGE45

test_3.csv

id;type;name
10; A; DFQ12
11; G; WWQ89
12; C; YDT63
13; D; QTT21
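
Since the question describes the file as large, a streaming variant that never holds all rows in memory may be worth considering. The following is only a sketch and not part of the answer above: `split_on_type`, `out_pattern`, and `marker` are hypothetical names, and it assumes the type is the second column and that rows before the first type A row (if any) should go into the first output file.

```python
import csv


def split_on_type(src_path, out_pattern, marker="A", delimiter=";"):
    """Start a new output file each time a row's type column equals `marker`.

    Returns the list of file names written, in order.
    """
    written = []
    out = None
    writer = None
    with open(src_path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader, None)  # header row, copied unchanged to every output
        if header is None:
            return written  # empty input: nothing to write
        try:
            for row in reader:
                if not row:
                    continue  # skip blank lines
                # Fields keep the blank after ';', so strip before comparing.
                starts_new = len(row) > 1 and row[1].strip() == marker
                if starts_new or out is None:
                    if out is not None:
                        out.close()
                    name = out_pattern.format(len(written) + 1)
                    out = open(name, "w", newline="")
                    writer = csv.writer(out, delimiter=delimiter)
                    writer.writerow(header)
                    written.append(name)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return written
```

For the sample data, `split_on_type('test.csv', 'test_{}.csv')` should write test_1.csv to test_3.csv, each starting with the original `id; type; name` header line.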

Answer 1 (score: 1)

An approach using pandas.

>>> import pandas as pd
>>> df = pd.read_csv('test.csv', sep=';')
>>> df.columns = [col.strip() for col in df.columns]
>>> # Running count of type A rows: each row gets the number of the block it belongs to.
>>> df['cutter'] = (df['type'].str.strip() == 'A').cumsum()
>>> df
    id type    name  cutter
0    1    A   ASW23       1
1    2    C   SDF92       1
2    3    D   SDI22       1
3    4    D   ASD00       1
4    5    C   WPE03       1
5    6    D   PPO30       1
6    7    A   WER34       2
7    8    C   FHH88       2
8    9    C   FGE45       2
9   10    A   DFQ12       3
10  11    G   WWQ89       3
11  12    C   YDT63       3
12  13    D   QTT21       3

>>> gb = df.groupby('cutter')
>>> for i, x in enumerate(gb.groups, 1):  # count from 1 so the files are test_1.csv, test_2.csv, ...
...     gb.get_group(x).to_csv(f'test_{i}.csv', index=False)
... 

Result

test_1.csv

   id type    name  cutter
0   1    A   ASW23       1
1   2    C   SDF92       1
2   3    D   SDI22       1
3   4    D   ASD00       1
4   5    C   WPE03       1
5   6    D   PPO30       1

test_2.csv

   id type    name  cutter
0   7    A   WER34       2
1   8    C   FHH88       2
2   9    C   FGE45       2

test_3.csv

   id type    name  cutter
0  10    A   DFQ12       3
1  11    G   WWQ89       3
2  12    C   YDT63       3
3  13    D   QTT21       3
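
The files written by the session above keep the helper `cutter` column and use pandas' default comma separator. If the output should instead match the semicolon-delimited layout asked for in the question, one possible variation is sketched below. `split_with_pandas`, `out_pattern`, and `marker` are hypothetical names, and the sketch assumes the whole file fits in memory and that the only stray whitespace is the blank after each `;`.

```python
import pandas as pd


def split_with_pandas(src_path, out_pattern, marker="A", sep=";"):
    """Write one `sep`-delimited file per run of rows that starts at a `marker` row."""
    # skipinitialspace drops the blank that follows each ';' in the sample data,
    # for the header names as well as the values.
    df = pd.read_csv(src_path, sep=sep, skipinitialspace=True)
    # Running count of marker rows: every row gets the number of its block.
    # The counter is kept as a separate Series, so no helper column reaches the output.
    block = (df["type"] == marker).cumsum().rename("block")
    written = []
    for i, (_, part) in enumerate(df.groupby(block), 1):
        name = out_pattern.format(i)
        part.to_csv(name, sep=sep, index=False)
        written.append(name)
    return written
```

Because the leading blanks are removed on read, the output lines look like `1;A;ASW23` rather than `1; A; ASW23`.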