I have a large CSV file (test.csv) with the header columns id; type; name
and the following values:
1; A; ASW23
2; C; SDF92
3; D; SDI22
4; D; ASD00
5; C; WPE03
6; D; PPO30
7; A; WER34
8; C; FHH88
9; C; FGE45
10; A; DFQ12
11; G; WWQ89
12; C; YDT63
13; D; QTT21
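The file is not sorted. I want to start a new CSV file every time a row of type A is found, keeping the same header in each file. For example: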
test_1.csv
id; type; name
1; A; ASW23
2; C; SDF92
3; D; SDI22
4; D; ASD00
5; C; WPE03
6; D; PPO30
test_2.csv
id; type; name
7; A; WER34
8; C; FHH88
9; C; FGE45
test_3.csv
id; type; name
10; A; DFQ12
11; G; WWQ89
12; C; YDT63
13; D; QTT21
I'm writing a Python script to do this, but it doesn't work.
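To pin down the splitting rule described above, here is a minimal pure-Python sketch that works on rows already parsed into lists (for example by `csv.reader` with `delimiter=';'`). The function name `split_on_type` and its `marker` parameter are hypothetical, not taken from the question. Each A row starts a new group; any rows that come before the first A row would form their own leading group.

```python
from typing import Iterable, Iterator, List


def split_on_type(rows: Iterable[List[str]], marker: str = 'A') -> Iterator[List[List[str]]]:
    """Yield runs of rows, starting a new run at every row whose type is `marker`.

    Rows are [id, type, name] lists; the type is compared after stripping
    whitespace, because the sample file uses '; ' as its separator.
    """
    current: List[List[str]] = []
    for row in rows:
        if row[1].strip() == marker and current:
            yield current  # an A row closes the previous run
            current = []
        current.append(row)
    if current:
        yield current  # the last run has no A row after it


sample = [
    ['1', ' A', ' ASW23'], ['2', ' C', ' SDF92'], ['3', ' D', ' SDI22'],
    ['4', ' D', ' ASD00'], ['5', ' C', ' WPE03'], ['6', ' D', ' PPO30'],
    ['7', ' A', ' WER34'], ['8', ' C', ' FHH88'], ['9', ' C', ' FGE45'],
    ['10', ' A', ' DFQ12'], ['11', ' G', ' WWQ89'], ['12', ' C', ' YDT63'],
    ['13', ' D', ' QTT21'],
]
print([[row[0] for row in run] for run in split_on_type(sample)])
# → [['1', '2', '3', '4', '5', '6'], ['7', '8', '9'], ['10', '11', '12', '13']]
```

Because it is a generator, each group can be written out as soon as it is complete, which matters for a large input.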
Answer 0 (score: 2)
You can use itertools.groupby:
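import csv
import itertools

with open('test.csv', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    next(reader)  # skip the original header row
    data = list(reader)

# Running count of type-A rows seen so far: every A row bumps the count,
# so each A row starts a new group, even when two A rows are adjacent.
group_ids = itertools.accumulate(row[1].strip() == 'A' for row in data)
groups = itertools.groupby(zip(group_ids, data), key=lambda pair: pair[0])

for i, (_, pairs) in enumerate(groups, 1):
    with open('test_{}.csv'.format(i), 'w', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        writer.writerows([['id', 'type', 'name']] + [row for _, row in pairs])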
test_1.csv:
id;type;name
1; A; ASW23
2; C; SDF92
3; D; SDI22
4; D; ASD00
5; C; WPE03
6; D; PPO30
test_2.csv:
id;type;name
7; A; WER34
8; C; FHH88
9; C; FGE45
test_3.csv:
id;type;name
10; A; DFQ12
11; G; WWQ89
12; C; YDT63
13; D; QTT21
Answer 1 (score: 1)
An approach using pandas:
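>>> import pandas as pd
>>> df = pd.read_csv('test.csv', sep=';')
>>> df.columns = [col.strip() for col in df.columns]  # ' type' -> 'type', etc.
>>> # Cumulative count of A rows: each A row starts a new group number
>>> df['cutter'] = (df['type'].str.strip() == 'A').cumsum()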
>>> df
    id type   name  cutter
0    1    A  ASW23       1
1    2    C  SDF92       1
2    3    D  SDI22       1
3    4    D  ASD00       1
4    5    C  WPE03       1
5    6    D  PPO30       1
6    7    A  WER34       2
7    8    C  FHH88       2
8    9    C  FGE45       2
9   10    A  DFQ12       3
10  11    G  WWQ89       3
11  12    C  YDT63       3
12  13    D  QTT21       3
>>> gb = df.groupby('cutter')
>>> for cutter, group in gb:
...     # Name each file after its group number and drop the helper column
...     group.drop(columns='cutter').to_csv(f'test_{cutter}.csv', sep=';', index=False)
...
Result:
test_1.csv:
id;type;name
1; A; ASW23
2; C; SDF92
3; D; SDI22
4; D; ASD00
5; C; WPE03
6; D; PPO30
test_2.csv:
id;type;name
7; A; WER34
8; C; FHH88
9; C; FGE45
test_3.csv:
id;type;name
10; A; DFQ12
11; G; WWQ89
12; C; YDT63
13; D; QTT21