Extract the third column of a CSV file

Time: 2019-06-28 19:33:18

Tags: python arrays csv

00,1,000011110000111111110000
00,2,000011110000111111110000
00,3,000010000000111000000000
00,4,111110000111111000000111
00,5,111110000111111000000111
00,6,111110000111111000000111
00,7,111001111111111000000111
00,8,000001110000000000000111
00,9,000011110000000011111111
00,10,000011110000000011111111
00,11,000011110000000011111111
00,12,111111110000000011110000
00,13,111111110000000011110000
00,14,111111110000000011110000
00,15,111000000000000010000000
00,16,111000000111111110000111
00,17,111000000111111110000111
00,18,111000000111111110000111
00,19,111000000111111001111111
00,20,000000000111000001110000
00,21,000011111111000011110000
00,22,000011111111000011110000
00,23,000011111111000011110000
01,0,01111111000000
01,1,01111111000000
01,2,01111111000000
01,3,01110000000000
01,4,01110000000000
01,5,01110000000000
01,6,11110000000111
01,7,11110000000111
01,8,11110000000111
01,9,00000000000111
01,10,00000111111111
01,11,00000111111111
01,12,00000111111111
01,13,00000111000000
02,0,0000111100
02,1,0000111000
02,2,1111111001
02,3,0000000111
02,4,0000000111
02,5,0000000111
02,6,0010000100
02,7,0001000100
02,8,0000000100
02,9,0000000100
03,0,0111111111110000
03,1,0111111111110000
03,2,0111111111110000
03,3,0111111111110000
03,4,1110000000101111
03,5,1110000000011111
03,6,1110000000001111
03,7,1110000000001111
03,8,0001111000001111
03,9,0000000100001111
03,10,0000000100001111
03,11,0000000100001111
03,12,0000000100001111
03,13,0000000011111111
03,14,0000000011111111
03,15,0000000011110000
04,0,011111111110000
04,1,011111111110000
04,2,011111111110000
04,3,011111111110000
04,4,111000000001111
04,5,111000000001111
04,6,111000000001111
04,7,000111100001111
04,8,000000010001111
04,9,000000010001111
04,10,000000010001111
04,11,000000010001111
04,12,000000001111111
04,13,000000001111111
04,14,000000001110000
05,0,000001111111110000
05,1,000001111111110000
05,2,000001111111110000
05,3,000001111000000000
05,4,111111111000000111
05,5,111111111000000111
05,6,111111111000000111
05,7,111111111000000111
05,8,000000000000000111
05,9,000000000011111111
05,10,000000000011111111
05,11,000000000011111111
05,12,000000000011111111
05,13,111100000011110000
05,14,000010000011110000
05,15,000010000011110000
05,16,000010000011110000
05,17,000010000011110000
06,0,01111111111111000000
06,1,01111111111111000000
06,2,01111111111111000000
06,3,01111111111111000000
06,4,01111000000000000000
06,5,01111000000000000000
06,6,11111000000000001111
06,7,11111000000000001111
06,8,11111000000000001111
06,9,11111000000000001111
06,10,00000111100000001111
06,11,00000000010000001111
06,12,00000000010000001111
06,13,00000000010000001111
06,14,00000000010000001111
06,15,00000000001111111111
06,16,00000000001111111111
06,17,00000000001111111111
06,18,00000000001111111111
06,19,00000000001111000000
07,0,000001110000000
07,1,111111110000000
07,2,111111110000000
07,3,110000001110000
07,4,010000000001000
07,5,110000000001000
07,6,110000000001000
07,7,110000000000111
07,8,110000000000111
07,9,110000000000111
07,10,000000000000110
07,11,000000000000110
07,12,000000000000110
07,13,000001111111110
07,14,000001111101110
08,0,000111000111111000
08,1,000111000111111000
08,2,000100000110000000
08,3,111100011110000011
08,4,111100011110000011
08,5,110011111110000011
08,6,000011000000000011
08,7,000111000000111111
08,8,000111000000111111
08,9,111111000000111000
08,10,111111000000111000
08,11,110000000000100000
08,12,110000011111100011
08,13,110000011111100011
08,14,110000011110011111
08,15,000000011000011000
08,16,000111111000111000
08,17,000111111000111000
09,0,00011100000
09,1,00011100000
09,2,00011100000
09,3,11111100000
09,4,11100000000
09,5,11100001111
09,6,11100001111
09,7,11100001111
09,8,00011111000
09,9,00011111000
09,10,00011111000
10,0,1000111000000
10,1,1000111000000
10,2,1000111000000
10,3,0111111000000
10,4,0111111000000
10,5,0111000000000
10,6,1111000011111
10,7,1111000011111
10,8,1111000011111
10,9,0000111111000
10,10,0000111111000
10,11,0000111111000
10,12,0000111000111
11,0,00001111000001111111111000
11,1,00001111000001111111111000
11,2,00001100000001111000000000
11,3,11111100011111111000000011
11,4,11111100011111111000000011
11,5,11111100011111111000000011
11,6,11111100011111111000000011
11,7,11100011111111111000000011
11,8,00000011000000000000000011
11,9,00001111000000010011111111
11,10,00001111000000001011111111
11,11,00001111000000000011111111
11,12,00001111000000000011111111
11,13,11111111000000000011111000
11,14,11111111000000000011111000
11,15,11111111000000000011111000
11,16,11111111000000000011111000
11,17,11100000000000000010000000
11,18,11100000011111111110000011
11,19,11100000011111111110000011
11,20,11100000011111111001111111
11,21,11100000011111111001111111
11,22,00000000011110000001111000
11,23,00001111111110000011111000
11,24,00001111111110000011111000
11,25,00001111111110000011111000
12,0,000010000000
12,1,111110000000
12,2,111110000000
12,3,111110000000
12,4,111000000000
12,5,111000000000
12,6,111000000000
12,7,111000001111
12,8,000000001110
12,9,000011111110
12,10,000011111110
12,11,000011111110
13,0,0111111110000111111000000
13,1,0111111110000111111000000
13,2,0111111110000111111000000
13,3,0111000001111111111000000
13,4,0111000001111111111000000
13,5,0100000001110000000000000
13,6,1100001111110000000011111
13,7,1100001111110000000011111
13,8,1100001111110000000011111
13,9,0011111111110000000011111
13,10,0011111111110000000011111
13,11,0011111111110000000011111
13,12,0011000000000000000011000
13,13,0111000000000111111111000
13,14,0111000000000111111111000
13,15,0111000000000111111111000
13,16,1111000000000111111000111
13,17,1111000000000111111000111
13,18,1111000000000111111000111
13,19,0000000000000111000000111
13,20,0000000000000111000000111
13,21,0000001111111111000011111
13,22,0000001111111111000011111
13,23,0000001111110000111111111
13,24,0000001110000000111000000
14,0,01111110001111000
14,1,01111110001111000
14,2,01111001111111000
14,3,11100011100000111
14,4,11100011100000111
14,5,00011111101000111
14,6,00011111100100111
14,7,00011000000000100
14,8,01111000001111100
14,9,01111000001111100
14,10,11111000001111011
14,11,00000000001100011
14,12,00000011111100111
14,13,00000011111100111
14,14,00000011100011111
14,15,00000011100011111
14,16,00000010000011000
15,0,011111111100000111110000
15,1,011111111100000111110000
15,2,011111111100000111110000
15,3,011111111100000111110000
15,4,110000011111111000001111
15,5,110000011111111000001111
15,6,110000011111111000001111
15,7,001111111111111000001111
15,8,001111111111111000001111
15,9,010000000000000111110000
15,10,010000000001100111110000
15,11,010000000001010111110000
15,12,010000000000110111110000
15,13,010000000000000111110000
15,14,110000000000000111111111
15,15,110000000000000111111111
15,16,110000000000000111111111
15,17,001111000000000110001111
15,18,000000100000000110001111
15,19,000000100000000110001111
15,20,000000100000000110001111
15,21,000000100000000110001111
15,22,000000011111111001111111
15,23,000000011100000001110000
16,0,11110001110000
16,1,11110001110000
16,2,11101111110000
16,3,11101111110000
16,4,10011110001111
16,5,10011110001111
16,6,01111110001111
16,7,11100001111100
16,8,11100001111100
16,9,11100001111100
16,10,11100001110011
16,11,00011111001111
16,12,00011111001111
16,13,00011110111111
17,0,11111001111000
17,1,11111001111000
17,2,11110111111000
17,3,10001110000111
17,4,10001110000111
17,5,10001110000111
17,6,01111110000111
17,7,11110001111100
17,8,11110001111100
17,9,11110001111011
17,10,00001111000111
17,11,00001111000111
17,12,00001111000111
17,13,00001110111111
18,0,0000111111000
18,1,0000111111000
18,2,0000111111000
18,3,1111111000111
18,4,1111111000111
18,5,1111111000111
18,6,0000000111111
18,7,0000000111111
18,8,0000000111111
18,9,0010000111000
18,10,0001000111000
18,11,0000000111000
18,12,0000000111000
19,0,00011000
19,1,00011000
19,2,11011000
19,3,11000111
19,4,11000111
19,5,00000000
19,6,00011100
19,7,00011100
20,0,000011111110000
20,1,000011111110000
20,2,000011111110000
20,3,000011100000000
20,4,111111100000111
20,5,111111100000111
20,6,111111100000111
20,7,000000000000111
20,8,000000001111111
20,9,000000001111111
20,10,000000001111111

This is my entire mon.csv file. I want to group the rows by column 0. That is, for 00, the values in column 2 should be collected into one array:

[000011110000111111110000,000011110000111111110000,000010000000111000000000,111110000111111000000111,111110000111111000000111,111110000111111000000111,111001111111111000000111,000001110000000000000111,000011110000000011111111,000011110000000011111111,000011110000000011111111,111111110000000011110000,111111110000000011110000,111111110000000011110000,111000000000000010000000,111000000111111110000111,111000000111111110000111,111000000111111110000111,111000000111111001111111,000000000111000001110000,000011111111000011110000,000011111111000011110000,000011111111000011110000]

For 01, the values in column 2 should go into another array; for 02, into yet another array, and so on. Here is what I have tried:

def main():
    import csv
    from itertools import groupby

    with open("mon.txt") as file:
        reader = csv.reader(file)
        rows = [[row[0]] + [int(item) for item in row[1:]] for row in reader]

    groups = {}

    for key, group in groupby(rows, lambda row: row[0]):
        groups[key] = [row[2] for row in group]
    print(groups)
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())

This code produces the wrong output: the values from column 2 come out mixed up.

2 answers:

Answer 0: (score: 1)

Use csv.reader (to parse the csv file with the default dialect) and collections.defaultdict (a dict-like object for grouping/accumulating values):

from collections import defaultdict
import csv
import pprint

with open('mon.txt') as f:
    groups = defaultdict(list)
    reader = csv.reader(f)
    for line in reader:
        groups[line[0]].append(line[2])

    pprint.pprint(dict(groups))

Output:

{'00': ['000011111111000011110000',
        '000011111111000011110000',
        '000011111111000011110000'],
 '01': ['01111111000000', '01111111000000', '01111111000000'],
 '02': ['0000111100', '0000111000', '1111111001', '0000000111', '0000000111'],
 '03': ['0111111111110000',
        '0111111111110000',
        '0111111111110000',
        '0111111111110000',
        '1110000000101111'],
 '04': ['011111111110000',
        '011111111110000',
        '011111111110000',
        '011111111110000']}
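
The same grouping can be tried without the file on disk. The following is a minimal sketch, assuming a handful of rows copied from the question and held in an in-memory string; the names SAMPLE and group_third_column are hypothetical and not part of the original answer. It also shows why the third field is kept as a string: converting it with int(), as the question's code does, drops the leading zeros.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the question's data, held in memory instead of mon.txt
# (SAMPLE is an illustrative stand-in, not the full file).
SAMPLE = """\
00,1,000011110000111111110000
00,2,000011110000111111110000
01,0,01111111000000
02,0,0000111100
"""


def group_third_column(lines):
    """Map each value of column 0 to the list of column-2 strings, in file order."""
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 3:  # skip blank or short lines
            continue
        groups[row[0]].append(row[2])  # keep the field as a string
    return dict(groups)


groups = group_third_column(io.StringIO(SAMPLE))
print(groups["02"])          # ['0000111100']
print(int(groups["02"][0]))  # 111100: int() drops the leading zeros
```

To run it against the real file, an open file object can be passed in place of the io.StringIO wrapper, for example `group_third_column(open("mon.txt", newline=""))`.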

Answer 1: (score: 0)

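You can use pandas. I added a header row so that the csv can be read into a pandas DataFrame. We group by column "A" and use apply to collect the grouped values of "C" into a list. Finally, to_dict() converts the grouped result into a dictionary.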

In [53]: import pandas as pd

In [54]: df = pd.read_csv('test.csv')

In [55]: df.head()
Out[55]:
   A   B                     C
0  0  21  11111111000011110000
1  0  22  11111111000011110000
2  0  23  11111111000011110000
3  1   0         1111111000000
4  1   1         1111111000000

In [56]: df_raw = df.groupby('A')['C'].apply(list)

In [57]: df_raw.to_dict()
Out[57]:
{0: [11111111000011110000, 11111111000011110000, 11111111000011110000],
 1: [1111111000000, 1111111000000, 1111111000000],
 2: [111100, 111000, 1111111001, 111, 111],
 3: [111111111110000,
  111111111110000,
  111111111110000,
  111111111110000,
  1110000000101111],
 4: [11111111110000, 11111111110000, 11111111110000, 11111111110000]}
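
In the session above pandas parses every column as an integer, so the leading zeros of both the keys and the bit strings are lost (for example, 0000111100 becomes 111100). A possible variation, sketched here on a few in-memory sample rows rather than on test.csv, is to read every column as text with dtype=str and to supply column names through names= instead of editing a header into the file. The column names "A", "B", "C" follow the answer; the SAMPLE string is an assumption for illustration.

```python
import io

import pandas as pd

# A few rows copied from the question's data (illustrative stand-in for the file).
SAMPLE = """\
00,21,000011111111000011110000
00,22,000011111111000011110000
01,0,01111111000000
02,0,0000111100
"""

# header=None with names=... avoids adding a header row to the file;
# dtype=str stops pandas from parsing '00' and '0000111100' as integers.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=["A", "B", "C"], dtype=str)

# Collect the column-C strings of each group into a list, then convert to a dict.
grouped = df.groupby("A")["C"].apply(list).to_dict()
print(grouped)
```

With the values kept as strings, the dictionary keys stay '00', '01', '02' and each bit string retains its full width.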