Handling string input from a CSV in Python

Posted: 2012-02-28 04:03:36

Tags: python

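I'm trying to take a CSV input file consisting of a single column of strings and turn it into an array, so that I can run some operations on each string separately. However, once I import the CSV, the resulting array isn't structured the way I expected. Here is a code snippet that illustrates my problem: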

import csv

regular_array = ["this", "is", "a", "test"]

# Import a csv with the same elements as regular_array in separate rows
# Put it in an array
csv_doc = csv.reader(open('tester.csv', 'rb'), delimiter=",", quotechar='|')

csv_array = []

for row in csv_doc:
    csv_array.append(row)


# Let's see if we can pass an element from these arrays into a function
print len(regular_array[0]), "\n", len(csv_array[0])

# Well that doesn't do what I thought it would
# Let's print out the arrays and see why.

print regular_array[0], "\n", csv_array[0]

# AHA! My arrays have different structures.

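As you'd expect, the two operations give different results because the arrays are structured differently. The first array is made up of strings, so len(regular_array[0]) is 4 (the length of "this"). The second array is made up of one-element lists, one per row, so len(csv_array[0]) is 1 (the length of ['this']).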
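The difference can be reproduced without a file. A minimal Python 3 sketch, assuming an in-memory string wrapped in io.StringIO as a hypothetical stand-in for tester.csv:

```python
import csv
import io

# Hypothetical in-memory stand-in for tester.csv: one string per row.
csv_text = "this\nis\na\ntest\n"

regular_array = ["this", "is", "a", "test"]

# csv.reader yields one list of fields per row, even when there is only one column.
csv_array = []
for row in csv.reader(io.StringIO(csv_text)):
    csv_array.append(row)

print(regular_array[0], len(regular_array[0]))  # this 4
print(csv_array[0], len(csv_array[0]))          # ['this'] 1
```

The element types differ (str versus list), which is why len() gives 4 in one case and 1 in the other.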

For my purposes, I need my array to be of the first kind.

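My question has two parts: 1) Can someone point me to some resources that would help me understand what I'm running into? (I'm not yet comfortable with the differences between list, array, and tuple structures.)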

2) Is there a method I can use to convert my CSV input into the first kind of array, or is there a better way to store the data after importing it?

Thanks in advance.

3 answers:

Answer 0 (score: 2)

This code produces a list of strings:

regular_array = ["this", "is", "a", "test"]

Each row of the CSV file is also a list of strings. When you iterate over the rows and append them to csv_array, which is itself a list, you end up with a list of lists of strings, like this:

csv_array = [['some', 'stuff'], ['some other', 'stuff']]

If you want a flat list like regular_array, use extend instead of append:

>>> list_of_lists = [['some', 'stuff'], ['some other', 'stuff']]
>>> csv_array = []
>>> for l in list_of_lists:
...     csv_array.append(l)
... 
>>> csv_array
[['some', 'stuff'], ['some other', 'stuff']]
>>> csv_array = []
>>> for l in list_of_lists:
...     csv_array.extend(l)
... 
>>> csv_array
['some', 'stuff', 'some other', 'stuff']
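Applied directly to a csv.reader, the same extend approach looks roughly like this in Python 3. This is a sketch that assumes a hypothetical two-column in-memory file matching list_of_lists; the itertools.chain.from_iterable variant is a standard-library alternative, not something the answer uses:

```python
import csv
import io
from itertools import chain

# Hypothetical two-column CSV matching the list_of_lists example.
csv_text = "some,stuff\nsome other,stuff\n"

# extend() adds each field of the row individually, giving a flat list of strings.
flat_extend = []
for row in csv.reader(io.StringIO(csv_text)):
    flat_extend.extend(row)

# Alternative: chain.from_iterable flattens exactly one level of nesting.
flat_chain = list(chain.from_iterable(csv.reader(io.StringIO(csv_text))))

print(flat_extend)  # ['some', 'stuff', 'some other', 'stuff']
```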

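You can also use +=. Of the approaches here, += looks fastest, at least on my machine, with extend practically tied; the append approach is noticeably slower (roughly a quarter to a third slower in the results below). Here are some timings. First, the definitions: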

>>> import csv
>>> def gen_csv_file(size):
...     with open('test.csv', 'wb') as csv_f:
...         csv_w = csv.writer(csv_f)
...         csv_w.writerows([['item {0} row {1}'.format(i, j) 
...                           for i in range(size)] 
...                           for j in range(size)])
... 
>>> def read_append(csv_file):
...     csv_list = []
...     for row in csv_file:
...         for item in row:
...             csv_list.append(item)
...     return csv_list
... 
>>> def read_concat(csv_file):
...     csv_list = []
...     for row in csv_file:
...         csv_list += row
...     return csv_list
... 
>>> def read_extend(csv_file):
...     csv_list = []
...     for row in csv_file:
...         csv_list.extend(row)
...     return csv_list
... 
>>> def read_csv(read_func):
...     with open('test.csv', 'rb') as csv_f:
...         csv_r = csv.reader(csv_f)
...         return read_func(csv_r)
... 
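The answer shows the definitions and the results but not the code that ran the timings. The sketch below is one way to reproduce the comparison in Python 3, assuming a best-of-3 measurement with the standard timeit module; the best_time helper, the temporary file path, and the sizes are hypothetical, and the numbers you get will differ from the ones reported. Note that Python 3's csv module expects files opened in text mode with newline="" rather than 'rb'/'wb'.

```python
import csv
import os
import tempfile
import timeit

def gen_csv_file(path, size):
    # Write a size x size grid of cells, mirroring the answer's gen_csv_file.
    with open(path, "w", newline="") as csv_f:
        csv.writer(csv_f).writerows(
            [["item {0} row {1}".format(i, j) for i in range(size)]
             for j in range(size)]
        )

def read_append(rows):
    out = []
    for row in rows:
        for item in row:
            out.append(item)
    return out

def read_concat(rows):
    out = []
    for row in rows:
        out += row
    return out

def read_extend(rows):
    out = []
    for row in rows:
        out.extend(row)
    return out

def read_csv(path, read_func):
    # Text mode with newline="" is what the Python 3 csv docs recommend.
    with open(path, newline="") as csv_f:
        return read_func(csv.reader(csv_f))

def best_time(path, read_func, number=10, repeat=3):
    # Best of `repeat` runs, each averaged over `number` calls.
    times = timeit.repeat(lambda: read_csv(path, read_func),
                          number=number, repeat=repeat)
    return min(times) / number

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "test.csv")
    for size in (10, 100):
        gen_csv_file(path, size)
        for func in (read_append, read_concat, read_extend):
            print(func.__name__, size, best_time(path, func))
```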

Results:

read function: read_append, file size: 10x10
10000 loops, best of 3: 59.4 us per loop
read function: read_concat, file size: 10x10
10000 loops, best of 3: 47.8 us per loop
read function: read_extend, file size: 10x10
10000 loops, best of 3: 48 us per loop
read function: read_append, file size: 31x31
1000 loops, best of 3: 394 us per loop
read function: read_concat, file size: 31x31
1000 loops, best of 3: 290 us per loop
read function: read_extend, file size: 31x31
1000 loops, best of 3: 291 us per loop
read function: read_append, file size: 100x100
100 loops, best of 3: 3.69 ms per loop
read function: read_concat, file size: 100x100
100 loops, best of 3: 2.67 ms per loop
read function: read_extend, file size: 100x100
100 loops, best of 3: 2.67 ms per loop
read function: read_append, file size: 316x316
10 loops, best of 3: 40.1 ms per loop
read function: read_concat, file size: 316x316
10 loops, best of 3: 29.9 ms per loop
read function: read_extend, file size: 316x316
10 loops, best of 3: 30 ms per loop
read function: read_append, file size: 1000x1000
1 loops, best of 3: 425 ms per loop
read function: read_concat, file size: 1000x1000
1 loops, best of 3: 325 ms per loop
read function: read_extend, file size: 1000x1000
1 loops, best of 3: 323 ms per loop

So append is consistently slower, and extend is almost exactly as fast as +=.

Answer 1 (score: 1)

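csv.reader() returns each row as a list, so when you call csv_array.append(row) on the first row of the CSV file, you add the list ['this'] as the first element of csv_array. The first element of regular_array is a string, whereas the first element of csv_array is a list.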
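For a file that genuinely has one column, as in the question, another option (not part of this answer) is to keep only the first field of each row. A minimal Python 3 sketch, assuming an in-memory string as a hypothetical stand-in for tester.csv:

```python
import csv
import io

# Hypothetical stand-in for the question's single-column tester.csv.
csv_text = "this\nis\na\ntest\n"

# Each row is a one-element list such as ['this'], so row[0] is the string itself.
# The `if row` guard skips blank lines, which csv.reader returns as [].
csv_array = [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]

print(csv_array)  # ['this', 'is', 'a', 'test']
```

This silently drops any extra columns, so it only fits when the file is known to have a single column.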

To add each "cell" of every row in the CSV file to csv_array individually, you can do this:

for row in csv_doc:
    for cell in row:
        csv_array.append(cell)
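The nested loop can also be written as a single list comprehension, in which the loops appear in the same order as in the nested version. A Python 3 sketch, assuming a hypothetical two-column in-memory input to show that every cell is kept:

```python
import csv
import io

# Hypothetical multi-column input, to show that every cell ends up in the list.
csv_text = "this,is\na,test\n"

# Equivalent to the nested for loops: outer loop over rows, inner loop over cells.
csv_array = [cell for row in csv.reader(io.StringIO(csv_text)) for cell in row]

print(csv_array)  # ['this', 'is', 'a', 'test']
```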

Answer 2 (score: 1)

Change the code in your for loop to:

for row in csv_doc:
    csv_array += row

See "Python append() vs. + operator on lists, why do these give different results?" for the difference between the + operator and append.
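A short sketch of that difference, using a one-element row like the ones csv.reader returns for the question's file (the variable names are hypothetical). On lists, += behaves like extend and accepts any iterable, while + builds a new list and requires both operands to be lists:

```python
row = ["this"]

appended = []
appended.append(row)   # adds the list itself as a single element
concat = []
concat += row          # adds each element of row, modifying concat in place
plus = [] + row        # builds a new list from two lists

print(appended)  # [['this']]
print(concat)    # ['this']
print(plus)      # ['this']

# += accepts any iterable (like extend); + with a tuple on the right raises TypeError.
concat += ("is",)
try:
    plus = plus + ("is",)
except TypeError as exc:
    print("TypeError:", exc)
```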