Question

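I'm trying to take a CSV input file consisting of a single column of strings and turn it into an array, so that I can run some operations on each string separately. However, once I import the CSV, the resulting array isn't structured the way I expected. Here is a snippet that illustrates my problem: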

import csv

regular_array = ["this", "is", "a", "test"]

# Import a csv with the same elements as regular_array in separate rows
# Put it in an array
csv_doc = csv.reader(open('tester.csv', 'rb'), delimiter=",", quotechar='|')

csv_array = []

for row in csv_doc:
    csv_array.append(row)


# Let's see if we can pass an element from these arrays into a function
print len(regular_array[0]), "\n", len(csv_array[0])

# Well that doesn't do what I thought it would
# Let's print out the arrays and see why.

print regular_array[0], "\n", csv_array[0]

# AHA! My arrays have different structures.

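As you might guess, I get different results from the two operations because the arrays are structured differently. The elements of the first array are strings, so len(regular_array[0]) = 4 (the four letters of "this"). The elements of the second array are themselves lists, so len(csv_array[0]) = 1.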

For my purposes, I need my array to be of the first kind.

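My question has two parts: 1) Can anyone point me to some resources that would help me understand what I'm running into? (I'm not yet comfortable with the differences between list, array, and tuple structures.)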

2) Is there a method I can use to convert my CSV input into the first kind of array, or is there a better way to store the data after importing it?

Thanks in advance.

Answer 1

This code produces a list of strings:

regular_array = ["this", "is", "a", "test"]

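Each row of the CSV file is also a list of strings. When you iterate over the rows and append each one to cvs_array (a list), you get a list of lists of strings, like this: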

cvs_array = [['some', 'stuff'], ['some other', 'stuff']]
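To see this concretely, here is a small Python 3 sketch. It assumes tester.csv holds one word per line and substitutes an in-memory io.StringIO for the file, purely for illustration. It shows that csv.reader yields one list per row, so even a one-column file produces one-element lists:

```python
import csv
import io

# Assumed stand-in for tester.csv: a single column, one word per row
data = io.StringIO("this\nis\na\ntest\n")

rows = list(csv.reader(data))
print(rows)             # [['this'], ['is'], ['a'], ['test']]
print(len(rows[0]))     # 1, because rows[0] is the list ['this']
print(len(rows[0][0]))  # 4, the length of the string 'this'
```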

If you want a flat list like regular_array, use extend instead of append.

>>> list_of_lists = [['some', 'stuff'], ['some other', 'stuff']]
>>> cvs_array = []
>>> for l in list_of_lists:
...     cvs_array.append(l)
... 
>>> cvs_array
[['some', 'stuff'], ['some other', 'stuff']]
>>> cvs_array = []
>>> for l in list_of_lists:
...     cvs_array.extend(l)
... 
>>> cvs_array
['some', 'stuff', 'some other', 'stuff']
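Another standard-library way to flatten one level of nesting is itertools.chain.from_iterable. This answer neither uses nor benchmarks it, so treat this as an alternative sketch (Python 3) built on the same list_of_lists:

```python
from itertools import chain

list_of_lists = [['some', 'stuff'], ['some other', 'stuff']]

# chain.from_iterable yields the items of each inner list in order,
# giving the same flat result as the extend loop
flat = list(chain.from_iterable(list_of_lists))
print(flat)  # ['some', 'stuff', 'some other', 'stuff']
```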

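You can also use += here. Of the approaches shown, += appears to be the fastest, at least on my machine, though only marginally ahead of extend. Appending items one at a time with append is noticeably slower (roughly 25 to 40 percent in the runs below). Here are some timings. First, the definitions: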

>>> import csv
>>> def gen_csv_file(size):
...     with open('test.csv', 'wb') as csv_f:
...         csv_w = csv.writer(csv_f)
...         csv_w.writerows([['item {0} row {1}'.format(i, j)
...                           for i in range(size)]
...                          for j in range(size)])
... 
>>> def read_append(csv_file):
...     csv_list = []
...     for row in csv_file:
...         for item in row:
...             csv_list.append(item)
...     return csv_list
... 
>>> def read_concat(csv_file):
...     csv_list = []
...     for row in csv_file:
...         csv_list += row
...     return csv_list
... 
>>> def read_extend(csv_file):
...     csv_list = []
...     for row in csv_file:
...         csv_list.extend(row)
...     return csv_list
... 
>>> def read_csv(read_func):
...     with open('test.csv', 'rb') as csv_f:
...         csv_r = csv.reader(csv_f)
...         return read_func(csv_r)
...

Results:

read function: read_append, file size: 10x10
10000 loops, best of 3: 59.4 us per loop
read function: read_concat, file size: 10x10
10000 loops, best of 3: 47.8 us per loop
read function: read_extend, file size: 10x10
10000 loops, best of 3: 48 us per loop
read function: read_append, file size: 31x31
1000 loops, best of 3: 394 us per loop
read function: read_concat, file size: 31x31
1000 loops, best of 3: 290 us per loop
read function: read_extend, file size: 31x31
1000 loops, best of 3: 291 us per loop
read function: read_append, file size: 100x100
100 loops, best of 3: 3.69 ms per loop
read function: read_concat, file size: 100x100
100 loops, best of 3: 2.67 ms per loop
read function: read_extend, file size: 100x100
100 loops, best of 3: 2.67 ms per loop
read function: read_append, file size: 316x316
10 loops, best of 3: 40.1 ms per loop
read function: read_concat, file size: 316x316
10 loops, best of 3: 29.9 ms per loop
read function: read_extend, file size: 316x316
10 loops, best of 3: 30 ms per loop
read function: read_append, file size: 1000x1000
1 loops, best of 3: 425 ms per loop
read function: read_concat, file size: 1000x1000
1 loops, best of 3: 325 ms per loop
read function: read_extend, file size: 1000x1000
1 loops, best of 3: 323 ms per loop

So append is consistently slower, while extend performs almost identically to +=.
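The answer does not show the harness that produced these numbers. The following is a hypothetical Python 3 harness, not the author's. It builds the same size-by-size grid in memory with io.StringIO instead of writing test.csv (an assumption made to keep disk I/O out of the measurement) and times each strategy with timeit.timeit. Absolute numbers will differ from those above.

```python
import csv
import io
import timeit


def make_csv_text(size):
    # Same cell pattern as gen_csv_file, but kept in memory (assumption: no test.csv on disk)
    buf = io.StringIO()
    csv.writer(buf).writerows(
        [['item {0} row {1}'.format(i, j) for i in range(size)]
         for j in range(size)]
    )
    return buf.getvalue()


def read_append(csv_reader):
    csv_list = []
    for row in csv_reader:
        for item in row:
            csv_list.append(item)
    return csv_list


def read_concat(csv_reader):
    csv_list = []
    for row in csv_reader:
        csv_list += row
    return csv_list


def read_extend(csv_reader):
    csv_list = []
    for row in csv_reader:
        csv_list.extend(row)
    return csv_list


def time_reader(read_func, text, number=20):
    # csv.reader is a one-shot iterator, so build a fresh one for every run
    return timeit.timeit(lambda: read_func(csv.reader(io.StringIO(text))),
                         number=number)


if __name__ == '__main__':
    text = make_csv_text(100)
    for read_func in (read_append, read_concat, read_extend):
        print(read_func.__name__, time_reader(read_func, text))
```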

Answer 2

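csv.reader() returns each row as a list, so when you run csv_array.append(row) on the first row of your CSV file, you add the list ['this'] as the first element of csv_array. The first element of regular_array is a string, whereas the first element of csv_array is a list.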

To add the "cells" of each row of the CSV file to csv_array individually, you can do something like this:

for row in csv_doc:
    for cell in row:
        csv_array.append(cell)
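Here is a self-contained Python 3 version of this loop, offered as a sketch. The function name read_cells is hypothetical. Unlike the question's 'rb' mode (which is Python 2 usage), Python 3's csv module expects the file to be opened in text mode with newline='':

```python
import csv


def read_cells(path):
    """Return every cell of the CSV file at `path` as one flat list of strings."""
    cells = []
    # Python 3: text mode with newline='' as the csv module documentation recommends
    with open(path, newline='') as f:
        for row in csv.reader(f):
            for cell in row:
                cells.append(cell)
    return cells
```

With the question's one-word-per-line tester.csv, read_cells('tester.csv') should return ['this', 'is', 'a', 'test'], matching regular_array.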

Answer 3

Change the code inside your for loop to:

for row in csv_doc:
    csv_array += row

See the question "Python append() vs. + operator on lists, why do these give different results?" for an explanation of how the + operator differs from append.
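The difference can be seen in a few lines. This is a Python 3 sketch with illustrative variable names: append adds its argument as one element, += extends the list in place with each item of the iterable, and + builds a new list. Note the last case: because a string is iterable, += on a bare string adds its characters one by one.

```python
row = ['this']

a = []
a.append(row)   # adds the list itself as a single element
print(a)        # [['this']]

b = []
b += row        # in-place extend: adds each item of row
print(b)        # ['this']

c = []
d = c + row     # builds a new list; c is left unchanged
print(c, d)     # [] ['this']

e = []
e += 'test'     # a string is iterable, so its characters are added
print(e)        # ['t', 'e', 's', 't']
```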

Handling string input from a CSV in Python
