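I'm trying to take a CSV input file consisting of a single column of strings and turn it into an array, so that I can run some operations on each string separately. However, once I import the CSV, the resulting array isn't structured the way I expected. Here's a snippet that illustrates my problem: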
import csv
regular_array = ["this", "is", "a", "test"]
# Import a csv with the same elements as regular_array in separate rows
# Put it in an array
csv_doc = csv.reader(open('tester.csv', 'rb'), delimiter=",", quotechar='|')
csv_array = []
for row in csv_doc:
    csv_array.append(row)
# Let's see if we can pass an element from these arrays into a function
print len(regular_array[0]), "\n", len(csv_array[0])
# Well that doesn't do what I thought it would
# Let's print out the arrays and see why.
print regular_array[0], "\n", csv_array[0]
# AHA! My arrays have different structures.
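As you'd expect, the two operations give different results because the arrays are structured differently. The first array is made up of words, so len(regular_array[0]) = 4, the number of letters in "this". The second array is made up of elements whose length is 1, so len(csv_array[0]) = 1.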
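A minimal Python 3 sketch of the same comparison makes the difference visible. It assumes an in-memory io.StringIO object standing in for tester.csv, with one word per row; printing the types shows that csv.reader yields a list for every row, so csv_array[0] is a one-element list rather than a string.

```python
import csv
import io

regular_array = ["this", "is", "a", "test"]

# Stand-in for tester.csv: one word per row (assumption for illustration)
fake_file = io.StringIO("this\nis\na\ntest\n")

csv_array = []
for row in csv.reader(fake_file):
    csv_array.append(row)  # each row is a list, e.g. ['this']

print(type(regular_array[0]).__name__, len(regular_array[0]))  # str 4
print(type(csv_array[0]).__name__, len(csv_array[0]))          # list 1
print(csv_array)  # [['this'], ['is'], ['a'], ['test']]
```

len() of a string counts its characters, while len() of a list counts its elements, which is why the two numbers differ.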
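For my purposes, I need my array to be of the first kind.
My question has two parts:
1) Can someone point me to some resources that would help me understand what's going on here? (I'm not yet comfortable with the differences between list, array, and tuple structures.)
2) Is there a method I can use to convert my CSV input into the first kind of array, or is there a better way to store the data after importing it?
Thanks in advance.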
Answer 0 (score: 2)
This code produces a list of strings:
regular_array = ["this", "is", "a", "test"]
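Each row of the csv file is also a list of strings. When you iterate over the rows and append them to csv_array, which is a list, you get a list of lists of strings, like this:
csv_array = [['some', 'stuff'], ['some other', 'stuff']]
If you want a flat list like regular_array, use extend instead of append.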
>>> list_of_lists = [['some', 'stuff'], ['some other', 'stuff']]
>>> csv_array = []
>>> for l in list_of_lists:
...     csv_array.append(l)
...
>>> csv_array
[['some', 'stuff'], ['some other', 'stuff']]
>>> csv_array = []
>>> for l in list_of_lists:
...     csv_array.extend(l)
...
>>> csv_array
['some', 'stuff', 'some other', 'stuff']
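Applied to an actual csv.reader rather than a literal list of lists, the same extend approach might look like this in Python 3. The io.StringIO object and its contents are assumptions standing in for a real file; with a file on disk you would use open(path, newline='') instead. Rows with several cells are flattened in order.

```python
import csv
import io

# Stand-in for a CSV file (hypothetical contents for illustration)
data = io.StringIO("some,stuff\nsome other,stuff\n")

csv_array = []
for row in csv.reader(data):
    csv_array.extend(row)  # add the row's cells, not the row list itself

print(csv_array)  # ['some', 'stuff', 'some other', 'stuff']
```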
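You could also use += here. At least on my machine, += and extend are the fastest approaches and run at essentially the same speed, while appending each item individually in a nested loop is noticeably slower (roughly 1.2 to 1.4 times as long in the timings below). Here are some timings. First, the definitions: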
>>> import csv
>>> def gen_csv_file(size):
...     with open('test.csv', 'wb') as csv_f:
...         csv_w = csv.writer(csv_f)
...         csv_w.writerows([['item {0} row {1}'.format(i, j)
...                           for i in range(size)]
...                          for j in range(size)])
...
>>> def read_append(csv_file):
...     csv_list = []
...     for row in csv_file:
...         for item in row:
...             csv_list.append(item)
...     return csv_list
...
>>> def read_concat(csv_file):
...     csv_list = []
...     for row in csv_file:
...         csv_list += row
...     return csv_list
...
>>> def read_extend(csv_file):
...     csv_list = []
...     for row in csv_file:
...         csv_list.extend(row)
...     return csv_list
...
>>> def read_csv(read_func):
...     with open('test.csv', 'rb') as csv_f:
...         csv_r = csv.reader(csv_f)
...         return read_func(csv_r)
...
Results:
read function: read_append, file size: 10x10
10000 loops, best of 3: 59.4 us per loop
read function: read_concat, file size: 10x10
10000 loops, best of 3: 47.8 us per loop
read function: read_extend, file size: 10x10
10000 loops, best of 3: 48 us per loop
read function: read_append, file size: 31x31
1000 loops, best of 3: 394 us per loop
read function: read_concat, file size: 31x31
1000 loops, best of 3: 290 us per loop
read function: read_extend, file size: 31x31
1000 loops, best of 3: 291 us per loop
read function: read_append, file size: 100x100
100 loops, best of 3: 3.69 ms per loop
read function: read_concat, file size: 100x100
100 loops, best of 3: 2.67 ms per loop
read function: read_extend, file size: 100x100
100 loops, best of 3: 2.67 ms per loop
read function: read_append, file size: 316x316
10 loops, best of 3: 40.1 ms per loop
read function: read_concat, file size: 316x316
10 loops, best of 3: 29.9 ms per loop
read function: read_extend, file size: 316x316
10 loops, best of 3: 30 ms per loop
read function: read_append, file size: 1000x1000
1 loops, best of 3: 425 ms per loop
read function: read_concat, file size: 1000x1000
1 loops, best of 3: 325 ms per loop
read function: read_extend, file size: 1000x1000
1 loops, best of 3: 323 ms per loop
So append is consistently slower, and extend performs almost exactly the same as +=.
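The answer doesn't show how the timings were taken. A rough Python 3 sketch of a comparable benchmark might look like the following; the harness is an assumption, not the answerer's code. It parses an in-memory copy of the CSV (io.StringIO) instead of reopening test.csv, measures with timeit.repeat, and its numbers will differ from the Python 2 results above.

```python
import csv
import io
import timeit


def make_csv_text(size):
    # Same grid as gen_csv_file, built in memory (Python 3 port, assumption)
    buf = io.StringIO()
    csv.writer(buf).writerows(
        [['item {0} row {1}'.format(i, j) for i in range(size)]
         for j in range(size)])
    return buf.getvalue()


def read_append(rows):
    csv_list = []
    for row in rows:
        for item in row:
            csv_list.append(item)
    return csv_list


def read_concat(rows):
    csv_list = []
    for row in rows:
        csv_list += row
    return csv_list


def read_extend(rows):
    csv_list = []
    for row in rows:
        csv_list.extend(row)
    return csv_list


def read_csv(read_func, text):
    # Build a fresh reader on every call, as the original read_csv reopens the file
    return read_func(csv.reader(io.StringIO(text)))


if __name__ == '__main__':
    text = make_csv_text(100)
    for func in (read_append, read_concat, read_extend):
        best = min(timeit.repeat(lambda: read_csv(func, text), number=10, repeat=3))
        print('{0}: {1:.2f} ms per loop'.format(func.__name__, best / 10 * 1000))
```

All three functions return the same flat list; only the way each row is added differs.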
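Answer 1 (score: 1)
csv.reader() returns each row as a list, so when you run csv_array.append(row) on the first row of the csv file, you add the list ['this'] as the first element of csv_array. The first element of regular_array is a string, whereas the first element of csv_array is a list.
To add the "cells" of each row in the csv file to csv_array individually, you can do the following: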
for row in csv_doc:
    for cell in row:
        csv_array.append(cell)
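The nested loop above can also be written as a single list comprehension that performs the same flattening. Because the question's file has exactly one column, taking row[0] from each row is another option; that variant is not proposed in this answer, and it skips empty rows because csv.reader returns [] for a blank line. A Python 3 sketch, assuming an in-memory stand-in for tester.csv:

```python
import csv
import io

text = "this\nis\na\ntest\n"  # stand-in for tester.csv (assumption)

# Flatten every cell of every row, equivalent to the nested for loops
csv_array = [cell for row in csv.reader(io.StringIO(text)) for cell in row]

# Single-column shortcut: keep only the first cell, skipping empty rows
first_cells = [row[0] for row in csv.reader(io.StringIO(text)) if row]

print(csv_array)    # ['this', 'is', 'a', 'test']
print(first_cells)  # ['this', 'is', 'a', 'test']
```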
Answer 2 (score: 1)
Change the code in your for loop to:
for row in csv_doc:
    csv_array += row
See "Python append() vs. + operator on lists, why do these give different results?" for the difference between the + operator and append.
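A short sketch of the distinction, illustrating standard Python list behavior rather than code from the linked post: append adds its argument as a single element; += extends the list in place with the items of any iterable, just like extend; + builds a new list and only accepts another list.

```python
row = ['this']

a = []
a.append(row)    # adds the list itself as one element
print(a)         # [['this']]

b = []
b += row         # in place, same as b.extend(row)
print(b)         # ['this']

d = []
d += ('x', 'y')  # += accepts any iterable, like extend
print(d)         # ['x', 'y']

try:
    [] + ('x', 'y')  # + does not: list + tuple raises TypeError
except TypeError as exc:
    print('TypeError:', exc)

e = []
alias = e
e += row         # mutates e in place, so alias sees the change
e = e + ['is']   # rebinds e to a new list; alias is unaffected
print(alias, e)  # ['this'] ['this', 'is']
```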