Convert the strings in matrix rows into a matrix with rows and columns, and convert the numbers in the strings to integers

Asked: 2016-09-28 13:59:35

Tags: python python-2.7 python-3.x csv matrix

I saved an Excel worksheet in CSV format. After importing the data into Python with this code:

import csv
with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel')
    basis = []
    for row in reader:
        print(row)

Output:

['1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16']
['0.001;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363']
['0.002;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363;11.00127363']
['0.003;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283;10.94525283']

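It has 16 columns and 1399 rows. I realized that each row is one long string, so I replaced every ';' with ',' hoping that would help turn the strings into a matrix I could use to manipulate the data. Instead I end up with a single flat list containing all the row strings. Here is my code and its output so far: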

import csv
with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel')
    basis = []
    for row in reader:
        #print(row)

        for i in range(len(row)):
            new_row = (row[i].replace(';', ','))
            basis.append(new_row)

print(basis)


>> ['1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16', '0.001,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363', '0.002,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363', '0.003,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.004,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.005,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.006,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.007,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.008,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.009,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', '0.01,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283', ... 
, '1.396,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0', '1.397,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0', '1.398,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0']

But this is the form I want, a matrix like:

[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],[0.001,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363],[0.002,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363,11.00127363], [0.003,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283,10.94525283]]

so that I can perform operations on the data.
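Once the data is a list of lists of numbers, individual values, rows and columns can be reached by indexing. A minimal sketch, using a small hypothetical `matrix` copied from the first rows above and truncated to four columns:

```python
# Hypothetical sample: first three rows of the desired matrix, truncated to 4 columns
matrix = [
    [1, 2, 3, 4],
    [0.001, 11.00127363, 11.00127363, 11.00127363],
    [0.002, 11.00127363, 11.00127363, 11.00127363],
]

header = matrix[0]          # first row (the column numbers)
first_value = matrix[1][1]  # row 1, column 1
# zip(*matrix) transposes the rows into columns
columns = [list(col) for col in zip(*matrix)]
first_column = columns[0][1:]  # first column without the header row

print(header)        # [1, 2, 3, 4]
print(first_value)   # 11.00127363
print(first_column)  # [0.001, 0.002]
```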

I would really appreciate any help. Thanks in advance.

2 answers:

Answer 0 (score: 2)

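Change the delimiter to a semicolon. The default is a comma, which doesn't work here because the fields in your input are separated by semicolons. (I think you can also omit the dialect='excel' part.)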

import csv

with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel',delimiter=";")
    basis = list(reader)

Now basis is a list of rows, each row a list of the fields as text.
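To see what `delimiter=";"` changes, here is a small self-contained sketch that feeds two lines in the same format (copied from the question, shortened to four columns) through `csv.reader` via `io.StringIO` instead of a real file:

```python
import csv
import io

# Two shortened lines in the same semicolon-separated format as 45deg_marbles.csv
sample = "1;2;3;4\n0.001;11.00127363;11.00127363;11.00127363\n"

# The default delimiter is ',', so each line comes back as a single field
default_rows = list(csv.reader(io.StringIO(sample)))

# With delimiter=';' each value becomes its own field (still a string)
semicolon_rows = list(csv.reader(io.StringIO(sample), delimiter=";"))

print(default_rows)    # [['1;2;3;4'], ['0.001;11.00127363;11.00127363;11.00127363']]
print(semicolon_rows)  # [['1', '2', '3', '4'], ['0.001', '11.00127363', '11.00127363', '11.00127363']]
```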

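But you want them as integers/floats, so some more post-processing is needed: a list comprehension that converts each field to int if it looks like an integer (negative integers work too) and to float otherwise. (If there were alphanumeric fields you would need another test, of course, but that is not the case here.)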

import csv,re
intre = re.compile(r"-?\d+$")

with open('45deg_marbles.csv', 'r') as f:
    reader = csv.reader(f,dialect='excel',delimiter=";")
    basis = []
    for row in reader:
        basis.append([int(x) if intre.match(x) else float(x) for x in row])

print(basis)

Result:

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], [0.001, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363], [0.002, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363, 11.00127363], [0.003, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283, 10.94525283]]
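An alternative to the regex test (a different technique from the one in this answer) is to try `int()` first and fall back to `float()` when it raises `ValueError`. A minimal sketch; `to_number` is a hypothetical helper name, and the sample is inline data rather than the real file:

```python
import csv
import io


def to_number(text):
    """Hypothetical helper: return an int if text parses as one, else a float."""
    try:
        return int(text)
    except ValueError:
        # Values such as '0.001' or '11.00127363' are not valid int literals
        return float(text)


# Inline sample in the question's format; '-5' checks that negatives stay int
sample = "1;2;3;4\n0.001;11.00127363;-5;0\n"
basis = [[to_number(x) for x in row]
         for row in csv.reader(io.StringIO(sample), delimiter=";")]
print(basis)  # [[1, 2, 3, 4], [0.001, 11.00127363, -5, 0]]
```

As with the regex version, a non-numeric field would still raise `ValueError` from `float()`.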

Note that if the integers are guaranteed to be positive, there is a variant that saves the regex evaluation:

basis.append([int(x) if x.isdigit() else float(x) for x in row])
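The positivity condition matters because `str.isdigit()` returns `False` when there is a leading minus sign, so with this variant a negative integer such as `'-3'` does not fail but silently becomes a float. A quick check with made-up values:

```python
values = ["16", "-3", "0.001"]

# '-3'.isdigit() is False, so it goes through float() instead of int()
converted = [int(x) if x.isdigit() else float(x) for x in values]
print(converted)  # [16, -3.0, 0.001]
```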

Answer 1 (score: -2)

What you need to do is:

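for row in reader:
    # csv.reader already yields a list; with the default ',' delimiter
    # the whole semicolon-separated line is its single element
    basis.append(row[0].split(';'))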

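What you did wrong is replacing ';' with ','. That doesn't turn the string into a list; it only swaps characters inside the same string. You should split the string into its elements instead.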
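The difference between `replace` and `split` on one of the question's field strings (shortened to four values here). Note that `split` still produces strings, so a numeric conversion is a separate step:

```python
field = "0.001;11.00127363;11.00127363;11.00127363"

replaced = field.replace(";", ",")  # still a single string
parts = field.split(";")            # a list of four strings
numbers = [float(p) for p in parts] # convert each piece to a number

print(replaced)  # 0.001,11.00127363,11.00127363,11.00127363
print(parts)     # ['0.001', '11.00127363', '11.00127363', '11.00127363']
print(numbers)   # [0.001, 11.00127363, 11.00127363, 11.00127363]
```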