I have a txt file whose contents look like this:
10 1:0.870137474304 2:0.722354071782 3:0.671913562758
11 1:0.764133072717 2:0.4893616821 3:0.332713609364
20 1:0.531732713984 2:0.0967819558321 3:0.169802773309
I then want to read the file and build a matrix of the following form:
[[10 0.870137474304 0.722354071782 0.671913562758 ]
[11 0.764133072717 0.4893616821 0.332713609364 ]
[20 0.531732713984 0.0967819558321 0.169802773309]]
I know how to split every element except those in the first column. How do I handle the first column?
matrix = []
lines = open("test.txt").read().split("\n")  # read all lines into an array
for line in lines:
    array [0] = line.split(" ")[0]
    # Split the line based on spaces and the sub-part on the colon
    array = [float(s.split(":")[1]) for s in line.split(" ")]
    matrix.append(array)
print(matrix)
Answer 0 (score: 0)
You can use a regular expression:
import re
# In Python 3, map() returns an iterator, so wrap it in list() to get real rows.
with open('filename.txt') as f:
    data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', line))) for line in f]
Output:
[[10.0, 0.870137474304, 0.722354071782, 0.671913562758], [11.0, 0.764133072717, 0.4893616821, 0.332713609364], [20.0, 0.531732713984, 0.0967819558321, 0.169802773309]]
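To see why the pattern picks up the first column as well as the values after each colon, here is a minimal sketch that applies it to a single line. The pattern is the one from the answer above; the names `PATTERN`, `sample`, `tokens` and `row` are only illustrative.

```python
import re

# Two alternatives: digits/dots that follow a colon, or digits at the very start of the line.
PATTERN = re.compile(r'(?<=:)[\d.]+|^\d+')

# First line of the file shown in the question.
sample = "10 1:0.870137474304 2:0.722354071782 3:0.671913562758"

tokens = PATTERN.findall(sample)      # matched substrings, still strings
row = [float(t) for t in tokens]      # converted to numbers

print(tokens)  # ['10', '0.870137474304', '0.722354071782', '0.671913562758']
print(row)     # [10.0, 0.870137474304, 0.722354071782, 0.671913562758]
```

The indices `1`, `2`, `3` in front of each colon are skipped because they are neither preceded by a colon nor located at the start of the line.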
Edit: using numpy to create an array from data:
import numpy as np
import re
# list(map(...)) is needed on Python 3 so np.array receives lists, not map iterators.
with open('filename.txt') as f:
    data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', line))) for line in f]
new_data = np.array(data)
Output:
array([[ 10. , 0.87013747, 0.72235407, 0.67191356],
[ 11. , 0.76413307, 0.48936168, 0.33271361],
[ 20. , 0.53173271, 0.09678196, 0.16980277]])
Answer 1 (score: 0)
Here is one way to extract the data into a numpy array, using pandas:
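import pandas as pd
# The file is space-separated, so pass sep=' '; the default separator is a comma.
df = pd.read_csv('myfile.csv', header=None, sep=' ')
for col in range(1, 4):
    # Columns 1-3 hold 'index:value' strings; keep the value as a float.
    df[col] = df[col].apply(lambda x: float(x.split(':')[1]))
res = df.values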
# [[ 10. 0.87013747 0.72235407 0.67191356]
# [ 11. 0.76413307 0.48936168 0.33271361]
# [ 20. 0.53173271 0.09678196 0.16980277]]
Answer 2 (score: 0)
For beginners in Python, an explicit version:
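import csv
matrix = []
with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        # Drop any 'index:' prefix and convert to float; the first column has no colon and passes through.
        cleaned_row = [float(col.split(':')[-1]) for col in row]
        matrix.append(cleaned_row)
print(matrix)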
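If you would rather treat the first column explicitly instead of relying on `split(':')[-1]` passing it through, a minimal sketch could look like the following. `parse_line` is a hypothetical helper name, not part of the answer, and the sample lines are copied from the question rather than read from a file.

```python
def parse_line(line):
    """Turn '10 1:0.87 2:0.72' into [10.0, 0.87, 0.72]."""
    first, *rest = line.split()
    # The first column has no 'index:' prefix, so convert it directly.
    values = [float(first)]
    # The remaining columns look like 'index:value'; keep only the value part.
    values.extend(float(field.split(':', 1)[1]) for field in rest)
    return values


# Lines copied from the question; in practice these would come from the file.
sample_lines = [
    "10 1:0.870137474304 2:0.722354071782 3:0.671913562758",
    "11 1:0.764133072717 2:0.4893616821 3:0.332713609364",
    "20 1:0.531732713984 2:0.0967819558321 3:0.169802773309",
    "",  # a blank line, as left by a trailing newline
]

# Skip blank lines so parse_line never sees an empty string.
matrix = [parse_line(line) for line in sample_lines if line.strip()]
print(matrix)
```

Handling the first token separately makes the intent visible and raises an error early if a later column is missing its colon.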
Using a list comprehension:
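# splitlines() avoids the empty trailing entry that split('\n') leaves when the file ends with a newline.
rows = open('csvfile.csv').read().splitlines()
matrix = [[float(col.split(':')[-1]) for col in row.split(' ')] for row in rows if row]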
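One detail worth checking with one-liners like this is how the text is split into rows. The sketch below uses made-up file contents held in a string (the `text` value is an assumption for illustration, not data from the question) to show the difference between `split('\n')` and `splitlines()` when the text ends with a newline.

```python
# Made-up contents that end with a newline, as most text files do.
text = "10 1:0.5 2:0.25\n11 1:0.75 2:0.125\n"

by_newline = text.split('\n')   # keeps an empty string after the final newline
by_lines = text.splitlines()    # no empty trailing entry

print(by_newline)  # ['10 1:0.5 2:0.25', '11 1:0.75 2:0.125', '']
print(by_lines)    # ['10 1:0.5 2:0.25', '11 1:0.75 2:0.125']

# Building the numeric matrix from the clean rows.
matrix = [[float(col.split(':')[-1]) for col in row.split()] for row in by_lines]
print(matrix)  # [[10.0, 0.5, 0.25], [11.0, 0.75, 0.125]]
```

The empty trailing entry is harmless while values stay as strings, but it makes `float('')` raise a `ValueError` once you convert to numbers.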