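I want to do the following: 1) load 14 files into NumPy arrays; 2) concatenate the 14 NumPy arrays; 3) extract the row indices at which each array starts and ends within the concatenated array, so that I can create a new NumPy array that assigns each row a class number from 1 to 14 according to the file it came from.
I wrote the following code to solve this: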
import numpy as np
from numpy import genfromtxt
name1 = 'backandfforwwalk4smallstepsML'
name2 = 'backandfforwwalk4stepsML'
name3 = 'backandforsteps1lineML'
name4 = 'leftdnrightfarML'
name5 = 'sidestyletwoML'
name6 = 'walkingsideML'
name7 = 'fastwalkML'
class1 = genfromtxt('allSumRowSignals_'+ name1 + '_even.csv', delimiter=',')
class2 = genfromtxt('allSumRowSignals_'+ name1 + '_odd.csv', delimiter=',')
class3 = genfromtxt('allSumRowSignals_'+ name2 + '_even.csv', delimiter=',')
class4 = genfromtxt('allSumRowSignals_'+ name2 + '_odd.csv', delimiter=',')
class5 = genfromtxt('allSumRowSignals_'+ name3 + '_even.csv', delimiter=',')
class6 = genfromtxt('allSumRowSignals_'+ name3 + '_odd.csv', delimiter=',')
class7 = genfromtxt('allSumRowSignals_'+ name4 + '_even.csv', delimiter=',')
class8 = genfromtxt('allSumRowSignals_'+ name4 + '_odd.csv', delimiter=',')
class9 = genfromtxt('allSumRowSignals_'+ name5 + '_even.csv', delimiter=',')
class10 = genfromtxt('allSumRowSignals_'+ name5 + '_odd.csv', delimiter=',')
class11 = genfromtxt('allSumRowSignals_'+ name6 + '_even.csv', delimiter=',')
class12 = genfromtxt('allSumRowSignals_'+ name6 + '_odd.csv', delimiter=',')
class13 = genfromtxt('allSumRowSignals_'+ name7 + '_even.csv', delimiter=',')
class14 = genfromtxt('allSumRowSignals_'+ name7 + '_odd.csv', delimiter=',')
#Load files that have similar name
a = np.concatenate((class1,class2),axis=0)
b = np.concatenate((a,class3),axis=0)
c = np.concatenate((b,class4),axis=0)
d = np.concatenate((c,class5),axis=0)
e = np.concatenate((d,class6),axis=0)
f = np.concatenate((e,class7),axis=0)
g = np.concatenate((f,class8),axis=0)
h = np.concatenate((g,class9),axis=0)
i = np.concatenate((h,class10),axis=0)
j = np.concatenate((i,class11),axis=0)
k = np.concatenate((j,class12),axis=0)
l = np.concatenate((k,class13),axis=0)
m = np.concatenate((l,class14),axis=0)
# all arrays are now concatenated; m is the full concatenated array
# calculate the row index at which each class ends
class1ends = len(class1[:,1])
class2ends = len(a[:,1])
class3ends = len(b[:,1])
class4ends = len(c[:,1])
class5ends = len(d[:,1])
class6ends = len(e[:,1])
class7ends = len(f[:,1])
class8ends = len(g[:,1])
class9ends = len(h[:,1])
class10ends = len(i[:,1])
class11ends = len(j[:,1])
class12ends = len(k[:,1])
class13ends = len(l[:,1])
class14ends = len(m[:,1])
# the row at which each file ends is needed to assign a class number from 1 to 14, one per file, in a separate array
Y = np.zeros((len(m)))
Y[0:class1ends]= 1
Y[class1ends:class2ends]= 2
Y[class2ends:class3ends]= 3
Y[class3ends:class4ends]= 4
Y[class4ends:class5ends]= 5
Y[class5ends:class6ends]= 6
Y[class6ends:class7ends]= 7
Y[class7ends:class8ends]= 8
Y[class8ends:class9ends]= 9
Y[class9ends:class10ends]= 10
Y[class10ends:class11ends]= 11
Y[class11ends:class12ends]= 12
Y[class12ends:class13ends]= 13
Y[class13ends:class14ends]= 14
# using the indexes saved above, Y has the same length as m and holds a class number for each row, according to its file
print(class14ends)
np.savetxt('y.csv', Y, delimiter=',', fmt="%s")
np.savetxt('X.csv', m, delimiter=',', fmt="%s")
# y.csv holds the class labels (Y)
# X.csv holds the data (m)
I am looking for a faster, more compact, and more general way to do this (for many files). Any suggestions?
Answer 0 (score: 1)
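You can simplify this by recognizing that concatenate accepts a list of multiple arrays, so all of them can be joined in a single call.
Here is a simpler example: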
In [76]: class0=np.zeros((3,4))
In [77]: class1=np.ones((2,4))
In [78]: class2=np.ones((5,4))*2
In [79]: class3=np.ones((2,4))*3
In [80]: class_list=[class0,class1,class2,class3]
In [81]: lenlist=[x.shape[0] for x in class_list]
In [82]: m = np.concatenate(class_list, axis=0)
In [84]: m
Out[84]:
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 1., 1., 1., 1.],
...
[ 2., 2., 2., 2.],
[ 3., 3., 3., 3.],
[ 3., 3., 3., 3.]])
In [85]: lenlist
Out[85]: [3, 2, 5, 2]
In [87]: class_ends=np.cumsum(lenlist)
In [88]: class_ends
Out[88]: array([ 3, 5, 10, 12], dtype=int32)
In [91]: Y=np.repeat(range(len(lenlist)),lenlist)
In [92]: Y
Out[92]: array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 3, 3])
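Applied to the 14 files from the question, the same idea might look like the sketch below. The base names and the file-name pattern are copied from the question; the helper names stack_with_labels and load_and_stack are hypothetical and introduced only for illustration. It assumes every file has the same number of columns (at least two) and that labels should start at 1, as in the question, rather than at 0 as in the example above.

```python
import numpy as np

# Base names and file-name pattern copied from the question.
names = [
    'backandfforwwalk4smallstepsML',
    'backandfforwwalk4stepsML',
    'backandforsteps1lineML',
    'leftdnrightfarML',
    'sidestyletwoML',
    'walkingsideML',
    'fastwalkML',
]
# Order: name1 even, name1 odd, name2 even, ... (classes 1..14 as in the question).
paths = ['allSumRowSignals_%s_%s.csv' % (name, parity)
         for name in names for parity in ('even', 'odd')]


def stack_with_labels(arrays, first_label=1):
    """Hypothetical helper: stack 2-D arrays row-wise and label each row
    with the number of the array it came from."""
    lengths = [a.shape[0] for a in arrays]
    X = np.concatenate(arrays, axis=0)   # one call instead of 13
    ends = np.cumsum(lengths)            # exclusive end row of each block
    starts = ends - lengths              # first row of each block
    labels = np.arange(first_label, first_label + len(arrays))
    Y = np.repeat(labels, lengths)       # one label per row
    return X, Y, starts, ends


def load_and_stack(paths, delimiter=','):
    """Hypothetical helper: read every file, then stack and label.
    np.atleast_2d guards against a one-row file being read as a 1-D array
    (assumes each file has at least two columns)."""
    arrays = [np.atleast_2d(np.genfromtxt(p, delimiter=delimiter))
              for p in paths]
    return stack_with_labels(arrays)


# In-memory check using the block sizes from the example above.
demo = [np.full((n, 4), float(k)) for k, n in enumerate([3, 2, 5, 2])]
X_demo, Y_demo, starts_demo, ends_demo = stack_with_labels(demo)
```

With the real files in the working directory, X, Y, starts, ends = load_and_stack(paths) followed by the two np.savetxt calls from the question should write X.csv and y.csv. Rows starts[k]:ends[k] of X belong to file k, which covers the start and end indices asked for in step 3.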