Question

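Each iteration of the following loop generates a vector of dimension 50x1. I would like to store all the vectors from the loop together in a single data structure.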

  def get_y_hat(y_bar, x_train, theta_Ridge_Matrix):
      print(theta_Ridge_Matrix.shape)
      print(theta_Ridge_Matrix.shape[0])
      for i in range(theta_Ridge_Matrix.shape[0]):
          yH = np.dot(x_train, theta_Ridge_Matrix[i].T)
          print(yH)

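Which data structure should I use? I am new to Python, but from what I have researched online there are two options: a numpy array and a list of lists.

I need to access each 50-element vector outside this method. I will be storing 200 to 500 vectors.

Can someone give me sample code for such a data structure?

Thanks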

Answer 1

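I think storing the data from your loop in a dict and then converting it to a pandas.DataFrame (which is built on top of numpy arrays) should be an efficient solution. It lets you process your data further, either as a whole or as individual vectors.

As an example: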

import pandas as pd
import numpy as np

data = {}
# this would be your loop
for i in range(50):
    data['run_%02d' % i] = np.random.randn(50)
data = pd.DataFrame(data) # sorted keys of the dict will be the columns

You can access individual vectors as attributes or by key:

print(data['run_42'].describe())  # or data.run_42.describe()

count    50.000000
mean      0.021426
std       1.027607
min      -2.472225
25%      -0.601868
50%       0.014949
75%       0.641488
max       2.391289

Or analyze the data as a whole:

print(data.mean())

run_00   -0.015224
run_01   -0.006971
..
run_48   -0.115935
run_49    0.147738

Or look at your data with matplotlib (since you tagged your question with matplotlib):

import matplotlib.pyplot as plt
data.boxplot(rot=90)
plt.tight_layout()

[Image: example boxplot]
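
Applied to the function from the question, the same idea might look like the sketch below. It assumes theta_Ridge_Matrix is a 2-D array with one parameter vector per row and that x_train has one sample per row. The function name get_y_hat_frame, the 'run_%03d' column labels and the random test data are made up for illustration.

```python
import numpy as np
import pandas as pd

def get_y_hat_frame(x_train, theta_ridge_matrix):
    """Collect one prediction vector per row of theta_ridge_matrix (hypothetical helper)."""
    columns = {}
    for i in range(theta_ridge_matrix.shape[0]):
        # Each dot product yields a 1-D vector with one entry per row of x_train.
        columns['run_%03d' % i] = np.dot(x_train, theta_ridge_matrix[i])
    return pd.DataFrame(columns)

# Assumed shapes: 50 samples, 3 features, 200 parameter vectors.
rng = np.random.RandomState(0)
x_train = rng.randn(50, 3)
theta = rng.randn(200, 3)

y_hat = get_y_hat_frame(x_train, theta)
print(y_hat.shape)                            # (50, 200)
first_vector = y_hat['run_000'].to_numpy()    # back to a plain numpy array
```

Because the function returns the DataFrame, every vector stays accessible outside the function, which is what the question asks for.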

Answer 2

I can't comment on numpy arrays because I haven't used them before, but Python already has built-in support for lists of lists.

For example:

AList = [1, 2, 3]
BList = [4, 5, 6]
CList = [7, 8, 9]
List_of_Lists = []

List_of_Lists.append(AList)
List_of_Lists.append(BList)
List_of_Lists.append(CList)

print(List_of_Lists)

Which gives:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

There are also other ways to create the lists, instead of initializing them all up front:

ListCreator = int(input('Input how many lists are needed: '))
ListofLists = [[] for index in range(ListCreator)]

There are more ways to go about this, but I don't know how you plan to implement it.
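
For the loop in the question, a list-based approach could be sketched as follows: append each vector to a plain list inside the loop and return the list, so it is available outside the function. Converting the list with np.array at the end is optional. The function name collect_vectors and the small input arrays are made up for illustration.

```python
import numpy as np

def collect_vectors(x_train, theta_matrix):
    """Return a list holding one prediction vector per row of theta_matrix."""
    vectors = []
    for theta in theta_matrix:
        vectors.append(np.dot(x_train, theta))
    return vectors

x_train = np.arange(6.0).reshape(3, 2)               # 3 samples, 2 features
theta_matrix = np.array([[1.0, 0.0], [0.0, 1.0]])    # 2 parameter vectors

vectors = collect_vectors(x_train, theta_matrix)
print(vectors[1])              # second vector: [1. 3. 5.]
stacked = np.array(vectors)    # optional: one 2-D array instead of a list
print(stacked.shape)           # (2, 3)
```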

Answer 3

You can simply do

import numpy as np

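def get_y_hat(y_bar, x_train, theta_Ridge_Matrix):
    print(theta_Ridge_Matrix.shape)
    print(theta_Ridge_Matrix.shape[0])
    # np.empty takes the shape as a tuple: one row per parameter vector,
    # one column per row of x_train (the length of each dot product)
    yH = np.empty((theta_Ridge_Matrix.shape[0], x_train.shape[0]))
    for i in range(theta_Ridge_Matrix.shape[0]):
        yH[i, :] = np.dot(x_train, theta_Ridge_Matrix[i].T)
    print(yH)
    return yH  # makes the stored vectors available outside the function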

If you store theta_Ridge_Matrix in a 3D array, you can also let np.dot do the work with yH = np.dot(x_train, theta_Ridge_Matrix), which sums over the second-to-last dimension of the matrix.
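
As a rough check of that claim, the sketch below compares the explicit loop with a single np.dot call. The shapes are assumptions, not taken from the question: x_train is (50, 3), and the parameters are stored as a 3-D array of shape (200, 3, 1), i.e. 200 column vectors.

```python
import numpy as np

rng = np.random.RandomState(0)
x_train = rng.randn(50, 3)        # assumed: 50 samples, 3 features
theta_3d = rng.randn(200, 3, 1)   # assumed: 200 column vectors of length 3

# Loop version: one row of y_loop per parameter vector.
y_loop = np.empty((theta_3d.shape[0], x_train.shape[0]))
for i in range(theta_3d.shape[0]):
    y_loop[i, :] = np.dot(x_train, theta_3d[i]).ravel()

# Single call: np.dot sums over the last axis of x_train and the
# second-to-last axis of theta_3d, giving shape (50, 200, 1).
y_all = np.dot(x_train, theta_3d)
print(y_all.shape)                              # (50, 200, 1)
print(np.allclose(y_all[:, :, 0].T, y_loop))    # True
```

Note that the result of the single call has the sample axis first, so it has to be transposed to match the row-per-vector layout of the loop version.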

Answer 4

I suggest using numpy. For that, you need to install it first.

On Windows, you can get it from this site:

http://sourceforge.net/projects/numpy/files/NumPy/

Some examples of how to use it.

import numpy as np

We will create an array and name it mat

>>> mat = np.random.randn(2,3)
>>> mat
array([[ 1.02063865, 1.52885147, 0.45588211],
       [-0.82198131, 0.20995583, 0.31997462]])

Transpose the array using the T attribute

>>> mat.T
array([[ 1.02063865, -0.82198131],
       [ 1.52885147, 0.20995583],
       [ 0.45588211, 0.31997462]])
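
One detail worth knowing when applying T to the code in the question: T only reorders axes, so on a 1-D array (such as a single row taken out of a 2-D matrix) it changes nothing. A small sketch with made-up values:

```python
import numpy as np

mat = np.arange(6).reshape(2, 3)
print(mat.T.shape)            # (3, 2)

row = mat[0]                  # indexing a row gives a 1-D array
print(row.shape)              # (3,)
print(row.T.shape)            # (3,) - transposing a 1-D array is a no-op

column = row.reshape(-1, 1)   # an explicit column vector needs a 2-D shape
print(column.shape)           # (3, 1)
```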

Use the reshape method to change the shape of any array

>>> mat = np.random.randn(3,6)
>>> mat
array([[ 2.01139326, 1.33267072, 1.2947112 , 0.07492725, 0.49765694,
         0.01757505],
       [ 0.42309629, 0.95921276, 0.55840131, -1.22253606, -0.91811118,
         0.59646987],
       [ 0.19714104, -1.59446001, 1.43990671, -0.98266887, -0.42292461,
        -1.2378431 ]])
>>> mat.reshape(2,9)
array([[ 2.01139326, 1.33267072, 1.2947112 , 0.07492725, 0.49765694,
         0.01757505, 0.42309629, 0.95921276, 0.55840131],
       [-1.22253606, -0.91811118, 0.59646987, 0.19714104, -1.59446001,
         1.43990671, -0.98266887, -0.42292461, -1.2378431 ]])
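
Two further properties of reshape can be sketched with a small made-up array: passing -1 lets numpy infer one dimension, and the original array keeps its shape because reshape returns a new array object (here a view on the same data).

```python
import numpy as np

mat = np.arange(18).reshape(3, 6)

wide = mat.reshape(2, -1)     # -1 is inferred as 9
print(wide.shape)             # (2, 9)
print(mat.shape)              # (3, 6) - the original is unchanged

# For a contiguous array, reshape returns a view on the same memory.
print(np.shares_memory(mat, wide))   # True
```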

We can also change the shape of a variable by assigning to its shape attribute.

>>> mat = np.random.randn(4,3)
>>> mat.shape
(4, 3)
>>> mat
array([[-1.47446507, -0.46316836, 0.44047531],
       [-0.21275495, -1.16089705, -1.14349478],
       [-0.83299338, 0.20336677, 0.13460515],
       [-1.73323076, -0.66500491, 1.13514327]])
>>> mat.shape = 2,6
>>> mat.shape
(2, 6)

>>> mat
array([[-1.47446507, -0.46316836, 0.44047531, -0.21275495, -1.16089705,
        -1.14349478],
       [-0.83299338, 0.20336677, 0.13460515, -1.73323076, -0.66500491,
         1.13514327]])

Storing these vectors: which data structure to use in Python?
