Question

My current code is shown below: I import a MAT file and try to create a DataFrame from the variables in it:

from scipy.io import loadmat
import pandas as pd

mat = loadmat(file_path)  # load mat-file
Variables = mat.keys()    # identify variable names

df = pd.DataFrame         # Initialise DataFrame

for name in Variables:

    B = mat[name]
    s = pd.Series(B[:, 1])

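So inside the loop I can create a Series for each variable (each one is an array with two columns, and the values I need are in the second column).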
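A small illustration of the indexing, using a made-up two-column array in place of one MAT variable: NumPy indexes from zero, so "the second column" is B[:, 1].

```python
import numpy as np
import pandas as pd

# Made-up stand-in for one MAT variable: a 3x2 array.
B = np.array([[10, 0.5],
              [11, 0.7],
              [12, 0.9]])

# Zero-based indexing: the second column of the array is B[:, 1].
s = pd.Series(B[:, 1])
print(s.tolist())  # [0.5, 0.7, 0.9]
```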

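My question is: how do I append the Series to the DataFrame? I've looked through the documentation, but none of the examples match what I'm trying to do.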

Best regards,

Ben

Answer 1

Here is how to create a DataFrame in which each Series is a row.

For a single Series (resulting in a single-row DataFrame):

series = pd.Series([1,2], index=['a','b'])
df = pd.DataFrame([series])
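A quick check of what the single-Series case produces, plus an equivalent route via to_frame() and a transpose. The variable names df_row and df_t are only for this illustration.

```python
import pandas as pd

series = pd.Series([1, 2], index=['a', 'b'])

# Wrapping the Series in a list makes it one row; its index becomes the columns.
df_row = pd.DataFrame([series])

# An equivalent route: turn the Series into a one-column frame, then transpose it.
df_t = series.to_frame().T

print(df_row.shape)          # (1, 2)
print(list(df_row.columns))  # ['a', 'b']
```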

For multiple Series with identical indices:

cols = ['a','b']
list_of_series = [pd.Series([1,2],index=cols), pd.Series([3,4],index=cols)]
df = pd.DataFrame(list_of_series, columns=cols)
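A variation on the same idea: when the Series carry a name, pandas uses those names as the row labels instead of 0, 1, .... The names 'first' and 'second' here are hypothetical.

```python
import pandas as pd

cols = ['a', 'b']
# Hypothetical names 'first' and 'second' become the row labels.
list_of_series = [pd.Series([1, 2], index=cols, name='first'),
                  pd.Series([3, 4], index=cols, name='second')]
df = pd.DataFrame(list_of_series, columns=cols)

print(list(df.index))         # ['first', 'second']
print(df.loc['second', 'b'])  # 4
```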

For multiple Series with possibly different indices:

list_of_series = [pd.Series([1,2],index=['a','b']), pd.Series([3,4],index=['a','c'])]
df = pd.concat(list_of_series, axis=1).transpose()
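To make the alignment in the different-indices case visible: the columns end up as the union of all index labels, and any label a given Series lacks is filled with NaN. Because NaN is a float, the values are stored as floats here.

```python
import pandas as pd

list_of_series = [pd.Series([1, 2], index=['a', 'b']),
                  pd.Series([3, 4], index=['a', 'c'])]
df = pd.concat(list_of_series, axis=1).transpose()

# Columns are the union of the indices ('a', 'b', 'c');
# row 0 has no 'c' and row 1 has no 'b', so those cells are NaN.
print(df)
```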

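To create a DataFrame in which each Series is a column, see the other answers. Alternatively, you can create a DataFrame in which each Series is a row, as above, and then call df.transpose(). However, the latter approach is inefficient if the columns have different data types, because each intermediate row mixes those types and is stored with the generic object dtype.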
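A minimal sketch of the dtype issue with the transpose route, assuming two made-up Series (one integer, one string) that are meant to become columns. The names nums, labels, via_transpose and direct are illustrative only.

```python
import pandas as pd

# Two Series with different dtypes, each intended to become one column.
s_int = pd.Series([1, 2], index=['x', 'y'], name='nums')
s_str = pd.Series(['p', 'q'], index=['x', 'y'], name='labels')

# Route 1: build rows first, then transpose. Each intermediate column mixes
# an int and a str, so everything is stored as object and stays that way.
via_transpose = pd.DataFrame([s_int, s_str]).transpose()

# Route 2: concatenate along axis=1 so each Series is a column from the start;
# the integer column keeps an integer dtype.
direct = pd.concat([s_int, s_str], axis=1)

print(via_transpose.dtypes)
print(direct.dtypes)
```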

Answer 2

There is no need to initialise an empty DataFrame (and you weren't even doing that: you would need pd.DataFrame() with parentheses).

Instead, to create a DataFrame in which each Series is a column,

1. make a list of Series, series, and
2. concatenate them side by side with df = pd.concat(series, axis=1)

Something like:

series = [pd.Series(mat[name][:, 1]) for name in Variables
          if not name.startswith('__')]  # skip loadmat's metadata keys
df = pd.concat(series, axis=1)
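A self-contained sketch of this approach, assuming a hand-made dict in place of loadmat(file_path). scipy.io.loadmat also returns metadata entries ('__header__', '__version__', '__globals__') that are not two-column arrays, so they are filtered out, and the keys= argument of pd.concat labels each column with its variable name.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for loadmat(file_path); the '__...__' entries mimic
# the metadata loadmat adds next to the real variables.
mat = {
    '__header__': b'MATLAB 5.0 MAT-file',
    '__version__': '1.0',
    '__globals__': [],
    'a': np.array([[0, 1.5], [1, 2.5], [2, 3.5]]),
    'b': np.array([[0, 10.0], [1, 20.0], [2, 30.0]]),
}

names = [name for name in mat if not name.startswith('__')]
series = [pd.Series(mat[name][:, 1]) for name in names]

# keys= names each concatenated column after its MAT variable.
df = pd.concat(series, axis=1, keys=names)
print(df)
```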

Answer 3

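I think a possibly faster way to do this is to 1) use a dict comprehension to build the dict you need (i.e., take the second column of each array), and 2) pass that dict straight to pd.DataFrame to create the instance, instead of looping over each column and concatenating.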

Suppose your mat looks like this (you can skip this step, since you load mat from a file):

In [135]: mat = {'a': np.random.randint(5, size=(4,2)),
   .....: 'b': np.random.randint(5, size=(4,2))}

In [136]: mat
Out[136]: 
{'a': array([[2, 0],
        [3, 4],
        [0, 1],
        [4, 2]]), 'b': array([[1, 0],
        [1, 1],
        [1, 0],
        [2, 1]])}

Then you can do:

In [137]: df = pd.DataFrame({name: mat[name][:, 1] for name in mat})

In [138]: df
Out[138]: 
   a  b
0  0  0
1  4  1
2  1  0
3  2  1

[4 rows x 2 columns]
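With a real loadmat result, the dict comprehension above would also visit the metadata entries ('__header__', '__version__', '__globals__'). A minimal sketch, assuming a hand-made dict that mimics those entries, keeps only two-dimensional NumPy arrays before taking column index 1.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for loadmat(file_path); the '__...__' entries mimic
# the metadata loadmat adds alongside the real variables.
mat = {
    '__header__': b'MATLAB 5.0 MAT-file',
    '__version__': '1.0',
    '__globals__': [],
    'a': np.array([[2, 0], [3, 4], [0, 1], [4, 2]]),
    'b': np.array([[1, 0], [1, 1], [1, 0], [2, 1]]),
}

# Keep only 2-D arrays, then take the second column (index 1) of each.
df = pd.DataFrame({name: value[:, 1]
                   for name, value in mat.items()
                   if isinstance(value, np.ndarray) and value.ndim == 2})
print(df)
```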

pandas: Creating a DataFrame from Series
