Question

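I have a pandas DataFrame made up of a list of objects, followed by 4 lists of 12 values for each object. It has the general form:

I want to transpose the DataFrame and give it a hierarchical index ('name', 'name of each of the 4 lists'). The general form would look like

I tried the following, where rows_list is my source data: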

import pandas as pd

test_table = pd.DataFrame(rows_list, columns=("name", "frac_0", "frac_1","frac_2", "frac_3"))

name = pd.Series(test_table['name'])

del test_table['name']
test_table = test_table.T
test_table = test_table.sort_index([subjectname])

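This gives me a TypeError stating "unhashable type: 'list'".

A plain test_table.T does not do what I need either, because I need the columns to correspond to the items inside the lists (List1, List2, etc.), with the rows indexed by name and then by List1, List2. I have gone back and forth adding new columns and trying to build an entirely new DataFrame from multiple Series, but nothing seems to work.

Thanks for your help!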

Answer 1

Mock df:

df = pd.DataFrame(columns=['Name', 'List 1', 'List 2'], data=[['A', [1,2,3,4], [1,2,3,4]], ['B', [1,2,3,4], [1,2,3,4]], ['C', [1,2,3,4], [1,2,3,4]]])

Set 'Name' as the index:

df.set_index('Name', inplace=True)

                List 1        List 2
    Name                            
    A     [1, 2, 3, 4]  [1, 2, 3, 4]
    B     [1, 2, 3, 4]  [1, 2, 3, 4]
    C     [1, 2, 3, 4]  [1, 2, 3, 4]

n_name = len(df.index)
n_list = len(df.columns)
n_item = len(df.iat[0, 0])

df.values now has shape (3, 2). For this mock df we need to reshape it into a (6,) array to remove one dimension. Then we turn it into a list.

vals = list(df.values.reshape((n_list * n_name),))

[[1, 2, 3, 4],
 [1, 2, 3, 4],
 [1, 2, 3, 4],
 [1, 2, 3, 4],
 [1, 2, 3, 4],
 [1, 2, 3, 4]]
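The order of the flattened values matters, because it has to line up with the index built in the next step. A minimal sketch, assuming a smaller mock frame with distinct values (the values here are made up for illustration), shows that NumPy flattens row by row, so each name's lists stay together:

```python
import numpy as np
import pandas as pd

# Smaller illustrative frame; the values are made up so each list is distinguishable.
df = pd.DataFrame(
    columns=['Name', 'List 1', 'List 2'],
    data=[['A', [1, 2], [3, 4]], ['B', [5, 6], [7, 8]]],
).set_index('Name')

# reshape(-1) flattens in row-major order: A/List 1, A/List 2, B/List 1, B/List 2.
flat = df.values.reshape(-1)

# The (name, list) pairs in the same row-major order.
order = [(name, col) for name in df.index for col in df.columns]
for key, val in zip(order, flat):
    print(key, val)
```

Because the flattening runs across each row before moving to the next, the outer index level has to be the name and the inner level the list label.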

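Now we build the values for the index levels. Since 'Name' is the first level, each name must be repeated once for every unique value in the next level, so we use repeat. For the list level we want to cycle through the labels in their original order, so we use tile. Then add your column names: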

idx_name = np.repeat(df.index.values, n_list)
idx_list = np.tile(df.columns.values, n_name)
columns = ['Col' + str(n) for n in list(range(1, n_item+1))]
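The difference between repeat and tile is easy to mix up, so here is a small standalone sketch (the array names `names`, `lists`, `outer`, and `inner` are illustrative and not part of the answer's code):

```python
import numpy as np

names = np.array(['A', 'B', 'C'])
lists = np.array(['List 1', 'List 2'])

# repeat: each element is repeated consecutively (A, A, B, B, C, C).
outer = np.repeat(names, len(lists))

# tile: the whole array is cycled (List 1, List 2, List 1, List 2, ...).
inner = np.tile(lists, len(names))

# Zipping the two gives one (name, list) pair per flattened row.
pairs = list(zip(outer.tolist(), inner.tolist()))
print(pairs)
```

Together the two arrays enumerate every (name, list) combination in the same row-major order as the flattened values.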

Create the final df:

df = pd.DataFrame(data=vals, index=[idx_name, idx_list], columns=columns)

          Col1  Col2  Col3  Col4
A List 1     1     2     3     4
  List 2     1     2     3     4
B List 1     1     2     3     4
  List 2     1     2     3     4
C List 1     1     2     3     4
  List 2     1     2     3     4

Code:

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Name', 'List 1', 'List 2'], data=[['A', [1,2,3,4], [1,2,3,4]], ['B', [1,2,3,4], [1,2,3,4]], ['C', [1,2,3,4], [1,2,3,4]]])

df.set_index('Name', inplace=True)

n_name = len(df.index)
n_list = len(df.columns)
n_item = len(df.iat[0, 0])

vals = list(df.values.reshape((n_list * n_name),))
idx_name = np.repeat(df.index.values, n_list)
idx_list = np.tile(df.columns.values, n_name)
columns = ['Col' + str(n) for n in list(range(1, n_item+1))]

df = pd.DataFrame(data=vals, index=[idx_name, idx_list], columns=columns)
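As a possible shorter variant (not part of the answer above, and only a sketch under the assumption that every list has the same length), `DataFrame.stack` can build the (Name, list) MultiIndex directly, after which the lists are expanded into columns. The names `stacked` and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    columns=['Name', 'List 1', 'List 2'],
    data=[['A', [1, 2, 3, 4], [1, 2, 3, 4]],
          ['B', [1, 2, 3, 4], [1, 2, 3, 4]],
          ['C', [1, 2, 3, 4], [1, 2, 3, 4]]],
).set_index('Name')

# stack() moves the column labels into an inner index level,
# giving a Series of lists indexed by (Name, list column).
stacked = df.stack()

# Expand each list into its own row of values, reusing the MultiIndex.
result = pd.DataFrame(stacked.tolist(), index=stacked.index)
result.columns = ['Col' + str(n) for n in range(1, result.shape[1] + 1)]
print(result)
```

This avoids computing the repeat and tile arrays by hand, at the cost of relying on stack to produce the same row-major ordering.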
