Question

My dataset has the following structure:

import pandas as pd

mydic = {'2017-9-11': {'Type1': [15, 115452.0, 3], 'Type2': [47, 176153.0, 4], 'Type3': [0, 0, 0]}, '2017-9-12': {'Type1': [26, 198223.0, 5], 'Type2': [39, 178610.0, 6], 'Type3': [0, 0, 0]}}
df = pd.DataFrame.from_dict(mydic, orient='index')

I need to split the values in the lists into separate columns and group them by type. This is what I do:

df_new = df[list(df)].unstack().apply(pd.Series)
df_new.head()

It works:

                    0       1           2
Type1   2017-9-11   15.0    115452.0    3.0
        2017-9-12   26.0    198223.0    5.0
Type3   2017-9-11   0.0     0.0         0.0
        2017-9-12   0.0     0.0         0.0
Type2   2017-9-11   47.0    176153.0    4.0

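However, when I apply this code to a much larger real dataset, apply(pd.Series) does not seem to work: I get only a single column 0 that still contains the lists of values, like this: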

                    0    
Type1  2017-9-11    [15, 115452.0, 3]
       2017-9-12    [26, 198223.0, 5]
Type2  2017-9-11    [47, 176153.0, 4]
       2017-9-12    [39, 178610.0, 6]
Type3  2017-9-11            [0, 0, 0]

Can anyone suggest what might be going wrong, or suggest another solution?

Answer 1

I think a faster solution is the DataFrame constructor (see timings):

s = df.unstack()
df = pd.DataFrame(s.values.tolist(), index=s.index)
print (df)
                  0         1  2
Type1 2017-9-11  15  115452.0  3
      2017-9-12  26  198223.0  5
Type2 2017-9-11  47  176153.0  4
      2017-9-12  39  178610.0  6
Type3 2017-9-11   0       0.0  0
      2017-9-12   0       0.0  0
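The constructor approach above can be sketched end to end as follows. This is a minimal illustration on the sample data from the question; the column labels `count`, `amount` and `items` and the index level names are hypothetical, chosen only to make the result easier to read.

```python
import pandas as pd

mydic = {
    '2017-9-11': {'Type1': [15, 115452.0, 3], 'Type2': [47, 176153.0, 4], 'Type3': [0, 0, 0]},
    '2017-9-12': {'Type1': [26, 198223.0, 5], 'Type2': [39, 178610.0, 6], 'Type3': [0, 0, 0]},
}
raw = pd.DataFrame.from_dict(mydic, orient='index')

# Stack the columns into one Series indexed by (type, date); each value is a list.
s = raw.unstack()

# Build the wide frame in one step from a plain list of lists,
# reusing the MultiIndex of the stacked Series.
wide = pd.DataFrame(s.tolist(), index=s.index)

# Hypothetical labels, not part of the original data.
wide.columns = ['count', 'amount', 'items']
wide.index.names = ['type', 'date']

print(wide)
```

Because the frame is built once from a list of lists, no Series object has to be created per row, which is the likely reason this is faster than apply(pd.Series).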

Edit:

If the values are strings:

df = df.unstack().str.strip('[]').str.split(', ', expand=True).astype(float)
print (df)
                    0         1    2
Type1 2017-9-11  15.0  115452.0  3.0
      2017-9-12  26.0  198223.0  5.0
Type2 2017-9-11  47.0  176153.0  4.0
      2017-9-12  39.0  178610.0  6.0
Type3 2017-9-11   0.0       0.0  0.0
      2017-9-12   0.0       0.0  0.0
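One way to check whether the string case applies to the real dataset is to look at the Python type stored in each cell. The sketch below assumes a small Series `s_str` (a made-up stand-in for the real data) whose cells hold text such as "[15, 115452.0, 3]" rather than actual lists.

```python
import pandas as pd

# Assumed scenario: every cell is a string that only looks like a list.
s_str = pd.Series(
    ['[15, 115452.0, 3]', '[47, 176153.0, 4]', '[0, 0, 0]'],
    index=['Type1', 'Type2', 'Type3'],
)

# Count the Python types actually stored in the cells.
cell_types = s_str.map(type).value_counts()
print(cell_types)

# Strings need parsing: strip the brackets, split on ', ', convert to float.
parsed = (
    s_str.str.strip('[]')
         .str.split(', ', expand=True)
         .astype(float)
)
print(parsed)
```

If the type count reports `str` instead of `list`, that would explain why apply(pd.Series) returns a single column: each string is treated as one scalar value.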

Or you can convert the values to lists:

import ast

s = df.unstack().apply(ast.literal_eval)
df = pd.DataFrame(s.values.tolist(), index=s.index).astype(float)
print (df)
                    0         1    2
Type1 2017-9-11  15.0  115452.0  3.0
      2017-9-12  26.0  198223.0  5.0
Type2 2017-9-11  47.0  176153.0  4.0
      2017-9-12  39.0  178610.0  6.0
Type3 2017-9-11   0.0       0.0  0.0
      2017-9-12   0.0       0.0  0.0
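If a dataset mixes real lists with string representations, a small helper can normalise each cell before the frame is built. This is only a sketch: `to_list` and the `mixed` Series are hypothetical names introduced for illustration.

```python
import ast

import pandas as pd


def to_list(cell):
    """Return the cell as a list, parsing it first if it is a string."""
    if isinstance(cell, str):
        # literal_eval only accepts Python literals, unlike eval.
        return ast.literal_eval(cell)
    return list(cell)


# Assumed input: one real list and two strings that look like lists.
mixed = pd.Series(
    [[15, 115452.0, 3], '[47, 176153.0, 4]', '[0, 0, 0]'],
    index=['Type1', 'Type2', 'Type3'],
)

rows = [to_list(cell) for cell in mixed]
wide = pd.DataFrame(rows, index=mixed.index).astype(float)
print(wide)
```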

Answer 2

For a DataFrame, specify which column you want to apply the function to:

df.unstack().to_frame()[0].apply(pd.Series)

Out[545]: 
                    0         1    2
Type2 2017-9-11  47.0  176153.0  4.0
      2017-9-12  39.0  178610.0  6.0
Type1 2017-9-11  15.0  115452.0  3.0
      2017-9-12  26.0  198223.0  5.0
Type3 2017-9-11   0.0       0.0  0.0
      2017-9-12   0.0       0.0  0.0

Breakdown:

df1 = df.unstack().to_frame()
df1

Out[546]: 
                                 0
Type2 2017-9-11  [47, 176153.0, 4]
      2017-9-12  [39, 178610.0, 6]
Type1 2017-9-11  [15, 115452.0, 3]
      2017-9-12  [26, 198223.0, 5]
Type3 2017-9-11          [0, 0, 0]
      2017-9-12          [0, 0, 0]

Then run:

df1[0].apply(pd.Series)
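As far as I can tell, the to_frame()[0] round trip in this answer leaves the data unchanged: it wraps the stacked Series in a one-column frame and then selects that column again. The sketch below checks this on the sample data from the question; the variable names are illustrative only.

```python
import pandas as pd

mydic = {
    '2017-9-11': {'Type1': [15, 115452.0, 3], 'Type2': [47, 176153.0, 4], 'Type3': [0, 0, 0]},
    '2017-9-12': {'Type1': [26, 198223.0, 5], 'Type2': [39, 178610.0, 6], 'Type3': [0, 0, 0]},
}
df = pd.DataFrame.from_dict(mydic, orient='index')

s = df.unstack()        # Series of lists with a (type, date) MultiIndex
col = s.to_frame()[0]   # one-column frame, then column 0 selected again

# Compare labels and list values of the two Series.
same_index = col.index.equals(s.index)
same_values = col.tolist() == s.tolist()
print(same_index, same_values)
```

So if the cells in the real dataset are strings rather than lists, this variant would presumably hit the same single-column result as the original code.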


Working with lists in pandas: apply(pd.Series) does not work - is there another solution?

2 answers: