How to collapse multiple rows into one and create a series of column elements in Python Pandas

Time: 2020-08-25 18:51:25

Tags: python pandas group-by jupyter-notebook series

I have a dataframe that looks like this:

            tags                                         categories  classification
    0      label  ['legislative', 'law, govt and politics', 'exe...            None
    0   document  ['legislative', 'law, govt and politics', 'exe...             NaN
    0       text  ['legislative', 'law, govt and politics', 'exe...             NaN
    0      paper  ['legislative', 'law, govt and politics', 'exe...             NaN
    0     poster  ['legislative', 'law, govt and politics', 'exe...             NaN

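I want to create a new dataframe that collapses the one above into the one below: the elements of the "tags" and "classification" columns are turned into a single row, with each cell holding the individual items as a list, for example: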

                                                   tags                                         categories                        classification
    0  ['label', 'document', 'text', 'paper', 'poster']  ['legislative', 'law, govt and politics', 'exe...  ['None', 'NaN', 'NaN', 'NaN', 'NaN']

How can I do this? Can I use stack or groupby to get this result? Thanks in advance.
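Because every row in the frame above shares the index label 0, one way to use groupby here is to group on the index level and aggregate each column with `list`. This is a minimal sketch, assuming a frame rebuilt only from the visible `tags` and `classification` values; the truncated `categories` column is left out rather than guessed.

```python
import numpy as np
import pandas as pd

# Visible part of the question's frame; all five rows share index label 0.
df = pd.DataFrame(
    {
        'tags': ['label', 'document', 'text', 'paper', 'poster'],
        # an object array keeps the first value as None instead of coercing it to NaN
        'classification': np.array([None, np.nan, np.nan, np.nan, np.nan], dtype=object),
    },
    index=[0, 0, 0, 0, 0],
)

# Group on the index level (a single group, label 0) and collect each column into a list.
collapsed = df.groupby(level=0).agg(list)
print(collapsed)
```

Grouping on a real key column instead of the index would give one row per key; with a single shared label, as here, the result is the single row shown in the desired output.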

*Here is the output of df.to_dict():

    {'tags': {0: ' letter',
      1: ' head',
      2: ' water',
      3: ' art',
      4: ' indoors',
      5: ' flyer',
      6: ' poster',
      ...},
     'categories': {0: "['legislative', 'law, govt and politics', 'executive branch', 'work', 'society', 'government']",
      1: "['unrest and war', 'society', 'religion and spirituality', 'buddhism']",
      2: '[]',
      3: '[]',
      4: "['unemployment', 'society', 'law, govt and politics', 'foreign policy', 'work', 'politics', 'armed forces']",
      5: '[]',
      6: "['sports', 'law, govt and politics', 'wrestling']",
      ...},
     'classfication': {0: nan,
      1: nan,
      2: nan,
      3: nan,
      4: nan,
      5: nan,
      6: nan,
      ...}}
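The to_dict() output shows that each `categories` cell is a string that looks like a list, not an actual Python list. A sketch, assuming you want real lists in the result: parse those strings with `ast.literal_eval`, then build a one-row frame in which each cell holds a whole column as a list. The data is only the seven visible rows above (the elided rows are not reconstructed), and the column name `'classfication'` is kept as spelled in that output.

```python
import ast

import numpy as np
import pandas as pd

# The seven rows visible in the question's to_dict() output.
df = pd.DataFrame({
    'tags': [' letter', ' head', ' water', ' art', ' indoors', ' flyer', ' poster'],
    'categories': [
        "['legislative', 'law, govt and politics', 'executive branch', 'work', 'society', 'government']",
        "['unrest and war', 'society', 'religion and spirituality', 'buddhism']",
        '[]',
        '[]',
        "['unemployment', 'society', 'law, govt and politics', 'foreign policy', 'work', 'politics', 'armed forces']",
        '[]',
        "['sports', 'law, govt and politics', 'wrestling']",
    ],
    'classfication': [np.nan] * 7,
})

# Each 'categories' cell is the text of a Python list; turn it into a real list.
df['categories'] = df['categories'].apply(ast.literal_eval)

# One-row DataFrame: every cell holds the entire column as a Python list.
collapsed = pd.DataFrame({col: [df[col].tolist()] for col in df.columns})
print(collapsed.loc[0, 'categories'])
```

Wrapping each list in an outer `[...]` makes pandas treat it as the single value of one row instead of spreading it over several rows.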

1 answer:

Answer 0 (score: 0)

I didn't fully understand your question, but is something like this what you want?

df:

    trial_num   subject samples
0   1           1       [-1.74, -0.78, -0.11]
1   2           1       [0.86, 0.21, -0.01]
2   3           1       [2.04, 0.6, -0.79]
3   1           2       [0.52, 0.49, 1.56]
4   2           2       [0.07, 0.84, -1.1]
5   3           2       [0.43, -1.3, 1.99]

Transformed df:

                trial_num             subject                                            samples
    0  [1, 2, 3, 1, 2, 3]  [1, 1, 1, 2, 2, 2]  [[-1.74, -0.78, -0.11], [0.86, 0.21, -0.0...

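import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'trial_num': [1, 2, 3, 1, 2, 3],
     'subject': [1, 1, 1, 2, 2, 2],
     # each cell holds a list of three random floats rounded to 2 decimals
     'samples': [list(np.random.randn(3).round(2)) for _ in range(6)]
    }
)
# Collapse each column into one list, giving a one-row DataFrame. tolist()
# keeps numbers as numbers and keeps every 'samples' entry as its own list.
df = pd.DataFrame({col: [df[col].tolist()] for col in df.columns})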
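If you want one row per group rather than a single row overall, the same idea combines with groupby. A sketch using the sample values printed in the df above (written out literally so the result is reproducible), grouping by `subject`:

```python
import pandas as pd

# The six rows printed above, with the random samples fixed to the shown values.
df = pd.DataFrame(
    {'trial_num': [1, 2, 3, 1, 2, 3],
     'subject': [1, 1, 1, 2, 2, 2],
     'samples': [[-1.74, -0.78, -0.11], [0.86, 0.21, -0.01], [2.04, 0.6, -0.79],
                 [0.52, 0.49, 1.56], [0.07, 0.84, -1.1], [0.43, -1.3, 1.99]]}
)

# One row per subject; each remaining column becomes a list of that subject's values.
per_subject = df.groupby('subject').agg(list)
print(per_subject)
```

Here `subject` becomes the index, and `trial_num` and `samples` each hold one list per subject.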