Split a column containing lists into separate rows in pandas

Date: 2018-06-06 21:02:55

Tags: python pandas dataframe

I have a DataFrame in pandas like this:

id     info
1      [1,2]
2      [3]
3      []

I want to split it into separate rows, one per list element:

id     info
1      1 
1      2 
2      3 
3      NaN

How can I do this?

4 answers:

Answer 0 (score: 1)

Here is a fairly convoluted approach. Note that it drops the rows whose list is empty:

import pandas as pd

df = pd.DataFrame({'id': [1,2,3],
                   'info': [[1,2], [3], [ ]]})

unstack_df = df.set_index(['id'])['info'].apply(pd.Series)\
                                         .stack()\
                                         .reset_index(level=1, drop=True)

unstack_df = unstack_df.reset_index()
unstack_df.columns = ['id', 'info']

unstack_df

>>
       id   info
    0   1   1.0
    1   1   2.0
    2   2   3.0
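For comparison, here is a minimal sketch that keeps the row with the empty list. It assumes pandas 0.25 or later, where DataFrame.explode is available; that method is not part of the answer above, and the variable name exploded is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'info': [[1, 2], [3], []]})

# explode() emits one row per list element; an empty list becomes a single NaN row.
# Assumption: pandas >= 0.25 (explode does not exist in older versions).
exploded = df.explode('info').reset_index(drop=True)

print(exploded)
```

The resulting info column has object dtype, so a cast may still be needed if a numeric column is wanted.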

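Answer 1 (score: 1)

The following approach uses np.repeat and itertools.chain. Converting the empty lists to {np.nan} is a trick to get pandas to accept an iterable as a value. This lets chain.from_iterable work without error and keeps a NaN row for each empty list.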

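import numpy as np
import pandas as pd
from itertools import chain

# Same sample frame as in the question.
df = pd.DataFrame({'id': [1, 2, 3], 'info': [[1, 2], [3], []]})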

df.loc[~df['info'].apply(bool), 'info'] = {np.nan}

res = pd.DataFrame({'id': np.repeat(df['id'], df['info'].map(len).values),
                    'info': list(chain.from_iterable(df['info']))})

print(res)

   id  info
0   1   1.0
0   1   2.0
1   2   3.0
2   3   NaN
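The same repeat-and-flatten idea can be sketched without assigning a set through .loc, by padding the empty lists in a plain list comprehension first. This is only an illustration under that assumption; the names padded and lengths are hypothetical and do not appear in the answer.

```python
import numpy as np
import pandas as pd
from itertools import chain

df = pd.DataFrame({'id': [1, 2, 3], 'info': [[1, 2], [3], []]})

# Replace each empty list with [np.nan] so every row contributes at least one value.
padded = [lst if len(lst) > 0 else [np.nan] for lst in df['info']]

# Each id is repeated once per element of its (padded) list.
lengths = [len(lst) for lst in padded]

res = pd.DataFrame({'id': np.repeat(df['id'].to_numpy(), lengths),
                    'info': list(chain.from_iterable(padded))})

print(res)
```

Because the ids are passed as a NumPy array, the result gets a fresh 0..n-1 index rather than the repeated index labels shown above.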

Answer 2 (score: 1)

You can try this:

>>> import pandas as pd
>>> df = pd.DataFrame({'id': [1,2,3], 'info': [[1,2],[3],[]]})
>>> s = df.apply(lambda x: pd.Series(x['info']), axis=1).stack().reset_index(level=1, drop=True)
>>> s.name = 'info'
>>> df2 = df.drop('info', axis=1).join(s)
>>> df2['info'] = pd.Series(df2['info'], dtype=object)
>>> df2
   id info
0   1    1
0   1    2
1   2    3
2   3  NaN

A similar question was posted here

Answer 3 (score: 0)

You can also try these approaches.

Method 1

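import pandas as pd

def split_dataframe_rows(df, column_selectors):
    # Expand every list-valued column named in column_selectors into one row per element.
    # Rows whose selected lists are all empty produce no output rows.
    def _split_list_to_rows(row, row_accumulator, column_selectors):
        split_rows = {}
        max_split = 0
        for column_selector in column_selectors:
            # Copy the list so that pop(0) below does not mutate the original DataFrame.
            split_row = list(row[column_selector])
            split_rows[column_selector] = split_row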
            if len(split_row) > max_split:
                max_split = len(split_row)

        for i in range(max_split):
            new_row = row.to_dict()
            for column_selector in column_selectors:
                try:
                    new_row[column_selector] = split_rows[column_selector].pop(0)
                except IndexError:
                    new_row[column_selector] = ''
            row_accumulator.append(new_row)

    new_rows = []
    df.apply(_split_list_to_rows,axis=1,args = (new_rows,column_selectors))
    new_df = pd.DataFrame(new_rows, columns=df.columns)
    return new_df
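The padding behaviour of Method 1 (shorter lists are filled with '' so that all selected columns stay aligned) can also be sketched with itertools.zip_longest. This is an alternative illustration, not the answer's own code; the function name split_rows_sketch and the sample columns a and b are hypothetical.

```python
from itertools import zip_longest

import pandas as pd


def split_rows_sketch(df, list_columns):
    """Emit one row per list position, padding shorter lists with ''."""
    new_rows = []
    for record in df.to_dict('records'):
        lists = (record[col] for col in list_columns)
        # zip_longest walks all lists in parallel and fills the gaps with ''.
        for values in zip_longest(*lists, fillvalue=''):
            new_row = dict(record)
            new_row.update(zip(list_columns, values))
            new_rows.append(new_row)
    return pd.DataFrame(new_rows, columns=df.columns)


df = pd.DataFrame({'id': [1, 2],
                   'a': [[1, 2], [3]],
                   'b': [['x'], ['y', 'z']]})
out = split_rows_sketch(df, ['a', 'b'])
print(out)
```

As in Method 1, a row whose selected lists are all empty yields no output rows.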

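Method 2

import pandas as pd

def flatten_data(json=None):
    # json is expected to be a list of records (dicts).
    df = pd.DataFrame(json)
    # A column is treated as list-valued when its first row holds a list.
    list_cols = [col for col in df.columns if isinstance(df.loc[0, col], list)]
    for i in range(len(list_cols)):
        col = list_cols[i]
        meta_cols = [c for c in df.columns if not isinstance(df.loc[0, c], list)] + list_cols[i+1:]
        json_data = df.to_dict('records')
        # pd.json_normalize needs pandas 1.0+; older versions expose it as pandas.io.json.json_normalize.
        df = pd.json_normalize(data=json_data, record_path=col, meta=meta_cols, record_prefix=col + '_', sep='_')
    return pd.json_normalize(df.to_dict('records'))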
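To make the record_path and meta arguments used in Method 2 more concrete, here is a small sketch of a single json_normalize call. It assumes pandas 1.0 or later (for pd.json_normalize) and uses hypothetical sample records whose list elements are dicts with a key v, which differs from the question's plain lists of numbers.

```python
import pandas as pd

# Hypothetical records: each 'info' entry is a list of small dicts.
records = [
    {'id': 1, 'info': [{'v': 1}, {'v': 2}]},
    {'id': 2, 'info': [{'v': 3}]},
]

# record_path names the list to unroll into rows;
# meta names the fields to repeat alongside each unrolled element;
# record_prefix is prepended to the columns that come from the list elements.
flat = pd.json_normalize(records, record_path='info', meta=['id'],
                         record_prefix='info_')

print(flat)
```

Each element of info becomes its own row, and the id of the parent record is copied onto it.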