Each data point in a column holds a list of dictionaries. How can I turn those entries into columns?

Asked: 2018-11-16 22:29:05

Tags: python mongodb pandas dataframe pymongo

Suppose I have a DataFrame like this:

Name    Classes

Bill    [{'class': CS152, 'time': 2:00 PM}, {'class': PHYS162, 'time': 3:30 PM}]
Adam    [{'class': EE193, 'time': 1:00 PM}, {'class': PHYS162, 'time': 2:30 PM}]
Sara    [{'class': CS152, 'time': 4:00 PM}, {'class': BIO182, 'time': 6:30 PM}]

How can I make the DataFrame look like this:

Name    CS152     PHYS162    EE193      BIO182

Bill    2:00 PM   3:30 PM    NaN        NaN
Adam    NaN       2:30 PM    1:00 PM    NaN
Sara    4:00 PM   NaN        NaN        6:30 PM

2 answers:

Answer 0: (score: 0)

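There may be a more elegant way, but here is one possibility: flatten each person's list of dicts into rows, then pivot so that every class becomes a column.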

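import pandas as pd

# Sample input. `data` was not defined in the original answer; the lowercase
# column names are assumed here so that they match the code and output below.
data = pd.DataFrame({
    'name': ['Bill', 'Adam', 'Sara'],
    'classes': [
        [{'class': 'CS152', 'time': '2:00 PM'}, {'class': 'PHYS162', 'time': '3:30 PM'}],
        [{'class': 'EE193', 'time': '1:00 PM'}, {'class': 'PHYS162', 'time': '2:30 PM'}],
        [{'class': 'CS152', 'time': '4:00 PM'}, {'class': 'BIO182', 'time': '6:30 PM'}],
    ],
})


def to_frame(key, classes):
    """Expand a group's lists of dicts into a DataFrame indexed by the group key."""
    records = [d for row in classes for d in row]
    return pd.DataFrame(records, index=[key] * len(records))


res = (
    # expand nested data structures: one row per (name, class) pair
    pd.concat([
        to_frame(key, classes) for key, classes in data.groupby('name')['classes']
    ])
    .reset_index()
    .rename(columns={'index': 'name'})
    # pivot table: one column per class, holding the class time
    .pivot_table(index='name', columns='class', values='time', aggfunc='first')
    .reset_index()
)
res.columns.name = None  # drop the leftover 'class' label on the column axis
print(res)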

       name   BIO182    CS152    EE193  PHYS162
0      Adam      NaN      NaN  1:00 PM  2:30 PM
1      Bill      NaN  2:00 PM      NaN  3:30 PM
2      Sara  6:30 PM  4:00 PM      NaN      NaN
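
The flatten-then-pivot steps above can also be sketched with `DataFrame.explode`, which exists in pandas 0.25 and later and so postdates this answer. This is only a sketch under assumptions: the sample frame and its lowercase column names are made up to match the answer, every list is non-empty, and no person has the same class twice (`pivot` raises on duplicates, unlike `pivot_table` with `aggfunc='first'`).

```python
import pandas as pd

# Sample frame; the lowercase column names are an assumption matching the answer above.
data = pd.DataFrame({
    "name": ["Bill", "Adam", "Sara"],
    "classes": [
        [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}],
        [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}],
        [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}],
    ],
})

# One row per (name, dict) pair; reset the index so it is unique again.
long = data.explode("classes").reset_index(drop=True)

# Expand each dict into its own 'class' and 'time' columns.
expanded = pd.DataFrame(long["classes"].tolist())
long = pd.concat([long[["name"]], expanded], axis=1)

# Reshape: one row per name, one column per class.
wide = long.pivot(index="name", columns="class", values="time").reset_index()
wide.columns.name = None
print(wide)
```

Compared with the `groupby` loop, this keeps the flattening in a single vectorized call, at the cost of requiring a newer pandas.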

Answer 1: (score: 0)

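Here is one way to do it, though it could be optimized: build one Series per person, keyed by class name, and collect them in a dictionary.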

import pandas as pd

so = pd.DataFrame([['Bill',[{'class': 'CS152', 'time': '2:00 PM'}, {'class': 'PHYS162', 'time': '3:30 PM'}]],
                   ['Adam',[{'class': 'EE193', 'time': '1:00 PM'}, {'class': 'PHYS162', 'time': '2:30 PM'}]],
                   ['Sara',[{'class': 'CS152', 'time': '4:00 PM'}, {'class': 'BIO182', 'time': '6:30 PM'}]]
                  ],columns=('Name','Classes'))

# Dictionary of name -> Series (this was never initialized in the original)
newdict = {}

for idx in so.index:
    name = so.loc[idx,'Name']
    classes = so.loc[idx,'Classes']
    # Create Series data for an individual person; an explicit dtype avoids
    # the empty-Series dtype warning
    seriesdata = pd.Series(dtype=object)

    for rowclass in classes:
        classname = rowclass['class']
        classtime = rowclass['time']
        seriesdata[classname]=classtime
    print(seriesdata)
    # Store the Series under the person's name
    newdict[name]=seriesdata


# Names become columns here, so transpose to get one row per person
df = pd.DataFrame(newdict)
print(df.T)
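
As one possible way to tighten the loop above, the per-person Series can be replaced by plain dictionaries built in a comprehension. This is a sketch, not the answerer's method; `per_person` is a name introduced here for illustration, and the row order of the result may differ between pandas versions.

```python
import pandas as pd

so = pd.DataFrame(
    [
        ["Bill", [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}]],
        ["Adam", [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}]],
        ["Sara", [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}]],
    ],
    columns=["Name", "Classes"],
)

# Map each name to a {class: time} dictionary (hypothetical helper name).
per_person = {
    name: {entry["class"]: entry["time"] for entry in classes}
    for name, classes in zip(so["Name"], so["Classes"])
}

# orient="index" turns each outer key into a row; missing classes become NaN.
df = pd.DataFrame.from_dict(per_person, orient="index")
df = df.rename_axis("Name").reset_index()
print(df)
```

If two rows share a name, the later one overwrites the earlier one in `per_person`, so this only fits data with unique names.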