Suppose I have a DataFrame like this:
Name Classes
Bill [{'class': CS152, 'time': 2:00 PM}, {'class': PHYS162, 'time': 3:30 PM}]
Adam [{'class': EE193, 'time': 1:00 PM}, {'class': PHYS162, 'time': 2:30 PM}]
Sara [{'class': CS152, 'time': 4:00 PM}, {'class': BIO182, 'time': 6:30 PM}]
How can I make the DataFrame look like this:
Name CS152 PHYS162 EE193 BIO182
Bill 2:00 PM 3:30 PM NaN NaN
Adam NaN 2:30 PM 1:00 PM NaN
Sara 4:00 PM NaN NaN 6:30 PM
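For anyone who wants to reproduce the problem, here is a minimal sketch of the input. It assumes the class names and times are plain strings (the table above shows them without quotes), and the variable name `df` is only illustrative.

```python
import pandas as pd

# Sample input reconstructed from the table above.
# Assumption: class names and times are stored as plain strings.
df = pd.DataFrame({
    "Name": ["Bill", "Adam", "Sara"],
    "Classes": [
        [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}],
        [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}],
        [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}],
    ],
})

# Each cell of 'Classes' holds a list of dicts, one dict per class.
print(df)
```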
Answer 0 (score: 0)
There may be a more elegant way, but here is one possibility:
import pandas as pd

# `data` is assumed to be the question's DataFrame, with columns named 'name' and 'classes'

def to_frame(key, classes):
    """Expand a group's lists of dicts into a DataFrame indexed by `key`."""
    records = [d for row in classes for d in row]
    return pd.DataFrame(records, index=[key] * len(records))

res = (
    # expand nested data structures
    pd.concat([
        to_frame(key, classes) for key, classes in data.groupby('name')['classes']
    ])
    .reset_index()
    .rename(columns={'index': 'name'})
    # pivot table
    .pivot_table(index='name', columns='class', values='time', aggfunc='first')
    .reset_index()
)
res.columns.name = None
print(res)
name BIO182 CS152 EE193 PHYS162
0 Adam NaN NaN 1:00 PM 2:30 PM
1 Bill NaN 2:00 PM NaN 3:30 PM
2 Sara 6:30 PM 4:00 PM NaN NaN
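A somewhat shorter route to the same result, sketched here as an alternative and not part of the answer above, is to use `DataFrame.explode` (pandas 0.25 or later) instead of the helper function. The column names `name` and `classes` follow this answer's convention, and `long_df`, `expanded`, and `wide` are illustrative names.

```python
import pandas as pd

# Assumption: same data as in the question, with lowercase column names.
data = pd.DataFrame({
    "name": ["Bill", "Adam", "Sara"],
    "classes": [
        [{"class": "CS152", "time": "2:00 PM"}, {"class": "PHYS162", "time": "3:30 PM"}],
        [{"class": "EE193", "time": "1:00 PM"}, {"class": "PHYS162", "time": "2:30 PM"}],
        [{"class": "CS152", "time": "4:00 PM"}, {"class": "BIO182", "time": "6:30 PM"}],
    ],
})

# One row per (name, class dict) pair.
long_df = data.explode("classes").reset_index(drop=True)

# Turn each dict into 'class' and 'time' columns, then re-attach the name.
expanded = pd.DataFrame(long_df["classes"].tolist())
expanded.insert(0, "name", long_df["name"])

# Spread the classes across columns; missing combinations become NaN.
wide = expanded.pivot(index="name", columns="class", values="time").reset_index()
wide.columns.name = None
print(wide)
```

Note that `pivot` raises an error if the same person has the same class twice, whereas `pivot_table(..., aggfunc='first')` in the answer above silently keeps the first entry.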
Answer 1 (score: 0)
One way to do this, though it could be optimized:
import pandas as pd

so = pd.DataFrame([['Bill', [{'class': 'CS152', 'time': '2:00 PM'}, {'class': 'PHYS162', 'time': '3:30 PM'}]],
                   ['Adam', [{'class': 'EE193', 'time': '1:00 PM'}, {'class': 'PHYS162', 'time': '2:30 PM'}]],
                   ['Sara', [{'class': 'CS152', 'time': '4:00 PM'}, {'class': 'BIO182', 'time': '6:30 PM'}]]
                   ], columns=['Name', 'Classes'])

newdict = {}  # maps each name to a Series of class -> time
for idx in so.index:
    name = so.loc[idx, 'Name']
    classes = so.loc[idx, 'Classes']
    # create Series data for an individual person
    seriesdata = pd.Series(dtype=object)
    for rowclass in classes:
        classname = rowclass['class']
        classtime = rowclass['time']
        seriesdata[classname] = classtime
    print(seriesdata)
    # create a dictionary of name: Series data
    newdict[name] = seriesdata
df = pd.DataFrame(newdict)
print(df.T)
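As for the optimization mentioned above, one possible tightening (a sketch, not the answerer's code) is to build one `{class: time}` dict per person with a comprehension and hand the list of dicts to the `DataFrame` constructor, which avoids growing a Series element by element. The names `records` and `wide` are illustrative.

```python
import pandas as pd

so = pd.DataFrame([
    ['Bill', [{'class': 'CS152', 'time': '2:00 PM'}, {'class': 'PHYS162', 'time': '3:30 PM'}]],
    ['Adam', [{'class': 'EE193', 'time': '1:00 PM'}, {'class': 'PHYS162', 'time': '2:30 PM'}]],
    ['Sara', [{'class': 'CS152', 'time': '4:00 PM'}, {'class': 'BIO182', 'time': '6:30 PM'}]],
], columns=['Name', 'Classes'])

# One flat dict per person: {'CS152': '2:00 PM', 'PHYS162': '3:30 PM'}, ...
records = [{c['class']: c['time'] for c in classes} for classes in so['Classes']]

# Rows follow the order of `so`; classes a person does not take become NaN.
wide = pd.DataFrame(records, index=pd.Index(so['Name'], name='Name')).reset_index()
print(wide)
```

If a person could list the same class twice, the dict comprehension keeps only the last time for that class, so this sketch assumes each class appears at most once per person.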