从以下
开始df = pd.DataFrame( {'Item':['A','A','A','B','B','C','C','C','C'],
'Name': ['Tom','John','Paul','Tom','Frank','Tom', 'John', 'Richard', 'James'],
'Total':[3,3,3,2,2,4,4,4,4]})
print df
Item Name
0 A Tom
1 A John
2 A Paul
3 B Tom
4 B Frank
5 C Tom
6 C John
7 C Richard
8 C James
#merge M:N by column Item
df1 = pd.merge(df, df, on=['Item'])
#remove duplicity - column Name_x == Name_y
df1 = df1[~(df1['Name_x'] == df1['Name_y'])]
#print df1
#create lists
df1 = df1.groupby('Name_x')['Name_y'].apply(lambda x: x.tolist()).reset_index()
print df1
Name_x Name_y
0 Frank [Tom]
1 James [Tom, John, Richard]
2 John [Tom, Paul, Tom, Richard, James]
3 Paul [Tom, John]
4 Richard [Tom, John, James]
5 Tom [John, Paul, Frank, John, Richard, James]
我有一个数据框如下:
print df
Name People times
0 Frank [Tom] [1]
1 James [John, Richard, Tom] [1, 1, 1]
2 John [James, Paul, Richard, Tom] [1, 1, 1, 2]
3 Paul [John, Tom] [1, 1]
4 Richard [James, John, Tom] [1, 1, 1]
5 Tom [Frank, James, John, Paul, Richard] [1, 1, 2, 1, 1]
我想为每个Name
创建一个堆积条形图,将People
视为条形,将times
视为值。
我想做这样的事情
sub_df = df.groupby(['Name','People'])['Times'].sum().unstack()
sub_df.plot(kind='bar',stacked=True)
但它返回
TypeError:不可用类型:' numpy.ndarray'
答案 0 :(得分:0)
你必须使用申请灵活类型的' agg'在groupby
之后:
df1['People'] = df1['Name_y'].apply(lambda x: tuple(x))
df1['Times'] = df1['Name_y'].apply(lambda x: [x.count(name) for name in list(set(x))])
s = df1.groupby(['Name_x','People']).apply(lambda x: sum(x.iloc[0]['Times']))
然后你得到以下
Name_x People
Frank (Tom,) 1
James (Tom, John, Richard) 3
John (Tom, Paul, Tom, Richard, James) 5
Paul (Tom, John) 2
Richard (Tom, John, James) 3
Tom (John, Paul, Frank, John, Richard, James) 6
dtype: int64
你可以随意绘图
s.plot(kind='bar', stacked=True)