I have a DataFrame (just an example):
import numpy as np
import pandas as pd

D = pd.DataFrame({i: {"name": str(i),
                      "vector": np.arange(i + i % 4, i + i % 4 + 10),
                      "sq": i ** 2,
                      "gp": i % 2} for i in range(10)}).T
   gp  name  sq                                    vector
0   0     0   0            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1   1     1   1          [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2   0     2   4        [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3   1     3   9      [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
4   0     4  16        [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5   1     5  25      [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
6   0     6  36    [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
7   1     7  49  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
8   0     8  64    [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
9   1     9  81  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
I want to group by the vector column and then by the gp column. How can I do this?
from dfply import *
D >>\
groupby(X.vector, X.gp) >>\
summarize(b=X.sq.sum())
Result:
TypeError: unhashable type: 'numpy.ndarray'
Answer 0 (score: 5)
I think you need to convert the vector column to tuples first in pandas:
print(D['sq'].groupby([D['vector'].apply(tuple), D['gp']]).sum().reset_index())
                                     vector  gp   sq
0            (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)   0    0
1          (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)   1    1
2        (4, 5, 6, 7, 8, 9, 10, 11, 12, 13)   0   20
3      (6, 7, 8, 9, 10, 11, 12, 13, 14, 15)   1   34
4    (8, 9, 10, 11, 12, 13, 14, 15, 16, 17)   0  100
5  (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)   1  130
Another solution is to convert the column first:
D['vector'] = D['vector'].apply(tuple)
print(D.groupby(['vector','gp'])['sq'].sum().reset_index())
                                     vector  gp   sq
0            (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)   0    0
1          (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)   1    1
2        (4, 5, 6, 7, 8, 9, 10, 11, 12, 13)   0   20
3      (6, 7, 8, 9, 10, 11, 12, 13, 14, 15)   1   34
4    (8, 9, 10, 11, 12, 13, 14, 15, 16, 17)   0  100
5  (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)   1  130
If necessary, convert the column back to arrays at the end.
I tried it with your code and it works for me:
D['vector'] = D['vector'].apply(tuple)
df = D.groupby(['vector','gp'])['sq'].sum().reset_index()
df['vector'] = df['vector'].apply(np.array)
print (df)
                                     vector  gp   sq
0            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   0    0
1          [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]   1    1
2        [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]   0   20
3      [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]   1   34
4    [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]   0  100
5  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]   1  130
print (type(df['vector'].iat[0]))
<class 'numpy.ndarray'>
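If you would rather not overwrite D['vector'], the steps above can be wrapped in a small helper. This is only a sketch: the name group_sum_by_array and its parameters are made up for illustration and are not part of pandas, and the small df below is a stand-in for your data.

```python
import numpy as np
import pandas as pd


def group_sum_by_array(df, array_col, other_keys, value_col):
    """Hypothetical helper: sum value_col grouped by an array-valued column plus other keys."""
    # Tuples are hashable, so they can act as group keys; df itself is not modified.
    keys = [df[array_col].apply(tuple)] + [df[k] for k in other_keys]
    out = df[value_col].groupby(keys).sum().reset_index()
    # Turn the tuple keys back into NumPy arrays for the caller.
    out[array_col] = out[array_col].apply(np.array)
    return out


# Stand-in data: two rows share the same vector and gp, one row differs.
df = pd.DataFrame({
    "vector": [np.array([1, 2]), np.array([1, 2]), np.array([3, 4])],
    "gp": [0, 0, 1],
    "sq": [1, 2, 5],
})

out = group_sum_by_array(df, "vector", ["gp"], "sq")
print(out)
```

The original frame keeps its array column, and only the returned frame goes through the tuple round trip.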
Answer 1 (score: 4)
lists are not hashable ... tuples are. We want to group by a tuplified version of the vector column. I'll use a list comprehension.
D.groupby([[tuple(x) for x in D.vector], 'gp']).sq.sum()
                                          gp
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)            0       0
(2, 3, 4, 5, 6, 7, 8, 9, 10, 11)          1       1
(4, 5, 6, 7, 8, 9, 10, 11, 12, 13)        0      20
(6, 7, 8, 9, 10, 11, 12, 13, 14, 15)      1      34
(8, 9, 10, 11, 12, 13, 14, 15, 16, 17)    0     100
(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)  1     130
Name: sq, dtype: int64
To get it back into the original form ... one of many ways:
d1 = D.groupby([[tuple(x) for x in D.vector], 'gp']).sq.sum()
d1.reset_index('gp').rename(index=list).rename_axis('vector').reset_index()
                                     vector  gp   sq
0            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   0    0
1          [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]   1    1
2        [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]   0   20
3      [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]   1   34
4    [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]   0  100
5  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]   1  130
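As a quick check of the hashability point at the top of this answer, here is a minimal sketch in plain Python and NumPy (the variable names are just for illustration). Hashing an array raises the same kind of TypeError as in the question, while the tuple built from it hashes fine:

```python
import numpy as np

arr = np.arange(3)

# NumPy arrays are mutable and not hashable, so hash() raises TypeError.
try:
    hash(arr)
    array_is_hashable = True
except TypeError:
    array_is_hashable = False

# A tuple holding the same values is immutable and hashable.
key = tuple(arr)
tuple_is_hashable = isinstance(hash(key), int)

print(array_is_hashable, tuple_is_hashable)
```

groupby needs hashable keys, which is why the tuplified column works where the raw arrays do not.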
Answer 2 (score: 0)
D.groupby([D.vector.apply(str), D.gp]).sq.sum().reset_index()
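One caveat with string keys, as a hedged side note: under NumPy's default print options, arrays with more than 1000 elements are abbreviated with "..." when converted with str, so two different long arrays can end up with the same key. A small sketch, with array sizes and names chosen only for illustration:

```python
import numpy as np

a = np.arange(2000)
b = a.copy()
b[1000] = 7  # b differs from a only in the middle

# With default print options, long arrays are summarized with "...",
# so the two string keys can be identical.
same_str = str(a) == str(b)

# Tuples compare element by element, so the keys stay distinct.
same_tuple = tuple(a) == tuple(b)

print(same_str, same_tuple)
```

For short vectors like the ones in the question this does not matter; tuple keys avoid the issue for long ones.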