Suppose I have a DataFrame such as:
In [7]: source = pd.DataFrame([['amazon.com', 'correct', 'correct'], ['amazon.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'incorrect']], columns=['domain', 'price', 'product'])
In [8]: source
Out[8]:
        domain      price    product
0   amazon.com    correct    correct
1   amazon.com  incorrect    correct
2  walmart.com  incorrect    correct
3  walmart.com  incorrect  incorrect
For each domain, I want to count how many times price == 'correct' and price == 'incorrect' occur, and the same counts for product. In other words, I would like to see output like this:
        domain      key      value  count
0   amazon.com    price    correct      1
1   amazon.com    price  incorrect      1
2   amazon.com  product    correct      2
3  walmart.com    price  incorrect      2
4  walmart.com  product    correct      1
5  walmart.com  product  incorrect      1
How can I do this?
Answer 0 (score: 7)
A nested apply will do this:
In [24]: source.groupby('domain').apply(lambda x:
x[['price','product']].apply(lambda y: y.value_counts())).fillna(0)
Out[24]:
                       price  product
domain
amazon.com  correct        1        2
            incorrect      1        0
walmart.com correct        0        1
            incorrect      2        1
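The result above is in wide form (one column per checked field, with gaps filled by zeros), while the question asks for a long domain/key/value/count table. Below is a minimal sketch of one way to bridge the two, assuming a reasonably recent pandas. The names wide and tidy are illustrative and not part of the original answer, and the column selection is moved onto the groupby, which should be equivalent to selecting inside the lambda.

```python
import pandas as pd

source = pd.DataFrame(
    [['amazon.com', 'correct', 'correct'],
     ['amazon.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'incorrect']],
    columns=['domain', 'price', 'product'],
)

# Nested apply as in the answer: value_counts per column within each domain.
wide = (
    source.groupby('domain')[['price', 'product']]
    .apply(lambda g: g.apply(lambda col: col.value_counts()))
    .fillna(0)
    .astype(int)
)

# Name both index levels, unpivot the price/product columns into key/count,
# and drop the zero rows that fillna(0) introduced.
tidy = (
    wide.rename_axis(index=['domain', 'value'])
    .reset_index()
    .melt(id_vars=['domain', 'value'], var_name='key', value_name='count')
    .loc[lambda d: d['count'] > 0, ['domain', 'key', 'value', 'count']]
    .sort_values(['domain', 'key', 'value'])
    .reset_index(drop=True)
)
print(tidy)
```

Dropping the zero rows is a choice made here to match the table in the question; keep them if an explicit count of 0 is useful.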
Answer 1 (score: 0)
In [17]: %paste
(
    pd.melt(source, id_vars=['domain'], value_vars=['price', 'product'])
    .groupby(['domain', 'variable', 'value'])
    .size()
    .reset_index()
    .rename(columns={'variable': 'key', 0: 'count'})
)
## -- End pasted text --
Out[17]:
        domain      key      value  count
0   amazon.com    price    correct      1
1   amazon.com    price  incorrect      1
2   amazon.com  product    correct      2
3  walmart.com    price  incorrect      2
4  walmart.com  product    correct      1
5  walmart.com  product  incorrect      1
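For reference, here is a self-contained variant of the same melt approach, written as a plain script rather than an IPython paste. It is a sketch, not the answerer's exact code: it names the melted column through var_name and the size column through reset_index(name=...), so the later rename on the integer label 0 is not needed. The variable name counts is illustrative.

```python
import pandas as pd

source = pd.DataFrame(
    [['amazon.com', 'correct', 'correct'],
     ['amazon.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'incorrect']],
    columns=['domain', 'price', 'product'],
)

counts = (
    # Unpivot price/product into (key, value) pairs, one row per cell.
    source.melt(id_vars='domain', value_vars=['price', 'product'],
                var_name='key', value_name='value')
    # Count the rows for each (domain, key, value) combination.
    .groupby(['domain', 'key', 'value'])
    .size()
    # Turn the group keys back into columns and name the size column.
    .reset_index(name='count')
)
print(counts)
```

Because only combinations that actually occur form a group, zero counts (for example walmart.com / price / correct) do not appear, which matches the output requested in the question.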