Python Pandas: counting the occurrences of each unique value across multiple columns

Posted: 2013-07-19 23:25:54

Tags: pandas

Suppose I have a DataFrame such as:

In [7]: source = pd.DataFrame([['amazon.com', 'correct', 'correct'], ['amazon.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'correct'], ['walmart.com', 'incorrect', 'incorrect']], columns=['domain', 'price', 'product'])

In [8]: source
Out[8]:
        domain      price    product
0   amazon.com    correct    correct
1   amazon.com  incorrect    correct
2  walmart.com  incorrect    correct
3  walmart.com  incorrect  incorrect

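For each domain, I want to count how many times price == 'correct' and how many times price == 'incorrect', and do the same for product. In other words, I would like to see output like this: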

        domain      key      value  count
0   amazon.com    price    correct      1
1   amazon.com    price  incorrect      1
2   amazon.com  product    correct      2
3  walmart.com    price  incorrect      2
4  walmart.com  product    correct      1
5  walmart.com  product  incorrect      1

How can I do this?

2 answers:

Answer 0 (score: 7)

A nested apply will do it:

In [24]: source.groupby('domain').apply(lambda x: 
                          x[['price','product']].apply(lambda y: y.value_counts())).fillna(0)

Out[24]: 
                       price  product
domain                               
amazon.com  correct        1        2
            incorrect      1        0
walmart.com correct        0        1
            incorrect      2        1
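The nested apply returns a wide table (one column per original column, with 0 where a value never occurs for a domain) rather than the long domain/key/value/count layout asked for in the question. As a sketch, assuming a reasonably recent pandas (1.x or 2.x), the wide result can be melted back into long rows and the zero counts dropped. The names `wide` and `long` are illustrative only, and selecting the columns on the GroupBy object (instead of inside the lambda) is a small variation on the answer that keeps `domain` out of each group.

```python
import pandas as pd

source = pd.DataFrame(
    [['amazon.com', 'correct', 'correct'],
     ['amazon.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'incorrect']],
    columns=['domain', 'price', 'product'],
)

# Nested apply: for each domain, run value_counts on every selected column.
# Missing (domain, value) combinations become NaN, which fillna turns into 0.
wide = (
    source.groupby('domain')[['price', 'product']]
    .apply(lambda g: g.apply(lambda s: s.value_counts()))
    .fillna(0)
    .rename_axis(['domain', 'value'])  # name both index levels explicitly
)

# Reshape the wide table into the long (domain, key, value, count) layout.
long = (
    wide.reset_index()
    .melt(id_vars=['domain', 'value'], var_name='key', value_name='count')
    .loc[lambda d: d['count'] > 0]           # drop combinations that never occur
    .astype({'count': int})                  # fillna(0) left the counts as floats
    .sort_values(['domain', 'key', 'value'])
    .loc[:, ['domain', 'key', 'value', 'count']]
    .reset_index(drop=True)
)
print(long)
```

Dropping the zero rows is what makes this match the question's expected output, which lists only combinations that actually appear.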

Answer 1 (score: 0)

In [17]: %paste
    (
      pd.melt(source, id_vars=['domain'], value_vars=['price', 'product'])
      .groupby(['domain', 'variable', 'value'])
      .size()
      .reset_index()
      .rename(columns={'variable': 'key', 0: 'count'})
    )

## -- End pasted text --
Out[17]:
        domain      key      value  count
0   amazon.com    price    correct      1
1   amazon.com    price  incorrect      1
2   amazon.com  product    correct      2
3  walmart.com    price  incorrect      2
4  walmart.com  product    correct      1
5  walmart.com  product  incorrect      1
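A slightly shorter variant of the same melt-and-count idea, offered as a sketch rather than a recipe verified on every pandas version: passing `var_name='key'` to `melt` and `name='count'` to `Series.reset_index` avoids renaming the `variable` and `0` columns afterwards. The variable name `counts` is illustrative.

```python
import pandas as pd

source = pd.DataFrame(
    [['amazon.com', 'correct', 'correct'],
     ['amazon.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'correct'],
     ['walmart.com', 'incorrect', 'incorrect']],
    columns=['domain', 'price', 'product'],
)

counts = (
    # One row per (domain, column name, cell value); melt names the value column 'value'.
    source.melt(id_vars='domain', value_vars=['price', 'product'], var_name='key')
    .groupby(['domain', 'key', 'value'])
    .size()                      # number of rows in each group
    .reset_index(name='count')   # turn the group keys back into columns
)
print(counts)
```

Because `groupby` sorts its keys by default, the rows come out in the same order as the table above.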