Question

I have a large DataFrame with about 6,500 columns. One column is the class label and the rest are boolean values (0 or 1). The DataFrame is sparse.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'label': ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
    'x1': np.random.choice(2, 8),
    'x2': np.random.choice(2, 8),
    'x3': np.random.choice(2, 8)})

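What I want is a report (preferably in pandas, so that I can plot it easily) that shows, for each label, how many distinct columns have at least one nonzero entry.

For example, for this DataFrame: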

    x1  x2  x3  label
0   0   1   1   a
1   1   0   1   b
2   0   1   0   c
3   1   0   0   b
4   1   1   1   a
5   0   0   1   c
6   1   0   0   b
7   0   1   0   a

the result should look like this:

a: 3 (since it has x1, x2 and x3)
b: 2 (since it has x1, x3)
c: 2 (since it has x2, x3)

So it counts which columns are present under each label. Picture a histogram where the x-axis is the label and the y-axis is the number of columns.

Answer 1

You can try pivoting:

For df:

import pandas as pd
import numpy as np

df = pd.DataFrame({
        'label' : ['a', 'b', 'c', 'b','a', 'c', 'b', 'a'],
        'x1' : np.random.choice(2, 8),
        'x2' : np.random.choice(2, 8),
        'x3' : np.random.choice(2, 8)})

pd.pivot_table(df, index='label').transpose().apply(np.count_nonzero)

For a randomly generated df such as:

    label  x1  x2  x3
0   a      0   0   0
1   b      0   1   0
2   c      1   0   1
3   b      0   1   0
4   a      1   1   1
5   c      1   0   1
6   b      0   1   0
7   a      1   1   1

the result is 3 for a, 1 for b, and 2 for c.
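To make the pivot approach reproducible, here is a minimal sketch that runs it on the fixed example frame from the question instead of random data. The variable names `means` and `counts` are only illustrative. It relies on `pd.pivot_table` aggregating with the mean by default, so a cell is nonzero exactly when that column contains at least one 1 for the label.

```python
import numpy as np
import pandas as pd

# The fixed example frame from the question (not random, so the result is reproducible).
df = pd.DataFrame({
    'label': ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
    'x1': [0, 1, 0, 1, 1, 0, 1, 0],
    'x2': [1, 0, 1, 0, 1, 0, 0, 1],
    'x3': [1, 1, 0, 0, 1, 1, 0, 0],
})

# Mean of every x-column per label; nonzero means the column occurs for that label.
means = pd.pivot_table(df, index='label')

# After transposing, each column is one label; count its nonzero entries.
counts = means.transpose().apply(np.count_nonzero)
print(counts)
```

Since `counts` is a Series indexed by label, something like `counts.plot(kind='bar')` should give the chart described in the question, assuming matplotlib is installed.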

Answer 2

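label = df.groupby('label')
sums = label.sum()  # per-label total of every x-column
# label.count()['x1'] is the number of rows for each label
for key, val in label.count()['x1'].items():
    strg = '%s:%s' % (key, val)
    # append the name of every column that has at least one 1 for this label
    for col, vl in sums.loc[key].items():
        if vl != 0:
            strg += ' %s' % col
    print(strg)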
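If the goal is a plottable object rather than printed strings, the same groupby idea can be sketched without explicit loops over the sums. This is only one possible variant, not the answerer's code; `present`, `counts`, and `names` are hypothetical names, and the frame is the fixed example from the question.

```python
import pandas as pd

# The fixed example frame from the question.
df = pd.DataFrame({
    'label': ['a', 'b', 'c', 'b', 'a', 'c', 'b', 'a'],
    'x1': [0, 1, 0, 1, 1, 0, 1, 0],
    'x2': [1, 0, 1, 0, 1, 0, 0, 1],
    'x3': [1, 1, 0, 0, 1, 1, 0, 0],
})

# True where a label has at least one 1 in the column.
present = df.groupby('label').sum().ne(0)

# Number of columns present per label, as a Series that can be plotted.
counts = present.sum(axis=1)

# Names of the columns present for each label.
names = {label: [col for col in present.columns if row[col]]
         for label, row in present.iterrows()}

for label in counts.index:
    print('%s: %d (%s)' % (label, counts[label], ', '.join(names[label])))
```

With about 6,500 columns the boolean `present` frame stays small (one row per label), so summing it across columns should be cheap compared with the initial groupby.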

Pandas histogram from count of columns
