Python - Pandas - value_counts for all columns in a grouped dataframe

Time: 2017-05-07 07:52:47

Tags: python pandas

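I have a survey dataset in which every question uses a 7-point scale. I want to get the value_counts of the common values across all columns, while also grouping the dataframe by two columns. Let me show you a sample dataset and where I am so far.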

| col1          | col2          | col3          | Building      | Levels_Name            |
|---------------|---------------|---------------|---------------|------------------------|
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor |
| Satisfied     | Satisfied     | NA            | Basingstoke   | Individual Contributor |
| Not Satisfied | Satisfied     | Not Satisfied | San Francisco | Middle Management      |
| Not Satisfied | Satisfied     | Not Satisfied | Miami         | Senior Leadership      |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City   | Senior Leadership      |
| NA            | NA            | NA            | Foster City   | Other                  |
| Not Satisfied | Not Satisfied | NA            | Foster City   | Senior Leadership      |
| Not Satisfied | Satisfied     | Not Satisfied | Austin        | Middle Management      |
| Satisfied     | Satisfied     | Satisfied     | San Francisco | Senior Leadership      |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City   | Individual Contributor |
| Satisfied     | Satisfied     | NA            | Miami         | Middle Management      |

Now I want to group this data by 'Building' and 'Levels_Name', add a new grouping level for 'Satisfied', 'Not Satisfied', and 'NA', and get the value counts for each column.

So the result should look like this:

| Building      | Levels_Name            | Sentiment     | col1 | col2 | col3 |
|---------------|------------------------|---------------|------|------|------|
| Foster City   | Individual Contributor | Not Satisfied | 1    | 1    | 1    |
| Foster City   | Individual Contributor | NA            | 0    | 0    | 0    |
| Foster City   | Individual Contributor | Satisfied     | 0    | 0    | 0    |
| Foster City   | Senior Leadership      | Not Satisfied | 2    | 2    | 1    |
| Foster City   | Senior Leadership      | NA            | 0    | 0    | 1    |
| Foster City   | Senior Leadership      | Satisfied     | 0    | 0    | 0    |
| San Francisco | Individual Contributor | Not Satisfied | 1    | 1    | 1    |
| San Francisco | Individual Contributor | NA            | 0    | 0    | 0    |
| San Francisco | Individual Contributor | Satisfied     | 0    | 0    | 0    |

Thanks!

1 answer:

Answer 0: (score: 1)

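First, melt the dataframe into long format, then group by all of its columns, take the size of each group, and unstack the `variable` level:
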
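import numpy as np
import pandas as pd

# Long format: one row per (Building, Levels_Name, variable, Sentiment).
# Missing answers become the string 'NaN' so groupby does not drop them.
d1 = pd.melt(
    df, ['Building', 'Levels_Name'], value_name='Sentiment'
).replace(np.nan, 'NaN')

# Count each combination, then pivot the original column names back out.
d1.groupby(
    d1.columns.tolist()
).size().unstack('variable', fill_value=0).reset_index()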

variable       Building             Levels_Name      Sentiment  col1  col2  col3
0                Austin       Middle Management  Not Satisfied     1     0     1
1                Austin       Middle Management      Satisfied     0     1     0
2           Basingstoke  Individual Contributor            NaN     0     0     1
3           Basingstoke  Individual Contributor      Satisfied     1     1     0
4           Foster City  Individual Contributor  Not Satisfied     1     1     1
5           Foster City                   Other            NaN     1     1     1
6           Foster City       Senior Leadership            NaN     0     0     1
7           Foster City       Senior Leadership  Not Satisfied     2     2     1
8                 Miami       Middle Management            NaN     0     0     1
9                 Miami       Middle Management      Satisfied     1     1     0
10                Miami       Senior Leadership  Not Satisfied     1     0     1
11                Miami       Senior Leadership      Satisfied     0     1     0
12        San Francisco  Individual Contributor  Not Satisfied     1     1     1
13        San Francisco       Middle Management  Not Satisfied     1     0     1
14        San Francisco       Middle Management      Satisfied     0     1     0
15        San Francisco       Senior Leadership      Satisfied     1     1     1
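
Note that the output above only contains the sentiment values that actually occur in each group, whereas the desired result in the question also lists the all-zero rows. The following is a minimal, self-contained sketch of one way to get those rows: rebuild the sample data, run the same melt/groupby/unstack steps, then reindex against every (Building, Levels_Name, Sentiment) combination. It assumes the "NA" cells in the question are missing values (NaN); the names `keys`, `long_df`, `counts`, `full_index`, and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

NS, S = "Not Satisfied", "Satisfied"

# Sample data from the question; "NA" cells are assumed to be NaN.
df = pd.DataFrame(
    {
        "col1": [NS, S, NS, NS, NS, np.nan, NS, NS, S, NS, S],
        "col2": [NS, S, S, S, NS, np.nan, NS, S, S, NS, S],
        "col3": [NS, np.nan, NS, NS, NS, np.nan, np.nan, NS, S, NS, np.nan],
        "Building": [
            "San Francisco", "Basingstoke", "San Francisco", "Miami",
            "Foster City", "Foster City", "Foster City", "Austin",
            "San Francisco", "Foster City", "Miami",
        ],
        "Levels_Name": [
            "Individual Contributor", "Individual Contributor",
            "Middle Management", "Senior Leadership", "Senior Leadership",
            "Other", "Senior Leadership", "Middle Management",
            "Senior Leadership", "Individual Contributor", "Middle Management",
        ],
    }
)

keys = ["Building", "Levels_Name"]

# Long format, with missing answers labelled "NA" so they are counted too.
long_df = df.melt(id_vars=keys, value_name="Sentiment")
long_df["Sentiment"] = long_df["Sentiment"].fillna("NA")

# Same counting step as in the answer: one column per original question.
counts = (
    long_df.groupby(keys + ["Sentiment", "variable"])
    .size()
    .unstack("variable", fill_value=0)
)

# Every observed (Building, Levels_Name) pair crossed with all three sentiments.
full_index = pd.MultiIndex.from_tuples(
    [
        (building, level, sentiment)
        for building, level in df[keys].drop_duplicates().itertuples(index=False, name=None)
        for sentiment in (NS, "NA", S)
    ],
    names=keys + ["Sentiment"],
)

# Combinations that never occur get a count of 0 in every column.
result = (
    counts.reindex(full_index, fill_value=0)
    .sort_index()
    .rename_axis(columns=None)
    .reset_index()
)
print(result)
```

Only the Building/Levels_Name pairs that appear in the data are expanded here; if every building should be crossed with every level, `pd.MultiIndex.from_product` over the unique values would be the analogous choice.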