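I have a survey dataset in which every question uses a 7-point scale, and I want to get the `value_counts` of the common values across all columns, while also grouping the dataframe by two columns. Here is a sample dataset, followed by what I am trying to achieve.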
| col1 | col2 | col3 | Building | Levels_Name |
|---------------|---------------|---------------|---------------|------------------------|
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor |
| Satisfied | Satisfied | NA | Basingstoke | Individual Contributor |
| Not Satisfied | Satisfied | Not Satisfied | San Francisco | Middle Management |
| Not Satisfied | Satisfied | Not Satisfied | Miami | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Senior Leadership |
| NA | NA | NA | Foster City | Other |
| Not Satisfied | Not Satisfied | NA | Foster City | Senior Leadership |
| Not Satisfied | Satisfied | Not Satisfied | Austin | Middle Management |
| Satisfied | Satisfied | Satisfied | San Francisco | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Individual Contributor |
| Satisfied | Satisfied | NA | Miami | Middle Management |
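Now I want to group this data by 'Building' and 'Levels_Name', add a further grouping level for the values 'Satisfied', 'Not Satisfied', and 'NA', and get the value counts for each column.
So the result should look like this: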
| Building | Levels_Name | Sentiment | col1 | col2 | col3 |
|---------------|------------------------|---------------|------|------|------|
| Foster City | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| Foster City | Individual Contributor | NA | 0 | 0 | 0 |
| Foster City | Individual Contributor | Satisfied | 0 | 0 | 0 |
| Foster City | Senior Leadership | Not Satisfied | 2 | 2 | 0 |
| Foster City | Senior Leadership | NA | 0 | 0 | 1 |
| Foster City | Senior Leadership | Satisfied | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| San Francisco | Individual Contributor | NA | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Satisfied | 0 | 0 | 0 |
Thanks!
Answer 0 (score: 1)
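First, melt the dataframe so that the question columns become rows, then group by every column of the melted frame, take the group sizes, and unstack the question names back into columns:

    import numpy as np
    import pandas as pd

    # Melt col1..col3 into (variable, Sentiment) pairs, keeping the two id columns.
    # Missing values are replaced with the string 'NaN' so that groupby keeps them.
    d1 = pd.melt(
        df, ['Building', 'Levels_Name'], value_name='Sentiment'
    ).replace(np.nan, 'NaN')

    # Count each (Building, Levels_Name, variable, Sentiment) combination,
    # then pivot 'variable' back into columns, filling absent combinations with 0.
    d1.groupby(
        d1.columns.tolist()
    ).size().unstack('variable', fill_value=0).reset_index()

This produces:

    variable       Building             Levels_Name      Sentiment  col1  col2  col3
    0                Austin       Middle Management  Not Satisfied     1     0     1
    1                Austin       Middle Management      Satisfied     0     1     0
    2           Basingstoke  Individual Contributor            NaN     0     0     1
    3           Basingstoke  Individual Contributor      Satisfied     1     1     0
    4           Foster City  Individual Contributor  Not Satisfied     1     1     1
    5           Foster City                   Other            NaN     1     1     1
    6           Foster City       Senior Leadership            NaN     0     0     1
    7           Foster City       Senior Leadership  Not Satisfied     2     2     1
    8                 Miami       Middle Management            NaN     0     0     1
    9                 Miami       Middle Management      Satisfied     1     1     0
    10                Miami       Senior Leadership  Not Satisfied     1     0     1
    11                Miami       Senior Leadership      Satisfied     0     1     0
    12        San Francisco  Individual Contributor  Not Satisfied     1     1     1
    13        San Francisco      Middle Management  Not Satisfied     1     0     1
    14        San Francisco      Middle Management      Satisfied     0     1     0
    15        San Francisco       Senior Leadership      Satisfied     1     1     1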
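
The output above only contains the sentiment values that actually occur in each group, whereas the desired table in the question also lists zero-count rows for every sentiment. A minimal sketch of one way to get those rows is to reindex the counts against every (Building, Levels_Name, Sentiment) combination. The snippet rebuilds the sample data so it can be run on its own; the names `melted`, `counts`, `sentiments`, `pairs`, and `result` are illustrative, and it assumes the 'NA' cells are stored as missing values (which is what `pd.read_csv` does with the string `NA` by default).

```python
import numpy as np
import pandas as pd

# Sample data from the question; "NA" cells are assumed to be missing values.
rows = [
    ("Not Satisfied", "Not Satisfied", "Not Satisfied", "San Francisco", "Individual Contributor"),
    ("Satisfied", "Satisfied", np.nan, "Basingstoke", "Individual Contributor"),
    ("Not Satisfied", "Satisfied", "Not Satisfied", "San Francisco", "Middle Management"),
    ("Not Satisfied", "Satisfied", "Not Satisfied", "Miami", "Senior Leadership"),
    ("Not Satisfied", "Not Satisfied", "Not Satisfied", "Foster City", "Senior Leadership"),
    (np.nan, np.nan, np.nan, "Foster City", "Other"),
    ("Not Satisfied", "Not Satisfied", np.nan, "Foster City", "Senior Leadership"),
    ("Not Satisfied", "Satisfied", "Not Satisfied", "Austin", "Middle Management"),
    ("Satisfied", "Satisfied", "Satisfied", "San Francisco", "Senior Leadership"),
    ("Not Satisfied", "Not Satisfied", "Not Satisfied", "Foster City", "Individual Contributor"),
    ("Satisfied", "Satisfied", np.nan, "Miami", "Middle Management"),
]
df = pd.DataFrame(rows, columns=["col1", "col2", "col3", "Building", "Levels_Name"])

# Long format: one row per (Building, Levels_Name, question, answer).
melted = df.melt(id_vars=["Building", "Levels_Name"], value_name="Sentiment")
melted["Sentiment"] = melted["Sentiment"].fillna("NA")

# Count each combination and pivot the question names back into columns.
counts = (
    melted.groupby(["Building", "Levels_Name", "Sentiment", "variable"])
    .size()
    .unstack("variable", fill_value=0)
)

# Build every (Building, Levels_Name, Sentiment) combination for the
# Building/Levels_Name pairs that occur in the data, then reindex so that
# combinations with no answers show up as rows of zeros.
sentiments = ["Not Satisfied", "NA", "Satisfied"]
pairs = (
    df[["Building", "Levels_Name"]]
    .drop_duplicates()
    .sort_values(["Building", "Levels_Name"])
)
full_index = pd.MultiIndex.from_tuples(
    [(b, level, s) for b, level in pairs.itertuples(index=False) for s in sentiments],
    names=["Building", "Levels_Name", "Sentiment"],
)
result = counts.reindex(full_index, fill_value=0).reset_index()
result.columns.name = None  # drop the leftover 'variable' label

print(result)
```

For a real 7-point survey, `sentiments` would list all seven answer labels plus the missing-value label, in whatever order the report should show them.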