How do I create a new DataFrame from the value_counts of one column in another DataFrame, with conditions on other columns?

Time: 2019-07-12 07:18:04

Tags: python pandas dataframe pandas-groupby

I have a pandas DataFrame of tickets raised against a group of servers, like this:

      a     b     c   Users       Problem
0  data  data  data  User A   Server Down
1  data  data  data  User B   Server Down
2  date  data  data  User C   Memory Full
3  date  data  data  User C     Swap Full
4  date  data  data  User D  Unclassified
5  date  data  data  User E  Unclassified
6  data  data  data  User B   RAM Failure

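I need to create another DataFrame like the one below: one row per ticket type, with separate counts of the tickets raised by User A and by User B, and a single column that combines all other users.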

Expected new DataFrame:

+---------------+--------+--------+-------------+
| Type Of Error | User A | User B | Other Users |
+---------------+--------+--------+-------------+
| Server Down   | 50     | 60     | 150         |
+---------------+--------+--------+-------------+
| Memory Full   | 40     | 50     | 20          |
+---------------+--------+--------+-------------+
| Swap Full     | 10     | 20     | 15          |
+---------------+--------+--------+-------------+
| Unclassified  | 10     | 20     | 50          |
+---------------+--------+--------+-------------+
|               |        |        |             |
+---------------+--------+--------+-------------+

I tried .value_counts(), which gives the total count for each ticket type. However, I need the counts broken down by user.

2 Answers:

Answer 0 (score: 1)

Use Series.where to replace every user other than User A or User B with Other Users, then use crosstab:

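# Keep User A and User B; relabel everyone else as 'Other Users'
df['Users'] = df['Users'].where(df['Users'].isin(['User A','User B']), 'Other Users')

# Count tickets per problem and user, then put the columns in the desired order
df = pd.crosstab(df['Problem'], df['Users'])[['User A','User B','Other Users']]
print(df)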
Users         User A  User B  Other Users
Problem                                  
Memory Full        0       0            1
RAM Failure        0       1            0
Server Down        1       1            0
Swap Full          0       0            1
Unclassified       0       0            2
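
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same crosstab approach. The sample frame is rebuilt from the question's Users and Problem columns, and count_tickets_by_user is a hypothetical helper name, not part of pandas. It swaps the [[...]] column selection for reindex, on the assumption that one of the named users may have no tickets at all in real data.

```python
import pandas as pd

# Sample tickets mirroring the Users/Problem columns from the question.
df = pd.DataFrame({
    "Users": ["User A", "User B", "User C", "User C",
              "User D", "User E", "User B"],
    "Problem": ["Server Down", "Server Down", "Memory Full", "Swap Full",
                "Unclassified", "Unclassified", "RAM Failure"],
})


def count_tickets_by_user(tickets, named_users, other_label="Other Users"):
    """Hypothetical helper: ticket counts per problem for selected users."""
    # Keep the named users; everyone else collapses into one bucket.
    grouped_users = tickets["Users"].where(
        tickets["Users"].isin(named_users), other_label
    )
    counts = pd.crosstab(tickets["Problem"], grouped_users)
    # reindex (instead of [[...]] selection) tolerates a user with no tickets:
    # the missing column is created and filled with 0 rather than raising.
    return counts.reindex(columns=[*named_users, other_label], fill_value=0)


result = count_tickets_by_user(df, ["User A", "User B"])
print(result)
```

Unlike the snippet above, this leaves the original df untouched, which helps if the ungrouped user names are still needed later.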

Answer 1 (score: 0)

You can use pivot_table, which is well suited to applying aggregate functions:

users = df.Users.copy()
users[~users.isin(['User A', 'User B'])] = 'Other Users'
df.pivot_table(index='Problem', columns=users, aggfunc='count', values='a',
               fill_value=0).reindex(['User A', 'User B', 'Other Users'], axis=1)

This gives:

Users         User A  User B  Other Users
Problem                                  
Memory Full        0       0            1
RAM Failure        0       1            0
Server Down        1       1            0
Swap Full          0       0            1
Unclassified       0       0            2
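
A runnable sketch of the pivot_table variant, for comparison. The sample frame is an assumption reconstructed from the question, with column a filled with the placeholder string "data". Note that aggfunc='count' counts the non-null values of a, so this version assumes a is never missing; a ticket with a null in a would be left out of the count.

```python
import pandas as pd

# Sample tickets reconstructed from the question; 'a' holds placeholder data.
df = pd.DataFrame({
    "a": ["data"] * 7,
    "Users": ["User A", "User B", "User C", "User C",
              "User D", "User E", "User B"],
    "Problem": ["Server Down", "Server Down", "Memory Full", "Swap Full",
                "Unclassified", "Unclassified", "RAM Failure"],
})

# Work on a copy so the original Users column is left unchanged.
users = df["Users"].copy()
users[~users.isin(["User A", "User B"])] = "Other Users"

# Count non-null values of 'a' for each (Problem, grouped user) pair.
table = df.pivot_table(index="Problem", columns=users, values="a",
                       aggfunc="count", fill_value=0)
# Fix the column order; fill_value=0 covers a user with no tickets.
table = table.reindex(columns=["User A", "User B", "Other Users"], fill_value=0)
print(table)
```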