在一组这样的服务器上,我有一个售票的熊猫数据帧:
a b c Users Problem
0 data data data User A Server Down
1 data data data User B Server Down
2 date data data User C Memory Full
3 date data data User C Swap Full
4 date data data User D Unclassified
5 date data data User E Unclassified
6 data data data User B RAM Failure
我需要创建另一个像这样的数据框,其数据由票证类型和仅由两个用户A和B分别提出的票证计数分组,并在一个列中包含其他用户。
预期的新数据框:
+---------------+--------+--------+-------------+
| Type Of Error | User A | User B | Other Users |
+---------------+--------+--------+-------------+
| Server Down | 50 | 60 | 150 |
+---------------+--------+--------+-------------+
| Memory Full | 40 | 50 | 20 |
+---------------+--------+--------+-------------+
| Swap Full | 10 | 20 | 15 |
+---------------+--------+--------+-------------+
| Unclassified | 10 | 20 | 50 |
+---------------+--------+--------+-------------+
| | | | |
+---------------+--------+--------+-------------+
我尝试了.value_counts()
,它提供了该类型的总数。但是,我需要基于用户。
答案 0 :(得分:1)
如果没有User A
或User B
,则通过Series.where
将用户更改为Other Users
,然后使用crosstab
:
df['Users'] = df['Users'].where(df['Users'].isin(['User A','User B']), 'Other Users')
df = pd.crosstab(df['Problem'], df['Users'])[['User A','User B','Other Users']]
print (df)
Users User A User B Other Users
Problem
Memory Full 0 0 1
RAM Failure 0 1 0
Server Down 1 1 0
Swap Full 0 0 1
Unclassified 0 0 2
答案 1 :(得分:0)
您可以使用pivot_table
,它非常适合使用聚合函数:
users = df.Users.copy()
users[~users.isin(['User A', 'User B'])] = 'Other Users'
df.pivot_table(index='Problem', columns=users, aggfunc='count', values='a',
fill_value=0).reindex(['User A', 'User B', 'Other Users'], axis=1)
它给出:
Users User A User B Other Users
Problem
Memory Full 0 0 1
RAM Failure 0 1 0
Server Down 1 1 0
Swap Full 0 0 1
Unclassified 0 0 2