Question

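I am streaming some data from an API about users of a SaaS widget, and I want to do some analysis based on user activity to find efficiencies in the process. I hope to answer questions such as "which (groups of) user actions lead to successful completion", etc.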

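Currently the data is a timestamped log of responses, including categorical features about the specific user as well as the specific actions and responses for a particular interaction period: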

Timestamp    User   Cat1   Cat2     Action     Response
timenow      User1  False  barbar   action1    response4
time(n-1)    User2  False  No value action1    response3
time(n-2)    User1  False  barbar   baraction  response2
time(n-3)    User3  True   bar      action1    response1
time(n-4)    User2  False  foo      action1    response2
time(n-5)    User1  False  barbar   fooaction  response1

I would like to group the data by user and then list all the actions with counts:

User    Cat1   Cat2     Action1   Action2     Response1  Response 2
User3   True   bar      2           1            7          1 
User2   False  foo      4           5            8          4  
User1   False  barbar   5           2            3          0

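I can imagine doing this with pandas, using loops to build a new dataframe in the format I'm after. However, I'm wondering whether there is a neater way to do this in pandas, or whether there is a better format (groupbys?) that might yield similar results?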

Answer 1

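I don't fully understand your output. Where is the timestamp column? How do you choose the Cat1 and Cat2 values?

As for the rest, you can use get_dummies and groupby:

Create the input dataframe: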

import io
import pandas as pd

temp = """Timestamp    User   Cat1   Cat2     Action     Response
timenow      User1  False  barbar   action1    response4
time(n-1)    User2  False  Novalue action1    response3
time(n-2)    User1  False  barbar   baraction  response2
time(n-3)    User3  True   bar      action1    response1
time(n-4)    User2  False  foo      action1    response2
time(n-5)    User1  False  barbar   fooaction  response1"""
# Columns are separated by runs of whitespace.
df = pd.read_csv(io.StringIO(temp), sep=r"\s+")

Output:

    Timestamp   User    Cat1    Cat2    Action      Response
0   timenow     User1   False   barbar  action1     response4
1   time(n-1)   User2   False   Novalue action1     response3
2   time(n-2)   User1   False   barbar  baraction   response2
3   time(n-3)   User3   True    bar     action1     response1
4   time(n-4)   User2   False   foo     action1     response2
5   time(n-5)   User1   False   barbar  fooaction   response1

With get_dummies you can get the desired columns:

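# Keep only the columns needed for the per-user counts.
df = df[['User', 'Action', 'Response']]
# Append one indicator column per distinct Action and Response value.
df = pd.concat([df, df['Action'].str.get_dummies(), df['Response'].str.get_dummies()], axis=1)
df = df.drop(columns=['Action', 'Response'])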


    User    action1 baraction   fooaction   response1   response2   response3   response4
0   User1   1       0           0           0           0           0           1
1   User2   1       0           0           0           0           1           0
2   User1   0       1           0           0           1           0           0
3   User3   1       0           0           1           0           0           0
4   User2   1       0           0           0           1           0           0
5   User1   0       0           1           1           0           0           0
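
For reference, the same indicator columns can probably also be produced in one call with the top-level pd.get_dummies, which encodes the listed columns and leaves the rest untouched. This is only a sketch on a cut-down copy of the sample data; the names events and indicators are made up for illustration, and by default the new columns are prefixed with the source column name (Action_, Response_), unlike the output above.

```python
import pandas as pd

# Hypothetical three-row excerpt of the log, just to show the call.
events = pd.DataFrame({
    "User": ["User1", "User2", "User1"],
    "Action": ["action1", "action1", "baraction"],
    "Response": ["response4", "response3", "response2"],
})

# One 0/1 column per distinct value of Action and Response.
# dtype=int keeps the indicators numeric (newer pandas defaults to bool).
indicators = pd.get_dummies(events, columns=["Action", "Response"], dtype=int)
print(indicators)
```

The prefixes help if an action and a response ever share the same label, at the cost of longer column names.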

Finally, use groupby:

df.groupby('User',as_index = False).sum()

    User    action1 baraction   fooaction   response1   response2   response3   response4
0   User1   1       1           1           1           1           0           1
1   User2   2       0           0           0           1           1           0
2   User3   1       0           0           1           0           0           0
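
If you also want Cat1 and Cat2 in the result, one possible variation is to count with pd.crosstab and join the categories back on. This is a sketch, not the only way to do it: it assumes that taking each user's first row in the log (the most recent one, since the sample is newest-first) is an acceptable rule for picking Cat1 and Cat2, which the question does not specify. The names events, counts, cats and features are illustrative.

```python
import io
import pandas as pd

raw = """Timestamp User Cat1 Cat2 Action Response
timenow User1 False barbar action1 response4
time(n-1) User2 False Novalue action1 response3
time(n-2) User1 False barbar baraction response2
time(n-3) User3 True bar action1 response1
time(n-4) User2 False foo action1 response2
time(n-5) User1 False barbar fooaction response1"""
events = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Count how often each action and each response occurs per user.
action_counts = pd.crosstab(events["User"], events["Action"])
response_counts = pd.crosstab(events["User"], events["Response"])
counts = action_counts.join(response_counts)

# Assumption: use the Cat1/Cat2 values from each user's first row in the log.
cats = events.groupby("User")[["Cat1", "Cat2"]].first()

features = cats.join(counts).reset_index()
print(features)
```

The counts match the groupby result above; only the rule for choosing Cat1 and Cat2 is a guess that you would need to adapt to your data.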
