Based on other columns

Time: 2017-05-29 17:13:01

Tags: r

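I have a data frame of visits to a website. There are several visits per day, each with one of several possible actions and a description of that action.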

People | Date       | Time  | Action | Descr  | 
       |            |       |        |        | 
j      | 01/01/2010 | 10:13 | X      | A      | 
j      | 01/01/2010 | 10:15 | Y      | B      | 
j      | 02/01/2010 | 14:15 | Z      | C      | 
j      | 03/01/2010 | 11:45 | X      | D      | 
j      | 03/01/2010 | 13:56 | X      | E      | 
j      | 03/01/2010 | 18:43 | Z      | F      | 
j      | 03/01/2010 | 18:44 | X      | A      | 
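
For anyone who wants to reproduce the problem, the table above could be entered roughly as follows. This is only a sketch: the name `df` is taken from the first answer below, and storing Date and Time as plain character strings is an assumption, since the question does not say how they are stored.

```r
# Sample data copied from the table in the question.
# Assumption: Date and Time are kept as character strings.
df <- data.frame(
  People = rep("j", 7),
  Date   = c("01/01/2010", "01/01/2010", "02/01/2010", "03/01/2010",
             "03/01/2010", "03/01/2010", "03/01/2010"),
  Time   = c("10:13", "10:15", "14:15", "11:45", "13:56", "18:43", "18:44"),
  Action = c("X", "Y", "Z", "X", "X", "Z", "X"),
  Descr  = c("A", "B", "C", "D", "E", "F", "A"),
  stringsAsFactors = FALSE
)

# Quick sanity check: number of visits per day
table(df$Date)
```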

After reducing the data frame to a balanced daily panel, I need to create the following variables:

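- The first variable (FirstX) must equal the description (Descr) of the first Action = X of that day, if there is one, and zero otherwise

- The second variable must equal the description of the second Action = X of that day, and zero otherwise

- and so on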

Once I have converted it into a balanced daily panel (which I know how to do), I need a final result that looks like this:

People | Date       |Accesses| First X|Second X| Third X| Fourth X |
       |            |        |        |        |        |          |
j      | 01/01/2010 |    2   |   A    |   0    |    0   |    0     |
j      | 02/01/2010 |    1   |   0    |   0    |    0   |    0     |
j      | 03/01/2010 |    4   |   D    |   E    |    A   |    0     |

2 answers:

Answer 0: (score: 0)

You can do this with the dplyr package:

library(dplyr)

df %>%
  group_by(People, Date) %>%
  summarise(
    # number of visits (rows) for this person on this day
    Accesses = n(),
    # description of the n-th Action == "X" of the day, or "0" if there is none
    FirstX  = ifelse(sum(Action == "X") >= 1, Descr[Action == "X"][1], "0"),
    SecondX = ifelse(sum(Action == "X") >= 2, Descr[Action == "X"][2], "0"),
    ThirdX  = ifelse(sum(Action == "X") >= 3, Descr[Action == "X"][3], "0"),
    FourthX = ifelse(sum(Action == "X") >= 4, Descr[Action == "X"][4], "0")
  )

This returns:

  People      Date   Accesses FirstX SecondX ThirdX FourthX
   <chr>      <chr>    <int>  <chr>   <chr>  <chr>   <chr>
1      j 01/01/2010        2      A       0      0       0
2      j 02/01/2010        1      0       0      0       0
3      j 03/01/2010        4      D       E      A       0

Note that a single vector cannot hold both the number 0 and character values, so I put the character "0" in the FirstX, SecondX, ... columns.
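
The point about mixed types, and the repeated `ifelse` pattern, can be illustrated with a small sketch. The helper `nth_x` is a hypothetical name introduced here for illustration; it is not part of dplyr or of the answer above.

```r
# Mixing a number and a string in one vector coerces everything to character
mixed <- c(0, "A")
class(mixed)

# Hypothetical helper: the n-th description among rows where action == "X",
# or the character "0" when the day has fewer than n such rows.
nth_x <- function(descr, action, n) {
  hits <- descr[action == "X"]
  if (length(hits) >= n) as.character(hits[n]) else "0"
}

# One day with a single X: the first X exists, the second does not
first_x  <- nth_x(c("A", "B"), c("X", "Y"), 1)
second_x <- nth_x(c("A", "B"), c("X", "Y"), 2)
```

Under these assumptions, each column in the `summarise()` call could be written as, for example, `FirstX = nth_x(Descr, Action, 1)`. If a missing value is acceptable instead of "0", `NA_character_` would be another option for the fallback.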

Answer 1: (score: 0)

I found a solution myself. I am posting it here in case it is useful to someone.

# Create a temporary variable to count over (just a vector of all the
# numbers from 1 to N, where N is the number of rows)
subset$temp_var1 <- seq_len(nrow(subset))

# Generate a variable that starts counting from one and starts again
# every time "date" or "people" changes
subset$count <- ave(subset$temp_var1, subset$date,
                    subset$people, FUN = seq_along)

# Drop the variable "Action" (keep only the columns needed for the reshape)
subset <- subset(subset, select = c("people", "date",
                                    "descr", "count"))

# Reshape from long to wide: one "descr.<count>" column per occurrence.
# idvar uses the same "people" and "date" columns selected above.
subset_comuni <- reshape(subset, idvar = c("people", "date"),
                         timevar = "count", direction = "wide")
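
As a rough end-to-end sketch of this approach on the sample data from the question: the names `visits`, `x_only` and `wide` are made up for this example, and filtering to Action == "X" before counting is an assumption, since the answer does not show how its input was prepared.

```r
# Sample data from the question, with lower-case column names as in this answer
visits <- data.frame(
  people = rep("j", 7),
  date   = c("01/01/2010", "01/01/2010", "02/01/2010", "03/01/2010",
             "03/01/2010", "03/01/2010", "03/01/2010"),
  action = c("X", "Y", "Z", "X", "X", "Z", "X"),
  descr  = c("A", "B", "C", "D", "E", "F", "A"),
  stringsAsFactors = FALSE
)

# Assumption: only the Action == "X" rows are counted
x_only <- visits[visits$action == "X", c("people", "date", "descr")]

# Running counter that restarts for every people/date combination
x_only$count <- ave(seq_len(nrow(x_only)), x_only$date, x_only$people,
                    FUN = seq_along)

# Long to wide: columns descr.1, descr.2, ... hold the first, second, ... X
wide <- reshape(x_only, idvar = c("people", "date"),
                timevar = "count", direction = "wide")

# Days with fewer X actions get NA; replace with the character "0"
wide[is.na(wide)] <- "0"
wide
```

Note that in this sketch a day without any X action (02/01/2010 here) does not appear in `wide` at all, so the result would still have to be merged back onto the balanced daily panel.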