> ds[1:20,1:5]
idSite idVisit visitIp visitorId type
1 1 4103 8.37.230.12 0b146529434a43e3 action
2 1 4103 8.37.230.12 0b146529434a43e3 action
4 1 4100 117.212.128.163 2fda542e2cac67d4 action
5 1 4100 117.212.128.163 2fda542e2cac67d4 action
6 1 4100 117.212.128.163 2fda542e2cac67d4 action
8 1 4102 187.134.160.17 ab2413b2ed5bccc4 action
11 1 4099 168.235.201.23 5e8b3f87bd30cc1b action
12 1 4099 168.235.201.23 5e8b3f87bd30cc1b action
13 1 4099 168.235.201.23 5e8b3f87bd30cc1b action
14 1 4099 168.235.201.23 5e8b3f87bd30cc1b action
16 1 4101 5.107.224.242 fc77e4a99d153c16 action
19 1 4098 119.156.96.132 d083c7814aefc5e4 action
21 1 4097 95.221.204.238 87b98db4b05df2b0 action
23 1 4096 122.173.30.126 4386834b62126a2b action
25 1 4092 42.109.204.55 4744bd421d7f06b8 action
26 1 4092 42.109.204.55 4744bd421d7f06b8 action
27 1 4092 42.109.204.55 4744bd421d7f06b8 action
28 1 4092 42.109.204.55 4744bd421d7f06b8 action
29 1 4092 42.109.204.55 4744bd421d7f06b8 action
32 1 4041 49.35.130.191 eb8795f74c372b41 action
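In the data frame above, I want to walk from the last row to the first and rename the values in the type column to "action1" / "action2" / ... and so on, with the counter restarting for each visitIp, as shown below: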
> dactions[1:20,1:5]
idSite idVisit visitIp visitorId type
1 1 4103 8.37.230.12 0b146529434a43e3 action2
2 1 4103 8.37.230.12 0b146529434a43e3 action1
4 1 4100 117.212.128.163 2fda542e2cac67d4 action3
5 1 4100 117.212.128.163 2fda542e2cac67d4 action2
6 1 4100 117.212.128.163 2fda542e2cac67d4 action1
8 1 4102 187.134.160.17 ab2413b2ed5bccc4 action1
11 1 4099 168.235.201.23 5e8b3f87bd30cc1b action4
12 1 4099 168.235.201.23 5e8b3f87bd30cc1b action3
13 1 4099 168.235.201.23 5e8b3f87bd30cc1b action2
14 1 4099 168.235.201.23 5e8b3f87bd30cc1b action1
16 1 4101 5.107.224.242 fc77e4a99d153c16 action1
19 1 4098 119.156.96.132 d083c7814aefc5e4 action1
21 1 4097 95.221.204.238 87b98db4b05df2b0 action1
23 1 4096 122.173.30.126 4386834b62126a2b action1
25 1 4092 42.109.204.55 4744bd421d7f06b8 action5
26 1 4092 42.109.204.55 4744bd421d7f06b8 action4
27 1 4092 42.109.204.55 4744bd421d7f06b8 action3
28 1 4092 42.109.204.55 4744bd421d7f06b8 action2
29 1 4092 42.109.204.55 4744bd421d7f06b8 action1
32 1 4041 49.35.130.191 eb8795f74c372b41 action4
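I have code that does this with a for loop, but on large data frames (more than 30k rows) it takes far too long. I would like to avoid the for loop so that it runs faster. My code is as follows: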
#rename actions
ds$type<-as.character(ds$type)
count<-0
visitedIp<-""
for(i in nrow(ds):1){
  if(ds[i,]$visitIp!=visitedIp){
    count<-1
    visitedIp<-ds[i,]$visitIp
    ds[i,]$type<-paste0(ds[i,]$type,as.character(count))
    next
  }else{
    count<-count+1
    ds[i,]$type<-paste0(ds[i,]$type,as.character(count))
  }
}
dactions<-ds
Thanks in advance.
Answer 0 (score: 0)
To get a cumulative count grouped by a given variable, you would typically use the ave() function together with cumsum():
#first arg is input to cumsum (set to 1 here), second arg is grouping variable
ave(rep(1, nrow(ds)), ds$visitIp, FUN = cumsum)
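As a minimal sketch of what that call returns, here it is applied to a small made-up vector. The name `ip` and its values are invented for illustration and are not taken from the question's data.

```r
# Toy grouping vector; the values are hypothetical
ip <- c("a", "a", "b", "b", "b", "c")

# Running count within each group, following row order
fwd <- ave(rep(1, length(ip)), ip, FUN = cumsum)
print(fwd)
# 1 2 1 2 3 1
```

Each group gets its own counter starting at 1, and the result has the same length and order as the input.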
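To get a reverse counter by group, reverse the grouping vector (the second argument) so that the count runs from the end, then reverse the result so that it lines up with the data again: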
rev(ave(rep(1, nrow(ds)), rev(ds$visitIp), FUN = cumsum))
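Putting this together with paste0() gives the labels the question asks for. The following is only a sketch on a small invented data frame that reuses the question's column names. One assumption to be aware of: ave() groups by value, while the original loop restarts the counter at every run of consecutive equal visitIp values. If the same IP can appear in non-adjacent blocks, grouping by a run id (built here with rle(), under the hypothetical name `run_id`) should match the loop more closely.

```r
# Toy data frame; rows are invented, column names follow the question
ds <- data.frame(
  visitIp = c("8.37.230.12", "8.37.230.12",
              "117.212.128.163", "117.212.128.163", "117.212.128.163",
              "187.134.160.17"),
  type = "action",
  stringsAsFactors = FALSE
)

# Reverse counter per visitIp: counts upward from the last row of each group
n_rev <- rev(ave(rep(1, nrow(ds)), rev(ds$visitIp), FUN = cumsum))

# Append the counter to the existing label, leaving ds untouched
dactions <- ds
dactions$type <- paste0(ds$type, n_rev)
print(dactions$type)

# Variant: group by runs of consecutive equal IPs instead of by value
r <- rle(ds$visitIp)
run_id <- rep(seq_along(r$lengths), r$lengths)
n_rev_runs <- rev(ave(rep(1, nrow(ds)), rev(run_id), FUN = cumsum))
```

On this toy input both counters are identical because each IP occupies a single contiguous block. Since everything is vectorised, it should avoid the row-by-row data frame assignment that makes the loop slow.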