Updating column values under certain conditions in R

Time: 2017-04-03 04:16:42

Tags: r

    > ds[1:20,1:5]
       idSite idVisit         visitIp        visitorId   type
    1       1    4103     8.37.230.12 0b146529434a43e3 action
    2       1    4103     8.37.230.12 0b146529434a43e3 action
    4       1    4100 117.212.128.163 2fda542e2cac67d4 action
    5       1    4100 117.212.128.163 2fda542e2cac67d4 action
    6       1    4100 117.212.128.163 2fda542e2cac67d4 action
    8       1    4102  187.134.160.17 ab2413b2ed5bccc4 action
    11      1    4099  168.235.201.23 5e8b3f87bd30cc1b action
    12      1    4099  168.235.201.23 5e8b3f87bd30cc1b action
    13      1    4099  168.235.201.23 5e8b3f87bd30cc1b action
    14      1    4099  168.235.201.23 5e8b3f87bd30cc1b action
    16      1    4101   5.107.224.242 fc77e4a99d153c16 action
    19      1    4098  119.156.96.132 d083c7814aefc5e4 action
    21      1    4097  95.221.204.238 87b98db4b05df2b0 action
    23      1    4096  122.173.30.126 4386834b62126a2b action
    25      1    4092   42.109.204.55 4744bd421d7f06b8 action
    26      1    4092   42.109.204.55 4744bd421d7f06b8 action
    27      1    4092   42.109.204.55 4744bd421d7f06b8 action
    28      1    4092   42.109.204.55 4744bd421d7f06b8 action
    29      1    4092   42.109.204.55 4744bd421d7f06b8 action
    32      1    4041   49.35.130.191 eb8795f74c372b41 action

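In the data frame above, I want to go from the last row to the first and rename the values in the type column to "action1", "action2", and so on, with the counter running separately for each visitIp, as shown below: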

    > dactions[1:20,1:5]
       idSite idVisit         visitIp        visitorId    type
    1       1    4103     8.37.230.12 0b146529434a43e3 action2
    2       1    4103     8.37.230.12 0b146529434a43e3 action1
    4       1    4100 117.212.128.163 2fda542e2cac67d4 action3
    5       1    4100 117.212.128.163 2fda542e2cac67d4 action2
    6       1    4100 117.212.128.163 2fda542e2cac67d4 action1
    8       1    4102  187.134.160.17 ab2413b2ed5bccc4 action1
    11      1    4099  168.235.201.23 5e8b3f87bd30cc1b action4
    12      1    4099  168.235.201.23 5e8b3f87bd30cc1b action3
    13      1    4099  168.235.201.23 5e8b3f87bd30cc1b action2
    14      1    4099  168.235.201.23 5e8b3f87bd30cc1b action1
    16      1    4101   5.107.224.242 fc77e4a99d153c16 action1
    19      1    4098  119.156.96.132 d083c7814aefc5e4 action1
    21      1    4097  95.221.204.238 87b98db4b05df2b0 action1
    23      1    4096  122.173.30.126 4386834b62126a2b action1
    25      1    4092   42.109.204.55 4744bd421d7f06b8 action5
    26      1    4092   42.109.204.55 4744bd421d7f06b8 action4
    27      1    4092   42.109.204.55 4744bd421d7f06b8 action3
    28      1    4092   42.109.204.55 4744bd421d7f06b8 action2
    29      1    4092   42.109.204.55 4744bd421d7f06b8 action1
    32      1    4041   49.35.130.191 eb8795f74c372b41 action4

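I have code that does this with a for loop, but it takes far too long on large data frames (more than 30k rows). I want to avoid the for loop so that it runs faster. My code is as follows: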

    # Rename actions: walk the rows from bottom to top and append a counter
    # that restarts whenever visitIp changes
    ds$type <- as.character(ds$type)
    count <- 0
    visitedIp <- ""
    for (i in nrow(ds):1) {
      if (ds$visitIp[i] != visitedIp) {
        # New visitIp: restart the counter
        count <- 1
        visitedIp <- ds$visitIp[i]
      } else {
        # Same visitIp as the row below: increment the counter
        count <- count + 1
      }
      ds$type[i] <- paste0(ds$type[i], count)
    }
    dactions <- ds

Thanks in advance.

1 answer:

Answer 0 (score: 0)

To get a cumulative count grouped by a given variable, you would normally use the ave() function together with cumsum():

    # first arg is the input to cumsum (set to 1 here), second arg is the grouping variable
    ave(rep(1, nrow(ds)), ds$visitIp, FUN = cumsum)
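A minimal sketch of what that call returns, assuming a small hypothetical stand-in for ds built from the first five rows printed in the question (only the visitIp and type columns; the name fwd is illustrative):

```r
# Hypothetical stand-in for ds: first five rows of the question's data,
# keeping only the columns the counter needs
ds <- data.frame(
  visitIp = c("8.37.230.12", "8.37.230.12",
              "117.212.128.163", "117.212.128.163", "117.212.128.163"),
  type = rep("action", 5),
  stringsAsFactors = FALSE
)

# ave() splits the vector of ones by visitIp, applies cumsum() within each
# group, and writes the results back in the original row order
fwd <- ave(rep(1, nrow(ds)), ds$visitIp, FUN = cumsum)
print(fwd)
# [1] 1 2 1 2 3
```

This counts upwards from the first row of each IP, which is the opposite direction to what the question asks for.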

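To get a reverse counter per group, simply reverse the grouping vector (the second argument) so that the cumulative count runs from the bottom row upwards, then reverse the result so it lines up with the rows of the data: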

    rev(ave(rep(1, nrow(ds)), rev(ds$visitIp), FUN = cumsum))
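A sketch of how the reverse counter could replace the loop end to end, again on a small hypothetical stand-in for ds copied from the question's rows. The names rev_count, runs and run_count are illustrative. One caveat, stated as an observation rather than a claim about the real data: ave() groups by visitIp across the whole data frame, whereas the loop restarts its counter whenever visitIp changes between adjacent rows. The two only agree when each IP occupies one contiguous block of rows, as it does in the printed sample. The rle() variant below is a swapped-in technique (not part of the answer) that mirrors the loop's run-based behaviour exactly:

```r
# Hypothetical stand-in for ds: six rows copied from the question's data
ds <- data.frame(
  visitIp = c("8.37.230.12", "8.37.230.12",
              "117.212.128.163", "117.212.128.163", "117.212.128.163",
              "187.134.160.17"),
  type = rep("action", 6),
  stringsAsFactors = FALSE
)

# The answer's one-liner: the last row of each IP gets 1,
# the row above it gets 2, and so on
rev_count <- rev(ave(rep(1, nrow(ds)), rev(ds$visitIp), FUN = cumsum))

# Run-based alternative that matches the loop: count down within each run
# of identical adjacent visitIp values (as.character() because rle()
# rejects factors)
runs <- rle(as.character(ds$visitIp))
run_count <- unlist(lapply(runs$lengths, function(len) rev(seq_len(len))))

# Both agree here because each IP forms one contiguous block
stopifnot(all(rev_count == run_count))

# Vectorised replacement for the loop: append the counter to the label
ds$type <- paste0(as.character(ds$type), rev_count)
print(ds$type)
# [1] "action2" "action1" "action3" "action2" "action1" "action1"
```

Because both counters are built from whole-vector operations, the cost grows roughly linearly with the number of rows instead of paying for one data frame assignment per row.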