Copy a column's value into another column based on a condition, using a loop

Date: 2014-11-25 12:21:49

Tags: r for-loop

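I need to create a complex "for" loop, but after reading some examples I still don't know how to write it in proper R, so I'm not sure whether it will work. I'm still an R beginner :(

I have a dataset in long format with different occasions. However, some occasions are not really new, because their start date is the same as the previous row's. For those I need to record the accompanying offence in a new column called "offence2", and afterwards drop the false new occasions so that only rows representing genuinely new occasions remain. My real data have up to 8 different offences on a single date, but I made a simpler example.

This is an example of what my data look like: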

    id<-c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5)
    dstart<-c("25/11/2006", "13/12/2006","13/12/2006","07/02/2006","07/02/2006",
     "15/01/2006", "22/03/2006","18/09/2006", "04/03/2006","04/03/2006",
     "22/08/2006","22/08/2006","11/04/2006", "11/04/2006", "19/10/2006") 
    dstart1<-as.Date(dstart, "%d/%m/%Y")

    offence<-c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a")
    cod_offence<-c(25, 26,27,26,28,25,25,29,26,25,27,25,25,26,25)

    mydata<-data.frame(id, dstart1, offence, cod_offence)

The data:

       id    dstart1   offence  cod_offence
   1   1   2006-11-25       a          25
   2   1   2006-12-13       b          26
   3   1   2006-12-13       c          27
   4   2   2006-02-07       b          26
   5   2   2006-02-07       d          28
   6   3   2006-01-15       a          25
   7   3   2006-03-22       a          25
   8   3   2006-09-18       e          29
   9   4   2006-03-04       b          26
   10  4   2006-03-04       a          25
   11  4   2006-08-22       c          27
   12  4   2006-08-22       a          25
   13  5   2006-04-11       a          25
   14  5   2006-04-11       b          26
   15  5   2006-10-19       a          25

I need something like this:

      id    dstart1   offence  cod_offence   offence2
   1   1   2006-11-25       a          25       NA
   2   1   2006-12-13       b          26       c
   3   1   2006-12-13       c          27       NA
   4   2   2006-02-07       b          26       d
   5   2   2006-02-07       d          28       NA
   6   3   2006-01-15       a          25       NA
   7   3   2006-03-22       a          25       NA
   8   3   2006-09-18       e          29       NA
   9   4   2006-03-04       b          26       a
   10  4   2006-03-04       a          25       NA
   11  4   2006-08-22       c          27       a
   12  4   2006-08-22       a          25       NA
   13  5   2006-04-11       a          25       b
   14  5   2006-04-11       b          26       NA
   15  5   2006-10-19       a          25       NA

I think I need to do something like this, where i = individual and j = observation within the individual:

for each individual I need to check whether mydata$dstart1(j) = mydata$dstart1(j+1)
if this is true, then copy mydata$offence2(j)=mydata$offence(j+1), otherwise keep the same value
This has to stop if id(j) != id(j+1) and re-start with the new id.

My problem is that I don't know how to put this into a loop.

Thanks!

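Update

Yes, it works fine on this example, but not yet on my real data, because they are a bit more complicated. What happens if, instead of two repeated dates, I have three or more, each with a different offence? Following @CathG's solution, I would need to create more variables according to the number of offences (8 in my case). I think I need a new vector that identifies the position of each observation within an id, plus a new "instruction" telling R that, depending on the position within mydata$dstart1, the value has to be copied into a different column. But again, I don't know how to do that.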

     id    dstart1   offence  cod_offence   offence2   offence3  offence4
   1   1   2006-11-25       a          25       NA        NA       NA
   2   1   2006-12-13       b          26       c         NA       NA
   3   1   2006-12-13       c          27       NA        NA       NA
   4   2   2006-02-07       b          26       d         NA       NA
   5   2   2006-02-07       d          28       NA        NA       NA
   6   2   2006-04-12       b          26       d         c        a
   7   2   2006-04-12       d          28       NA        NA       NA
   8   2   2006-04-12       c          27       NA        NA       NA
   9   2   2006-04-12       a          25       NA        NA       NA

Thanks again!!!

2 answers:

Answer 0 (score: 1)

You can use base R:

indx <- with(mydata, ave(as.numeric(dstart1), id,
           FUN=function(x) c(x[-1]==x[-length(x)], FALSE)))

 transform(mydata, offence2=ifelse(!!indx, 
            c(as.character(offence[-1]), NA), NA))
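
The key step is the comparison inside FUN, which flags every row whose next row (within the same id) has the same date. A minimal illustration on a made-up vector of day counts (x and next_same are hypothetical names, not part of the data above):

```r
# Hypothetical dates for a single id, stored as day counts.
x <- c(13477, 13495, 13495, 13495, 13600)

# x[-1] drops the first element and x[-length(x)] drops the last, so the
# comparison lines each row up with the row after it. The trailing FALSE
# covers the last row, which has no successor.
next_same <- c(x[-1] == x[-length(x)], FALSE)
next_same
```

Because ave() is applied to a numeric vector here, it hands the flags back as 0/1, which is why the transform() call converts them back to logical with !!indx.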

Or use dplyr:

library(dplyr)
mydata %>%
      group_by(id) %>% 
      mutate(offence2= dstart1==lead(dstart1), 
       offence2= ifelse(!is.na(offence2)&offence2,
         as.character(lead(offence)), NA_character_))
#     id    dstart1 offence cod_offence offence2
#1   1 2006-11-25       a          25       NA
#2   1 2006-12-13       b          26        c
#3   1 2006-12-13       c          27       NA
#4   2 2006-02-07       b          26        d
#5   2 2006-02-07       d          28       NA
#6   3 2006-01-15       a          25       NA
#7   3 2006-03-22       a          25       NA
#8   3 2006-09-18       e          29       NA
#9   4 2006-03-04       b          26        a
#10  4 2006-03-04       a          25       NA
#11  4 2006-08-22       c          27        a
#12  4 2006-08-22       a          25       NA
#13  5 2006-04-11       a          25        b
#14  5 2006-04-11       b          26       NA
#15  5 2006-10-19       a          25       NA

Or use data.table:

library(data.table)
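# indx flags rows whose next row (same id) has the same date;
# the lead of offence is aligned row by row so ifelse() does not recycle it
setDT(mydata)[, indx:=c(dstart1[-1]==dstart1[-.N], FALSE), by=id][,
      offence2:=ifelse(indx, c(as.character(offence)[-1], NA_character_),
                                 NA_character_), by=id][,indx:=NULL]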

mydata
 #    id    dstart1 offence cod_offence offence2
 #1:  1 2006-11-25       a          25       NA
 #2:  1 2006-12-13       b          26        c
 #3:  1 2006-12-13       c          27       NA
 #4:  2 2006-02-07       b          26        d
 #5:  2 2006-02-07       d          28       NA
 #6:  3 2006-01-15       a          25       NA
 #7:  3 2006-03-22       a          25       NA
 #8:  3 2006-09-18       e          29       NA
 #9:  4 2006-03-04       b          26        a
#10:  4 2006-03-04       a          25       NA
#11:  4 2006-08-22       c          27        a
#12:  4 2006-08-22       a          25       NA
#13:  5 2006-04-11       a          25        b
#14:  5 2006-04-11       b          26       NA
#15:  5 2006-10-19       a          25       NA
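
For comparison, the rule described in the question can also be written as a literal for loop. This is only a minimal sketch: it assumes the rows are already sorted by id and date, and it rebuilds the example in a plain data frame called loopdata (a name chosen here for illustration) so it does not depend on the objects modified above.

```r
# Rebuild the example data from the question.
id <- c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5)
dstart <- c("25/11/2006","13/12/2006","13/12/2006","07/02/2006","07/02/2006",
            "15/01/2006","22/03/2006","18/09/2006","04/03/2006","04/03/2006",
            "22/08/2006","22/08/2006","11/04/2006","11/04/2006","19/10/2006")
loopdata <- data.frame(
  id,
  dstart1 = as.Date(dstart, "%d/%m/%Y"),
  offence = c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a"),
  stringsAsFactors = FALSE
)

loopdata$offence2 <- NA_character_
# Compare each row j with row j + 1; the last row has no successor.
for (j in seq_len(nrow(loopdata) - 1)) {
  same_id   <- loopdata$id[j] == loopdata$id[j + 1]
  same_date <- loopdata$dstart1[j] == loopdata$dstart1[j + 1]
  if (same_id && same_date) {
    loopdata$offence2[j] <- loopdata$offence[j + 1]
  }
}
loopdata
```

The vectorised versions above do the same comparison for all rows at once, which is usually preferable on large data.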

Update

With the new dataset mydata2, the first method gives us d1:

 indx <- with(mydata2, ave(as.numeric(dstart1), id,
       FUN=function(x) c(x[-1]==x[-length(x)], FALSE)))

 d1 <-  transform(mydata2, offence2=ifelse(!!indx, 
                  c(as.character(offence[-1]), NA), NA))

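Using d1, we can create an indx column and then use dcast to convert from long form to wide for the column offence2. If a column contains only NAs, we can remove it using colSums(is.na(d2)). Rename the columns, then sort the columns using mutate_each from dplyr, and finally cbind the result with mydata2:
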
 d1$indx <- with(d1, ave(seq_along(id), id, dstart1, FUN=seq_along))
 library(reshape2)

 d2 <- dcast(d1, id + dstart1+indx~indx, value.var='offence2')
 d2New <- d2[,colSums(is.na(d2))!=nrow(d2)]
 nm1 <-  grep("^\\d",colnames(d2New))
 colnames(d2New)[nm1] <- paste0('offence', 2:(length(nm1)+1)) 
 d3 <- d2New[,-3] %>%
                group_by(id, dstart1) %>%
                mutate_each(funs(.[order(.)])) %>%
                ungroup()

 cbind(mydata2,d3[,-c(1:2)])
 #    id    dstart1 offence cod_offence offence2 offence3 offence4
 #1  1 2006-11-25       a          25     <NA>     <NA>     <NA>
 #2  1 2006-12-13       b          26        c     <NA>     <NA>
 #3  1 2006-12-13       c          27     <NA>     <NA>     <NA>
 #4  2 2006-02-07       b          26        d     <NA>     <NA>
 #5  2 2006-02-07       d          28     <NA>     <NA>     <NA>
 #6  2 2006-04-12       b          26        d        c        a
 #7  2 2006-04-12       d          28     <NA>     <NA>     <NA>
 #8  2 2006-04-12       c          27     <NA>     <NA>     <NA>
 #9  2 2006-04-12       a          25     <NA>     <NA>     <NA>
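
mutate_each() and funs() have since been deprecated in dplyr, so the pipeline above may warn or fail on a current installation. The following base R sketch aims at the same wide layout without reshape2 or dplyr. The names wide, key, pos and first are illustrative only, and the data frame simply re-enters the nine example rows rather than reusing mydata2.

```r
# The nine-row example from the question's update (same values as mydata2).
wide <- data.frame(
  id      = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  dstart1 = as.Date(c("2006-11-25", "2006-12-13", "2006-12-13",
                      "2006-02-07", "2006-02-07", "2006-04-12",
                      "2006-04-12", "2006-04-12", "2006-04-12")),
  offence = c("a", "b", "c", "b", "d", "b", "d", "c", "a"),
  stringsAsFactors = FALSE
)

# Key identifying one occasion, and each row's position within its occasion.
key <- paste(wide$id, wide$dstart1)
pos <- ave(seq_along(key), key, FUN = seq_along)

# One new column per extra offence: offence2, offence3, ...
for (k in seq_len(max(pos))[-1]) {
  col <- paste0("offence", k)
  wide[[col]] <- NA_character_
  first <- match(key[pos == k], key)   # first row of the same occasion
  wide[[col]][first] <- wide$offence[pos == k]
}
wide
```

The number of new columns follows from max(pos), so the same loop should cover the case of up to 8 offences per date mentioned in the question.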

Data

mydata <- structure(list(id = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 
5, 5), dstart1 = structure(c(13477, 13495, 13495, 13186, 13186, 
13163, 13229, 13409, 13211, 13211, 13382, 13382, 13249, 13249, 
13440), class = "Date"), offence = structure(c(1L, 2L, 3L, 2L, 
4L, 1L, 1L, 5L, 2L, 1L, 3L, 1L, 1L, 2L, 1L), .Label = c("a", 
"b", "c", "d", "e"), class = "factor"), cod_offence = c(25, 26, 
27, 26, 28, 25, 25, 29, 26, 25, 27, 25, 25, 26, 25)), .Names = c("id", 
"dstart1", "offence", "cod_offence"), row.names = c(NA, -15L), 
class = "data.frame")

mydata2 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
dstart1 = structure(c(13477, 13495, 13495, 13186, 13186, 13250, 13250,
 13250, 13250), class = "Date"), offence = c("a", "b", "c", "b", "d", "b",
"d", "c", "a"), cod_offence = c(25L, 26L, 27L, 26L, 28L, 26L, 28L, 27L, 25L
)), .Names = c("id", "dstart1", "offence", "cod_offence"), row.names =
 c("1","2", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")

Answer 1 (score: 1)

Using split and a loop:

# data with repeated dates /offences
id<-c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5)
dstart<-c("25/11/2006", "13/12/2006","13/12/2006","07/02/2006","07/02/2006",
     "15/01/2006", "22/03/2006","18/09/2006", "04/03/2006","04/03/2006",
     "22/08/2006","22/08/2006","11/04/2006", "11/04/2006", "19/10/2006","19/10/2006","19/10/2006","19/10/2006") 
dstart1<-as.Date(dstart, "%d/%m/%Y")
offence<-c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a","c","b","a")
cod_offence<-c(25, 26,27,26,28,25,25,29,26,25,27,25,25,26,25,27,25,25)
mydata<-data.frame(id, dstart1, offence, cod_offence)

# see the max offences there are for same id and date
maxoff<-max(table(mydata$id,mydata$dstart1))
mydata[,paste("offence",2:maxoff,sep="")]<-NA

# split your data according to id
splitmydata<-split(mydata,mydata$id) 

# for each per-id dataset, apply a function that looks for repeated dates and fills the extra "offence" columns in the row holding the first occurrence of that date
splitmydata2<-lapply(splitmydata, 
                       function(tab){
                          for(datestart in unique(tab[,"dstart1"])){
                            ind_date<-sort(which(tab[,"dstart1"]==datestart))
                            if(length(ind_date[-1])){
                               tab[ind_date[1],grep("^offence",colnames(tab),value=T)[2:(length(ind_date))]]<-as.character(tab[ind_date[-1],"offence"])
                              }
                           }
                          return(tab)
                       }
                     )

mydata2<-unsplit(splitmydata2,mydata$id) # finally, unsplit your data

> mydata2
   id    dstart1 offence cod_offence offence2 offence3 offence4
1   1 2006-11-25       a          25     <NA>     <NA>     <NA>
2   1 2006-12-13       b          26        c     <NA>     <NA>
3   1 2006-12-13       c          27     <NA>     <NA>     <NA>
4   2 2006-02-07       b          26        d     <NA>     <NA>
5   2 2006-02-07       d          28     <NA>     <NA>     <NA>
6   3 2006-01-15       a          25     <NA>     <NA>     <NA>
7   3 2006-03-22       a          25     <NA>     <NA>     <NA>
8   3 2006-09-18       e          29     <NA>     <NA>     <NA>
9   4 2006-03-04       b          26        a     <NA>     <NA>
10  4 2006-03-04       a          25     <NA>     <NA>     <NA>
11  4 2006-08-22       c          27        a     <NA>     <NA>
12  4 2006-08-22       a          25     <NA>     <NA>     <NA>
13  5 2006-04-11       a          25        b     <NA>     <NA>
14  5 2006-04-11       b          26     <NA>     <NA>     <NA>
15  5 2006-10-19       a          25        c        b        a
16  5 2006-10-19       c          27     <NA>     <NA>     <NA>
17  5 2006-10-19       b          25     <NA>     <NA>     <NA>
18  5 2006-10-19       a          25     <NA>     <NA>     <NA>
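
The question also asks to drop the rows that are not genuinely new occasions once the extra offences have been copied across. One possible way, sketched here on a small stand-in data frame (res and occasions are hypothetical names; the values are rows 1 to 5 of the output above), is to keep the first row of every id/date pair:

```r
# Small stand-in for the widened result (values taken from rows 1-5 above).
res <- data.frame(
  id       = c(1, 1, 1, 2, 2),
  dstart1  = as.Date(c("2006-11-25", "2006-12-13", "2006-12-13",
                       "2006-02-07", "2006-02-07")),
  offence  = c("a", "b", "c", "b", "d"),
  offence2 = c(NA, "c", NA, "d", NA),
  stringsAsFactors = FALSE
)

# Keep only the first row of each id/date pair, i.e. the genuine occasions.
occasions <- res[!duplicated(res[c("id", "dstart1")]), ]
occasions
```

This relies on the first row of each date being the one that received the copied offences, which is how both answers fill the new columns.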