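Create new columns based on a pattern

Time: 2015-09-25 21:40:12

Tags: r data.table

I have a large dataset whose pattern is similar to dataPattern below. I need help with code to create the desiredresult dataset.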

library(data.table)    

V1 <- rep(c(rep("a", times = 2), letters[2:5], 
                    rep("f", times = 2)), times = 2)


V2 <- c(c(c(0.24, 0.25), 2:5, c(0.95, 1.05)),
               c(c(0.34, 0.35), 2:5, c(1.95, 2.05)) )

(dataPattern <- data.table(V1, V2))

(desiredresult <- data.table(V1, V2, c(rep(c(0.24, 0.25), times = 4),
                             rep(c(0.34, 0.35), times = 4)),
                     c(rep(c(0.95, 1.05), times = 4),
                             rep(c(1.95, 2.05), times = 4))))

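I need help creating column V3 of desiredresult. The pattern is as follows:

If V1 == "a", then V3 = V2. If V1 != "a", the V2 values of the preceding "a" group are repeated until a new "a" is reached; the new V2 values then go into V3, and this repeats for every new run of "a".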

I also need help with code to create column V4 of desiredresult. It is similar to column V3, but it checks V1 == "f": the "f" values of V2 go into V4 and are repeated for the rows where V1 != "f".

I tried:

rle(dataPattern$V1 == "a")
# Run Length Encoding
#   lengths: int [1:4] 2 6 2 6
#   values : logi [1:4] TRUE FALSE TRUE FALSE

The number of FALSE minus the number of TRUE seems to equal the number of times each "a" sequence needs to be repeated until a new "a" is reached.
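As a rough illustration of how the rle() attempt could be carried through to V3, here is a minimal base R sketch. It assumes the data always starts with a run of "a" and that every "a" run is followed by exactly one non-"a" run, as in the example; the names r, starts, a_len and rest_len are made up for this sketch. Note that each "a" pair fills its whole block of 2 + 6 = 8 rows, so it appears (2 + 6) / 2 = 4 times, which happens to coincide with 6 - 2 here.

```r
V1 <- rep(c(rep("a", times = 2), letters[2:5], rep("f", times = 2)), times = 2)
V2 <- c(0.24, 0.25, 2:5, 0.95, 1.05, 0.34, 0.35, 2:5, 1.95, 2.05)

r <- rle(V1 == "a")

# Row index at which each run begins, restricted to the "a" runs.
starts   <- cumsum(c(1, head(r$lengths, -1)))[r$values]
a_len    <- r$lengths[r$values]    # length of each "a" run
rest_len <- r$lengths[!r$values]   # length of the non-"a" run that follows it

# Repeat each run of "a" values over its own rows plus the rows that follow.
V3 <- unlist(Map(
  function(s, la, lr) rep(V2[s:(s + la - 1)], length.out = la + lr),
  starts, a_len, rest_len
))
```

If the runs do not alternate this cleanly in the real data, the pairing of a_len with rest_len would need to be adjusted.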

Many thanks.

2 answers:

Answer 0: (score: 0)

OK, I think this is a better way to fill a column with the values of V2 based on V1 == "a".

V1 <- rep(c(rep("a", times = 2), letters[2:5], 
            rep("f", times = 2)), times = 2)

V2 <- c(c(c(0.24, 0.25), 2:5, c(0.95, 1.05)),
        c(c(0.34, 0.35), 2:5, c(1.95, 2.05)) )

dataPattern <- data.frame(V1, V2)
dataPattern$V3 <- ifelse(dataPattern$V1 == "a", dataPattern$V2, NA)
dataPattern$V4 <- ifelse(dataPattern$V1 == "f", dataPattern$V2, NA)
n <- nrow(dataPattern)
for (i in seq_len(n)) {
    # Forward pass: remember the most recent "a" value and fill V3 gaps with it.
    if (dataPattern$V1[i] == "a") {
        tmpa <- dataPattern$V3[i]
    }
    if (is.na(dataPattern$V3[i])) {
        dataPattern$V3[i] <- tmpa
    }
    # Backward pass: walk up from the last row and fill V4 gaps with the next "f" value.
    j <- n - (i - 1)
    if (dataPattern$V1[j] == "f") {
        tmpf <- dataPattern$V4[j]
    }
    if (is.na(dataPattern$V4[j])) {
        dataPattern$V4[j] <- tmpf
    }
}

The output, which I think is more correct than desiredresult given the rules you stated:

> dataPattern
   V1   V2   V3   V4
1   a 0.24 0.24 0.95
2   a 0.25 0.25 0.95
3   b 2.00 0.25 0.95
4   c 3.00 0.25 0.95
5   d 4.00 0.25 0.95
6   e 5.00 0.25 0.95
7   f 0.95 0.25 0.95
8   f 1.05 0.25 1.05
9   a 0.34 0.34 1.95
10  a 0.35 0.35 1.95
11  b 2.00 0.35 1.95
12  c 3.00 0.35 1.95
13  d 4.00 0.35 1.95
14  e 5.00 0.35 1.95
15  f 1.95 0.35 1.95
16  f 2.05 0.35 2.05
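
The loop above carries the last "a" value forward and the next "f" value backward. A vectorized sketch of the same idea in base R follows; fill_forward is a helper name invented for this example, not a function from any package.

```r
V1 <- rep(c(rep("a", times = 2), letters[2:5], rep("f", times = 2)), times = 2)
V2 <- c(0.24, 0.25, 2:5, 0.95, 1.05, 0.34, 0.35, 2:5, 1.95, 2.05)

# Replace each NA with the most recent non-NA value before it.
fill_forward <- function(x) {
  # Index of the latest non-NA position seen so far (0 if none yet).
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  # Prepending NA lets index 0 map to NA for any leading gaps.
  c(NA, x)[idx + 1L]
}

V3 <- fill_forward(ifelse(V1 == "a", V2, NA))
# Reversing before and after turns the forward fill into a backward fill.
V4 <- rev(fill_forward(rev(ifelse(V1 == "f", V2, NA))))
```

This reproduces the V3 and V4 columns printed above, not the ones in desiredresult.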

Answer 1: (score: 0)

This seems to work:

dataPattern[, `:=`(
  V3 = head(V2,2), 
  V4 = tail(V2,2)
), by=cumsum( V1 == "a" & shift(V1,type="lead") == "a" )]

The result passes the all.equal(dataPattern, desiredresult) check. Depending on your actual use case, you may need to put something different inside cumsum.
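
One caveat, offered as an assumption about library versions and not something the answer states: more recent data.table releases may refuse to recycle a length-2 value across a larger group in := and ask for an explicit rep(). A minimal sketch of the same grouping with the recycling spelled out:

```r
library(data.table)

V1 <- rep(c(rep("a", times = 2), letters[2:5], rep("f", times = 2)), times = 2)
V2 <- c(0.24, 0.25, 2:5, 0.95, 1.05, 0.34, 0.35, 2:5, 1.95, 2.05)
dataPattern <- data.table(V1, V2)

# A new group starts at each row where "a" is immediately followed by another "a".
# Within a group, the first two V2 values (the "a" pair) and the last two
# (the "f" pair) are repeated to the group size .N.
dataPattern[, `:=`(
  V3 = rep(head(V2, 2), length.out = .N),
  V4 = rep(tail(V2, 2), length.out = .N)
), by = cumsum(V1 == "a" & shift(V1, type = "lead") == "a")]
```

Like the original, this relies on every group beginning with exactly two "a" rows and ending with exactly two "f" rows.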