I have a large data set whose pattern resembles dataPattern below. I need help with code to create the desiredresult data set.
library(data.table)
V1 <- rep(c(rep("a", times = 2), letters[2:5],
            rep("f", times = 2)), times = 2)
V2 <- c(c(c(0.24, 0.25), 2:5, c(0.95, 1.05)),
        c(c(0.34, 0.35), 2:5, c(1.95, 2.05)))
(dataPattern <- data.table(V1, V2))
(desiredresult <- data.table(V1, V2,
                             c(rep(c(0.24, 0.25), times = 4),
                               rep(c(0.34, 0.35), times = 4)),
                             c(rep(c(0.95, 1.05), times = 4),
                               rep(c(1.95, 2.05), times = 4))))
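I need help creating column V3 in desiredresult. The pattern is as follows:
If V1 == "a", then V3 = V2.
If V1 != "a", we repeat the corresponding V2 values of the previous a group until a new a is reached; the new V2 values are then placed in V3. This repeats for every new run of a.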
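I also need help with code to create column V4 in desiredresult. It is similar to column V3, but it checks for V1 == "f" and places the f values from V2 into V4 on the rows where V1 != "f".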
I tried:
rle(dataPattern$V1 == "a")
# Run Length Encoding
#   lengths: int [1:4] 2 6 2 6
#   values : logi [1:4] TRUE FALSE TRUE FALSE
The lengths of the FALSE and TRUE runs (V1 != "a" and V1 == "a") seem to give the number of times each a sequence needs to be repeated until a new a is reached.
Many thanks.
Answer 0 (score: 0)
OK, I think this is a better way to fill the columns with V2 values based on V1 == "a".
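As a rough sketch of how the rle() run lengths above could be carried through to V3, the snippet below treats each "a" run plus the non-"a" run that follows it as one block and repeats the "a" values to the block length. It assumes the data starts with an "a" run and that "a" and non-"a" runs strictly alternate, as in the example; the names a_len, rest_len, block_len and a_vals are made up for illustration.

```r
V1 <- rep(c(rep("a", times = 2), letters[2:5], rep("f", times = 2)), times = 2)
V2 <- c(0.24, 0.25, 2:5, 0.95, 1.05, 0.34, 0.35, 2:5, 1.95, 2.05)

r <- rle(V1 == "a")
a_len     <- r$lengths[r$values]    # length of each "a" run
rest_len  <- r$lengths[!r$values]   # length of the non-"a" run that follows it
block_len <- a_len + rest_len       # rows covered by each block

# V2 values of each "a" run, one list element per run
a_vals <- split(V2[V1 == "a"], rep(seq_along(a_len), times = a_len))

# Repeat each run's values until its block is filled
V3 <- unlist(Map(function(v, n) rep(v, length.out = n), a_vals, block_len),
             use.names = FALSE)
V3
```

The same idea would apply to V4 by taking the "f" values of each block instead, provided every block ends with an "f" run.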
V1 <- rep(c(rep("a", times = 2), letters[2:5],
            rep("f", times = 2)), times = 2)
V2 <- c(c(c(0.24, 0.25), 2:5, c(0.95, 1.05)),
        c(c(0.34, 0.35), 2:5, c(1.95, 2.05)))
dataPattern <- data.frame(V1, V2)
# Seed V3/V4 with V2 on the "a"/"f" rows; all other rows start as NA
dataPattern$V3 <- ifelse(dataPattern$V1 == "a", dataPattern$V2, NA)
dataPattern$V4 <- ifelse(dataPattern$V1 == "f", dataPattern$V2, NA)
n <- nrow(dataPattern)
for (i in seq_len(n)) {
  # Forward pass: carry the most recent "a" value down into V3
  # (assumes the first row is an "a" row, otherwise tmpa is undefined)
  if (dataPattern$V1[i] == "a") {
    tmpa <- dataPattern$V3[i]
  }
  if (is.na(dataPattern$V3[i])) {
    dataPattern$V3[i] <- tmpa
  }
  # Backward pass: walk up from the last row and carry the next "f" value into V4
  # (assumes the last row is an "f" row, otherwise tmpf is undefined)
  j <- n - (i - 1)
  if (dataPattern$V1[j] == "f") {
    tmpf <- dataPattern$V4[j]
  }
  if (is.na(dataPattern$V4[j])) {
    dataPattern$V4[j] <- tmpf
  }
}
The output, which I think follows the rules you stated more closely than desiredoutput:
> dataPattern
   V1   V2   V3   V4
1   a 0.24 0.24 0.95
2   a 0.25 0.25 0.95
3   b 2.00 0.25 0.95
4   c 3.00 0.25 0.95
5   d 4.00 0.25 0.95
6   e 5.00 0.25 0.95
7   f 0.95 0.25 0.95
8   f 1.05 0.25 1.05
9   a 0.34 0.34 1.95
10  a 0.35 0.35 1.95
11  b 2.00 0.35 1.95
12  c 3.00 0.35 1.95
13  d 4.00 0.35 1.95
14  e 5.00 0.35 1.95
15  f 1.95 0.35 1.95
16  f 2.05 0.35 2.05
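The loop above amounts to a forward fill of the "a" values and a backward fill of the "f" values. As a minimal vectorized sketch of that same behavior (not the cycling pattern in desiredresult), data.table's nafill() could be used, assuming a data.table version that provides it; dt is a name chosen here for illustration.

```r
library(data.table)

V1 <- rep(c(rep("a", times = 2), letters[2:5], rep("f", times = 2)), times = 2)
V2 <- c(0.24, 0.25, 2:5, 0.95, 1.05, 0.34, 0.35, 2:5, 1.95, 2.05)
dt <- data.table(V1, V2)

# Keep V2 only on the "a" rows, then carry the last seen value forward
dt[, V3 := nafill(ifelse(V1 == "a", V2, NA_real_), type = "locf")]
# Keep V2 only on the "f" rows, then carry the next seen value backward
dt[, V4 := nafill(ifelse(V1 == "f", V2, NA_real_), type = "nocb")]
dt
```

Unlike the loop, this does not fail when the first row is not "a" or the last row is not "f"; those rows simply stay NA.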
Answer 1 (score: 0)
This seems to work:
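# A new group starts at each row where an "a" is followed by another "a"
dataPattern[, `:=`(
  # rep() spells out the recycling to the group size (.N);
  # recent data.table versions no longer recycle a length-2 value implicitly
  V3 = rep(head(V2, 2), length.out = .N),
  V4 = rep(tail(V2, 2), length.out = .N)
), by = cumsum(V1 == "a" & shift(V1, type = "lead") == "a")]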
The result passes the all.equal(dataPattern, desiredresult) check. Depending on your actual use case, you may need to put something different inside cumsum.
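As one possible variation on the grouping key, the sketch below starts a new block at every "a" row whose predecessor is not "a", and picks the values by label rather than by position, so the "a" and "f" runs need not have length 2. The column name block and the table name dt are illustrative assumptions, and the approach still assumes each block contains at least one "a" and one "f" row.

```r
library(data.table)

V1 <- rep(c(rep("a", times = 2), letters[2:5], rep("f", times = 2)), times = 2)
V2 <- c(0.24, 0.25, 2:5, 0.95, 1.05, 0.34, 0.35, 2:5, 1.95, 2.05)
dt <- data.table(V1, V2)

# Block id: increases at the first row of every "a" run
dt[, block := cumsum(V1 == "a" & shift(V1, fill = "") != "a")]

# Within each block, repeat the "a" values (V3) and the "f" values (V4)
# to the block length; rep() makes the recycling explicit
dt[, `:=`(
  V3 = rep(V2[V1 == "a"], length.out = .N),
  V4 = rep(V2[V1 == "f"], length.out = .N)
), by = block]
dt[, block := NULL]

desiredresult <- data.table(V1, V2,
                            c(rep(c(0.24, 0.25), times = 4),
                              rep(c(0.34, 0.35), times = 4)),
                            c(rep(c(0.95, 1.05), times = 4),
                              rep(c(1.95, 2.05), times = 4)))
all.equal(dt, desiredresult)
```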