My data.frame "Analysis" has 180,010 obs. of 7 variables. An abbreviated example of its structure:
ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
Rating <- c("Poor", "Excellent", "Very Good", "Poor", "Good", "Fair",
"Very Good", "Fair", "Poor", "Excellent")
Speed <- c(10, 19, 20, 21, 22, 20, 20, 21, 23, 15)
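I want to loop through "Analysis$Speed" and find all values that are greater than or equal to 19 and less than or equal to 25. A match only counts if there are at least 4 consecutive values meeting this criterion; if there are only 3, those values should be ignored. I would like to create a new data.frame "Output" containing these values together with their respective "ID", "Rating" and "Speed", but I'm not sure how to do it.
For example, from the data above the result should be: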
ID <- c(1, 1, 1, 1, 2, 2, 2, 2)
Rating <- c("Excellent", "Very Good", "Poor", "Good", "Fair", "Very Good",
"Fair", "Poor")
Speed <- c(19, 20, 21, 22, 20, 20, 21, 23)
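I have very limited (read: no) experience writing loops, and most existing questions deal with either purely quantitative data or string searching, whereas mine is a mix of both.
Answer 0 (score: 0)
This worked for me: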
ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
Rating <- c("Poor", "Excellent", "Very Good", "Poor", "Good", "Fair",
"Very Good", "Fair", "Poor", "Excellent")
Speed <- c(10, 19, 20, 21, 22, 20, 20, 21, 23, 15)
ID2 <- c()
Rating2 <- c()
Speed2 <- c()
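# Keep every value in the range [19, 25]; note that this loop checks
# values one at a time and does not enforce the "at least 4 consecutive" rule
for (i in seq_along(Speed)) {
  if (Speed[i] >= 19 && Speed[i] <= 25) {
    ID2 <- c(ID2, ID[i])
    Rating2 <- c(Rating2, Rating[i])
    Speed2 <- c(Speed2, Speed[i])
  }
}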
Output <- data.frame(ID = ID2, Rating = Rating2, Speed = Speed2)
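The loop above tests each value on its own, so it cannot enforce the rule that only runs of at least 4 consecutive qualifying values count. A minimal loop-based sketch that also tracks the current run, and restarts it whenever the ID changes, could look like the following. The helper name keep_runs_loop and its lower/upper/min_run arguments are hypothetical, and it assumes the rows are already in time order within each ID and that Speed has no missing values.

```r
# Hypothetical helper: TRUE for rows whose speed is in [lower, upper] and that
# belong to a run of at least min_run consecutive qualifying rows of one ID.
# Assumes rows are in time order within each ID and Speed has no NAs.
keep_runs_loop <- function(ID, Speed, lower = 19, upper = 25, min_run = 4) {
  n <- length(Speed)
  keep <- logical(n)            # starts as all FALSE
  run_start <- NA_integer_      # first row of the current qualifying run
  for (i in seq_len(n)) {
    ok <- Speed[i] >= lower && Speed[i] <= upper
    new_id <- i > 1 && ID[i] != ID[i - 1]
    if (!ok) {
      run_start <- NA_integer_  # an out-of-range value ends the run
    } else if (is.na(run_start) || new_id) {
      run_start <- i            # start a new run (also when the ID changes)
    }
    if (!is.na(run_start) && i - run_start + 1 >= min_run) {
      keep[run_start:i] <- TRUE # run is long enough: keep all of its rows
    }
  }
  keep
}

ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
Rating <- c("Poor", "Excellent", "Very Good", "Poor", "Good", "Fair",
            "Very Good", "Fair", "Poor", "Excellent")
Speed <- c(10, 19, 20, 21, 22, 20, 20, 21, 23, 15)

keep <- keep_runs_loop(ID, Speed)
Output <- data.frame(ID = ID[keep], Rating = Rating[keep], Speed = Speed[keep],
                     stringsAsFactors = FALSE)
```

On the sample data this keeps speeds 19, 20, 21, 22 for ID 1 and 20, 20, 21, 23 for ID 2, matching the expected result in the question. Growing nothing inside the loop (only filling the preallocated keep vector) also scales better to 180,010 rows than repeated c() calls.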
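Answer 1 (score: 0)
Assuming 'ID' is the grouping variable, and that the out-of-range test should use "Speed < 19" rather than "Speed <= 19" (because 19 itself qualifies), we can use ave and rle to flag the values outside the 19 to 25 range within each ID, turn those flags into a logical index that is TRUE only for in-range runs of more than 3 consecutive elements, and use that index to subset the original dataset.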
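As a small illustration of the rle step, here it is applied to the speeds of ID 1 from the question (the vector name speed_id1 is just for this example): rle collapses the out-of-range flag into runs, the run values are rewritten to "keep this run", and inverse.rle expands them back to one entry per row.

```r
speed_id1 <- c(10, 19, 20, 21, 22)
# TRUE marks a value outside the 19..25 range
r <- rle(speed_id1 < 19 | speed_id1 > 25)
r$lengths   # 1 4
r$values    # TRUE FALSE
# Keep a run only if it is in range (FALSE flag) and longer than 3
r$values <- (!r$values) & (r$lengths > 3)
inverse.rle(r)  # FALSE TRUE TRUE TRUE TRUE
```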
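f1 <- function(dat, Var1, Var2) {
  # Var1 (grouping) and Var2 (values) are unquoted column names of dat
  grp <- eval(substitute(Var1), dat, parent.frame())
  val <- eval(substitute(Var2), dat, parent.frame())
  indx <- as.logical(ave(val, grp, FUN = function(x) {
    r <- rle(x < 19 | x > 25)                    # TRUE = outside 19..25
    r$values <- (!r$values) & (r$lengths > 3)    # in-range runs of 4 or more
    inverse.rle(r)
  }))
  dat[indx, ]
}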
f1(Analysis, ID, Speed)
# ID Rating Speed
#2 1 Excellent 19
#3 1 Very Good 20
#4 1 Poor 21
#5 1 Good 22
#6 2 Fair 20
#7 2 Very Good 20
#8 2 Fair 21
#9 2 Poor 23
Using another example (with more than 3 consecutive elements meeting the condition):
f1(AnalysisN, ID, Speed)
# ID Rating Speed
#2 1 Excellent 20
#3 1 Excellent 15
#4 1 Excellent 27
#5 1 Excellent 19
#6 2 Poor 22
#7 2 Fair 14
#8 2 Poor 20
#9 2 Fair 22
#12 3 Excellent 11
#13 3 Very Good 18
#14 3 Fair 10
#15 3 Poor 15
#16 4 Fair 19
#17 4 Excellent 23
#18 4 Fair 26
#19 4 Very Good 20
#22 5 Very Good 26
#23 5 Poor 15
#24 5 Excellent 29
#25 5 Excellent 13
Analysis <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
Rating = c("Poor",
"Excellent", "Very Good", "Poor", "Good", "Fair", "Very Good",
"Fair", "Poor", "Excellent"), Speed = c(10, 19, 20, 21, 22, 20,
20, 21, 23, 15)), .Names = c("ID", "Rating", "Speed"),
row.names = c(NA, -10L), class = "data.frame")
set.seed(30)
AnalysisN <- data.frame(ID= rep(1:5, each=5),
Rating= sample(c('Poor', 'Excellent', 'Very Good', 'Fair'), 25,
replace=TRUE), Speed =sample(10:30, 25, replace=TRUE),
stringsAsFactors=FALSE)
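f1 hard-codes the bounds (19 and 25) and the run length. A minimal parameterized sketch of the same ave/rle idea follows; the name keep_runs and its arguments are hypothetical, columns are passed by name as strings, and it assumes rows are sorted in time order within each ID with no missing speeds. Here rle runs over the in-range flag directly, so a run is kept when its value is TRUE and it has at least min_run elements.

```r
# Hypothetical helper: keep rows in [lower, upper] that sit in a run of at
# least min_run consecutive qualifying rows within the same group.
keep_runs <- function(dat, group, value, lower = 19, upper = 25, min_run = 4) {
  flag <- ave(dat[[value]], dat[[group]], FUN = function(x) {
    r <- rle(x >= lower & x <= upper)             # TRUE = in range
    r$values <- r$values & (r$lengths >= min_run) # long in-range runs only
    inverse.rle(r)
  })
  # ave() returns the flags coerced to the type of the value column (0/1)
  dat[as.logical(flag), , drop = FALSE]
}

Analysis <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Rating = c("Poor", "Excellent", "Very Good", "Poor", "Good", "Fair",
             "Very Good", "Fair", "Poor", "Excellent"),
  Speed = c(10, 19, 20, 21, 22, 20, 20, 21, 23, 15),
  stringsAsFactors = FALSE)

Output <- keep_runs(Analysis, "ID", "Speed")
```

For the Analysis sample this returns rows 2 to 9, the same rows f1 returns; changing min_run to 3 or the bounds to other limits needs no edit to the function body.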