Question

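I have already asked a lot of questions about this, and all of the answers were very helpful... but once again my data is odd and I need help. Basically, I want the average speed over given time intervals: for example, from 6 s to 40 s my average speed was 5 m/s, and so on. Someone pointed me to this code: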

library(IRanges)
idx <- seq(1, ncol(data), by=2)
# idx is now 1, 3, 5. It will be passed one value at a time to `i`.
# that is, `i` will take values 1 first, then 3 and then 5 and each time
# the code within is executed.
o <- lapply(idx, function(i) {  
    ir1 <- IRanges(start=seq(0, max(data[[i]]), by=401), width=401)
    ir2 <- IRanges(start=data[[i]], width=1)
    t <- findOverlaps(ir1, ir2)
    d <- data.frame(mean=tapply(data[[i+1]], queryHits(t), mean))
    cbind(as.data.frame(ir1), d)
})

which gives this output:

# > o
# [[1]]
#   start end width mean
# 1     0 400   401 1.05
# 
# [[2]]
#   start end width mean
# 1     0 400   401  1.1
# 
# [[3]]
#   start end width     mean
# 1     0 400   401 1.383333

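So if I want it every 100 seconds, I just change ir1 <- ....., by = 401 to by = 100 (and width = 401 to width = 100 as well, otherwise consecutive intervals overlap).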
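For comparison, here is a minimal base-R sketch of the same per-interval average with the bin width as a single parameter, so switching from 401 to 100 means changing one argument. bin_means is a hypothetical helper name (not part of IRanges). It assumes the bins are aligned to multiples of width starting at 0, and it keeps empty bins as NA rows instead of dropping them.

```r
# Hypothetical helper: mean speed per fixed-width time bin, base R only.
# Assumes bins are aligned to 0: [0, width), [width, 2*width), ...
bin_means <- function(time, speed, width = 100) {
  # Bin index 0 covers [0, width), 1 covers [width, 2*width), and so on.
  bin <- floor(time / width)
  # Use every bin index from the first to the last occupied one, so bins
  # with no observations still appear (as NA) instead of vanishing.
  all_bins <- seq(min(bin), max(bin))
  means <- tapply(speed, factor(bin, levels = all_bins), mean)
  data.frame(start = all_bins * width,
             end   = (all_bins + 1) * width,
             mean  = as.vector(means))
}

# Usage: first 8 rows of the first Time/Speed pair, 20-second bins.
res <- bin_means(c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5, 57.1),
                 c(1.6, 1.3, 1.3, 1.0, 2.9, 0.7, 0.9, 1.3),
                 width = 20)
res
```

Here the [20, 40) bin has no observations, so it shows up with mean NA rather than shifting the remaining rows.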

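But my data is odd in a couple of ways:

My data does not always start at 0 s; sometimes it starts at 20 s, depending on the sample and whether it is moving.
My data is not collected every 1 s, 2 s or 3 s. So sometimes I get data for 1-20 s, but then it skips 20-40 s simply because the specimen isn't moving.
I think the findOverlaps part of the code is what messes up my output. How can I get rid of it without breaking the output?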
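One way to drop findOverlaps entirely is base R's findInterval(), which is a swapped-in technique rather than what the IRanges code does. This sketch assumes you want the intervals to start at the first recorded time (6.3 s here) rather than at 0, and that gaps should be kept as NA rows. The width of 20 s and the use of only the first 8 rows are illustrative choices.

```r
# First Time/Speed pair from the sample data (first 8 rows only).
time  <- c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5, 57.1)
speed <- c(1.6, 1.3, 1.3, 1.0, 2.9, 0.7, 0.9, 1.3)
width <- 20

# Breaks start at the first recorded time, not at 0, and run past the
# last one so every observation falls inside some interval.
breaks <- seq(min(time), max(time) + width, by = width)

# findInterval() returns, for each time, the index k of the interval
# [breaks[k], breaks[k + 1]) that it falls in.
bin <- findInterval(time, breaks)

# Declaring every interval as a factor level keeps empty ones (gaps where
# the specimen did not move) as NA instead of silently dropping them.
f <- factor(bin, levels = seq_len(length(breaks) - 1))
result <- data.frame(start = head(breaks, -1),
                     end   = tail(breaks, -1),
                     mean  = as.vector(tapply(speed, f, mean)))
result
```

Because the means are indexed by factor level rather than by which intervals happened to contain data, the rows always line up with start and end, which is the mismatch that tapply() over queryHits() can run into when some intervals are empty.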

Here is some data to illustrate the problem, although all of my real data ends somewhere in the 2000-second range:

Time    Speed   Time    Speed   Time    Speed
6.3     1.6     3.1     1.7     0.3     2.4
11.3    1.3     5.1     2.2     1.3     1.3
13.8    1.3     6.3     3.4     3.1     1.5
14.1    1.0     7.0     2.3     4.5     2.7
47.4    2.9     11.3    1.2     5.1     0.5
49.2    0.7     26.5    3.3     5.9     1.7
50.5    0.9     27.3    3.4     9.7     2.4
57.1    1.3     36.6    2.5     11.8    1.3
72.9    2.9     40.3    1.1     13.1    1.0
86.6    2.4     44.3    3.2     13.8    0.6
88.5    3.4     50.9    2.6     14.0    2.4
89.0    3.0     62.6    1.5     14.8    2.2
94.8    2.9     66.8    0.5     15.5    2.6
117.4   0.5     67.3    1.1     16.4    3.2
123.7   3.2     67.7    0.6     26.5    0.9
124.5   1.0     68.2    3.2     44.7    3.0
126.1   2.8     72.1    2.2     45.1    0.8

As you can see, the data does not necessarily end at a round number like 60 s; sometimes it just ends at 57 s, and so on.

EDIT: added the data input:

structure(list(Time = c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5, 
57.1, 72.9, 86.6, 88.5, 89, 94.8, 117.4, 123.7, 124.5, 126.1), 
    Speed = c(1.6, 1.3, 1.3, 1, 2.9, 0.7, 0.9, 1.3, 2.9, 2.4, 
    3.4, 3, 2.9, 0.5, 3.2, 1, 2.8), Time.1 = c(3.1, 5.1, 6.3, 
    7, 11.3, 26.5, 27.3, 36.6, 40.3, 44.3, 50.9, 62.6, 66.8, 
    67.3, 67.7, 68.2, 72.1), Speed.1 = c(1.7, 2.2, 3.4, 2.3, 
    1.2, 3.3, 3.4, 2.5, 1.1, 3.2, 2.6, 1.5, 0.5, 1.1, 0.6, 3.2, 
    2.2), Time.2 = c(0.3, 1.3, 3.1, 4.5, 5.1, 5.9, 9.7, 11.8, 
    13.1, 13.8, 14, 14.8, 15.5, 16.4, 26.5, 44.7, 45.1), Speed.2 = c(2.4, 
    1.3, 1.5, 2.7, 0.5, 1.7, 2.4, 1.3, 1, 0.6, 2.4, 2.2, 2.6, 
    3.2, 0.9, 3, 0.8)), .Names = c("Time", "Speed", "Time.1", 
"Speed.1", "Time.2", "Speed.2"), class = "data.frame", row.names = c(NA, 
-17L))
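Using the data above, the same loop structure as the IRanges code (idx over the Time columns, i + 1 for the matching Speed column) can be written with floor() and tapply() instead of findOverlaps. This is a sketch under two assumptions: bins are aligned to multiples of width from 0, and 40 s is just an illustrative width. The n column (number of observations per bin) is an addition for checking that no rows are lost.

```r
# The question's data, rebuilt from the dput() output above.
data <- data.frame(
  Time    = c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5, 57.1, 72.9, 86.6,
              88.5, 89, 94.8, 117.4, 123.7, 124.5, 126.1),
  Speed   = c(1.6, 1.3, 1.3, 1, 2.9, 0.7, 0.9, 1.3, 2.9, 2.4,
              3.4, 3, 2.9, 0.5, 3.2, 1, 2.8),
  Time.1  = c(3.1, 5.1, 6.3, 7, 11.3, 26.5, 27.3, 36.6, 40.3, 44.3,
              50.9, 62.6, 66.8, 67.3, 67.7, 68.2, 72.1),
  Speed.1 = c(1.7, 2.2, 3.4, 2.3, 1.2, 3.3, 3.4, 2.5, 1.1, 3.2,
              2.6, 1.5, 0.5, 1.1, 0.6, 3.2, 2.2),
  Time.2  = c(0.3, 1.3, 3.1, 4.5, 5.1, 5.9, 9.7, 11.8, 13.1, 13.8,
              14, 14.8, 15.5, 16.4, 26.5, 44.7, 45.1),
  Speed.2 = c(2.4, 1.3, 1.5, 2.7, 0.5, 1.7, 2.4, 1.3, 1, 0.6,
              2.4, 2.2, 2.6, 3.2, 0.9, 3, 0.8)
)

width <- 40                          # illustrative bin width in seconds
idx <- seq(1, ncol(data), by = 2)    # 1, 3, 5: the Time columns

o <- lapply(idx, function(i) {
  time  <- data[[i]]
  speed <- data[[i + 1]]             # Speed column next to this Time column
  bin   <- floor(time / width)
  lev   <- seq(min(bin), max(bin))   # every bin from first to last occupied
  f     <- factor(bin, levels = lev) # empty bins stay as levels
  data.frame(start = lev * width,
             end   = (lev + 1) * width,
             n     = as.vector(table(f)),
             mean  = as.vector(tapply(speed, f, mean)))
})
o
```

Each column pair gets its own set of bins, so a pair that starts late or stops early (like the third one, which ends at 45.1 s) simply produces fewer rows.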

Answer 1

Sorry if I have completely misunderstood your question, but could you explain why this example doesn't do what you want?

# use a pre-loaded data set
mtcars

# choose which variable to cut
var <- 'mpg'

# define groups, whether that be time or something else
# and choose how to cut it.
x <- cut( mtcars[ , var ] , c( -Inf , seq( 15 , 25 , by = 2.5 ) , Inf ) )

# look at your cut points, for every record
x 

# you can merge them back on to the mtcars data frame if you like..
mtcars$cutpoints <- x
# ..but that's not necessary

# find the mean within those groups
tapply( 
    mtcars[ , var ] , 
    x ,
    mean
)


# find the mean within groups, using a different variable
tapply( 
    mtcars[ , 'wt' ] , 
    x ,
    mean
)
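Applied to the question's time data, the same cut() plus tapply() idea might look like the sketch below. It assumes left-closed intervals of 20 s starting at 0 (both illustrative choices) and uses only the first 8 rows of the first Time/Speed pair.

```r
time  <- c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5, 57.1)
speed <- c(1.6, 1.3, 1.3, 1.0, 2.9, 0.7, 0.9, 1.3)
width <- 20

# Break points at 0, 20, 40, ... up to the first multiple of width >= max(time).
breaks <- seq(0, ceiling(max(time) / width) * width, by = width)

# right = FALSE gives intervals like [0,20); include.lowest = TRUE closes the
# last interval so a time exactly equal to the final break is not lost.
x <- cut(time, breaks, right = FALSE, include.lowest = TRUE)

# cut() returns a factor with one level per interval, so tapply() reports
# intervals with no observations as NA rather than dropping them.
means <- tapply(speed, x, mean)
means
```

Since cut() does not care where the data starts or whether there are gaps, the uneven sampling described in the question does not need any special handling here.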
