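I wrote a script that extracts raw data from a set of files as vectors of varying lengths. The data consists of times, and I am trying to find clusters of times that occur within 0.5 seconds of each other. I am trying to solve this with vector manipulation, but I can't figure out how to get an accurate count of the clusters (let alone the cluster sizes).
I tried writing a function that uses a series of if/else statements to compare the numbers one by one, but it ended up being very error-prone: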
cluster.count <- function(x, threshold = 0.5){
  len = length(x)
  counter = 0
  clusters = 1
  while (counter<(len-1)){
    if (c(x,1)[counter] - c(0,x)[counter] <= threshold){
      counter = counter +1
    }else{
      clusters = clusters +1
      counter = counter +1
    }
    if ((x[len-1]-x[len-2])<= threshold){
      clusters = clusters +1
    }
    return(clusters)
  }
}
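So I decided to use vector manipulation instead: subtracting a copy of the vector (shifted by one position) from the vector to find which differences fall below my threshold. This raised some problems of its own: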
cluster.count <- function(x, threshold = 0.5){
  x1 = c(x,10) # appending 10 to the end of x, so that x2 can be subtracted from it
  x2 = c(0,x) # prepending 0 to the beginning of x, so that it can be subtracted from x1
  x1 = as.numeric(as.character(unlist(x1))) # converting x1 into a numeric vector
  x2 = as.numeric(as.character(unlist(x2))) # converting x2 into a numeric vector
  result1 = (x1-x2)
  result1 = as.numeric(result1 < threshold) # flag differences that are less than the threshold
  result2 = c(result1[-1], 10)
  result3 = result2 - result1
  print(result3)
  return(result1)
}
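For comparison, the shifted subtraction can be written without padding either copy, since base R's diff() returns the difference between each element and the one before it. This is only a minimal sketch: it assumes the times are already a plain numeric vector sorted in increasing order, and the variable names are illustrative.

```r
# Sample times, assumed numeric and sorted in increasing order
x <- c(1.1, 1.2, 1.3, 2, 2.2, 5)
threshold <- 0.5

# diff(x) returns x[i + 1] - x[i], so it has length(x) - 1 elements
gaps <- diff(x)

# TRUE where a time is within the threshold of the time just before it
close_to_previous <- gaps <= threshold
print(close_to_previous)  # TRUE TRUE FALSE TRUE FALSE
```

Because no padding values such as 0 or 10 are involved, the first and last entries cannot produce spurious differences.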
For the sample vector c(1.1,1.2,1.3,2,2.2,5), my current function prints the following: 1 0 -1 1 -1 0 10
Is there a simpler way to do this that I'm not seeing?
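One possibly simpler route, sketched here under stated assumptions rather than as a definitive answer: treat a cluster as a run of sorted times in which each gap to the previous time is at most the threshold, mark where a new run starts, and label the runs with cumsum(). The helper name cluster_sizes and its return shape are hypothetical, and whether a lone time such as 5 counts as a cluster is a choice the question leaves open.

```r
# Hypothetical helper: the name and return shape are assumptions for illustration
cluster_sizes <- function(x, threshold = 0.5) {
  # Assumes the input can be flattened to numeric times; sort in case it is unordered
  x <- sort(as.numeric(unlist(x)))
  if (length(x) == 0) return(integer(0))

  # A new cluster starts at the first time and wherever the gap exceeds the threshold
  starts_new <- c(TRUE, diff(x) > threshold)

  # Running count of cluster starts gives each time its cluster id
  cluster_id <- cumsum(starts_new)

  # Number of times that fall into each cluster id
  tabulate(cluster_id)
}

sizes <- cluster_sizes(c(1.1, 1.2, 1.3, 2, 2.2, 5))
print(sizes)           # 3 2 1
print(length(sizes))   # 3 clusters if the lone value 5 counts as one
print(sum(sizes > 1))  # 2 clusters if single points are excluded
```

Under this definition the sample vector splits into {1.1, 1.2, 1.3}, {2, 2.2} and {5}. Note that times chain together: two times more than 0.5 seconds apart still share a cluster if the times between them bridge the gap.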