How to calculate the mode (statistics) of every set of 10 numbers in a large dataset

Time: 2016-02-05 19:40:15

Tags: r

If I have 1223455567 1777666666, I want the output to be 5 and 6. How can I do this in R?

I know how to find the mean of every 10 data points, but what I want is the mode.

This is what I tried for the mean:

mean10 <- aggregate(level, list(rep(1:(nrow(level) %/% n+1),each = n, len = nrow(level))), mean)[-1];

And the function for the mode is as follows:

MODE <- function(dataframe){
    DF <- as.data.frame(dataframe)

    MODE2 <- function(x){
        if (!is.numeric(x)){
            df <- as.data.frame(table(x))
            df <- df[order(df$Freq), ]
            m <- max(df$Freq)
            MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1]))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }

        }else{
            df <- as.data.frame(table(x))
            df <- df[order(df$Freq), ]
            m <- max(df$Freq)
            MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1])))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }
        }
    }

    return(as.vector(lapply(DF, MODE2)))
}

4 Answers:

Answer 0 (score: 2)

This should work:

Mode <- function(x) {
  y <- unique(x)
  y[which.max(tabulate(match(x, y)))]
}

library(zoo)
x<- c(1,2,2,3,4,5,5,5,6,7,1,7,7,7,6,6,6,6,6,6)
rollapply(data = x, width = 10, FUN = Mode, by = 10 )
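
For comparison, the same non-overlapping grouping can be sketched in base R without zoo. This is only an illustration, assuming the data is a plain numeric vector; the variable names `grp` and `block_modes` are made up here and are not part of the answer.

```r
# Mode helper as defined in the answer: the first most-frequent value
Mode <- function(x) {
  y <- unique(x)
  y[which.max(tabulate(match(x, y)))]
}

x <- c(1,2,2,3,4,5,5,5,6,7, 1,7,7,7,6,6,6,6,6,6)

# Group index: 0 for elements 1-10, 1 for elements 11-20, and so on
grp <- (seq_along(x) - 1) %/% 10

# Apply Mode to each block of 10 consecutive values
block_modes <- tapply(x, grp, Mode)
as.vector(block_modes)
# [1] 5 6
```

If the vector length is not a multiple of 10, the last group here simply contains the leftover values.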

Answer 1 (score: 1)

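Given that you're not after a rolling mode but really a grouped mode, none of the other answers is accurate. What you have in mind is actually much easier to do; I'll use data.table: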

#fixed cost: set-up of 'data.table'
library(data.table)
setDT(DF)

Now to solve it:

#this works on a single column;
#  the rep(...) bit is about creating the
#  sequence (1, ..., 1, 2, ..., 2, ...)
#  of integers each repeated 10 times.
#  Here, .N will give the frequency -- i.e.,
#  this first step is basically running 'table' for every 10 rows
DF[ , .N, by = .(col1, grp = rep(1:(.N %/% 10 + 1), each = 10, length.out = .N))
   #by going in descending order on frequency, we can simply 
   #  extract the first element of each 'grp' to get the mode.
   #  (this glosses over the issue of ties, but you haven't given
   #   any guidance to that end)
   ][order(-N), .SD[1L], by = grp] 
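
The answer notes that ties are glossed over. As a rough sketch of one way to handle them, the following base R code returns every value tied for the highest frequency in each block of 10. It uses base R rather than data.table to stay self-contained; `all_modes` and `modes_by_block` are hypothetical names, and the sample data is invented for illustration.

```r
# Hypothetical helper (not from the answer): all values tied for the
# highest frequency, assuming numeric input
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Invented sample: the first block has a tie between 3 and 4
x <- c(1,1,2,2,3,3,3,4,4,4, 5,5,5,5,6,7,8,9,9,9)
grp <- (seq_along(x) - 1) %/% 10

# split() yields one vector per block of 10; lapply() keeps a
# variable-length result per block, so ties are preserved
modes_by_block <- lapply(split(x, grp), all_modes)
modes_by_block
```

A list is used for the result because blocks with ties return more than one value.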

Answer 2 (score: 0)

You can always convert to character and see which character has the highest count in the table. E.g.:

> which.max(table(strsplit(as.character(1777666666),"")))
6 
2
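
In that output, the name (6) is the most frequent digit and the value (2) is only its position in the table. A minimal sketch that extracts the digit itself and applies it to both numbers from the question might look like this; `digit_mode` is a hypothetical helper, and the numbers are kept as strings on the assumption that this avoids any formatting surprises when very large numbers are converted with as.character.

```r
# The two 10-digit numbers from the question, kept as strings
nums <- c("1223455567", "1777666666")

# Hypothetical helper: most frequent digit of a single string
digit_mode <- function(s) {
  counts <- table(strsplit(s, "")[[1]])
  # names() gives the digit; which.max() alone would give its position
  as.numeric(names(which.max(counts)))
}

vapply(nums, digit_mode, numeric(1), USE.NAMES = FALSE)
# [1] 5 6
```

Like the original one-liner, this picks the first digit in table order when there is a tie.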

Answer 3 (score: 0)

You can use the zoo package to compute a moving mode:

library(zoo)

# sample data
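d <- data.frame(x = sample(1:3, 100, TRUE))

# mode function (handles ties by choosing one)
# names() returns the value itself; which.max() alone gives only its position in the table
my_mode <- function(x) as.numeric(names(which.max(table(x))))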

# add moving mode as new variable
transform(d, moving_mode = rollapply(x, 10, FUN = my_mode, fill = NA))