Random sample from 7 columns, 80 rows in total

Time: 2018-08-05 16:43:50

Tags: r

I have a table with 7 columns and 80 rows.


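For each row, I need to randomly sample one of the 7 columns. For example, row 1 (8), row 2 (6), row 3 (10), and so on for all 80 rows. Can I use the sample function? If so, how? What should I do about the NAs? I need to repeat the sampling 1000 times and compute the mean of each sample.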

Any help would be greatly appreciated! Thanks, 奥尔丁

3 answers:

Answer 0: (score: 3)

Here is a solution using plyr::adply.

library(plyr)

# original dataset
df1 <- data.frame(
   c( 6,  6,  9,  4,  3,  7,  9,  6, NA, 6),
   c( 7,  7, 10,  3,  2,  7,  5,  6,  6, 7),
   c( 7, 13, 10,  5,  5,  5,  8,  7,  5, 6),
   c( 8, 13,  8,  3,  5,  4,  8, NA,  5, 4),
   c(NA, 14, NA, NA,  6,  5, NA,  7, NA, 7),
   c(NA, NA, NA, NA, NA,  5, NA, NA, NA, 6)
)


# returns a single column from a row with NA's removed
samplerow <- function(r) {
  # r is a single row of df
  # eliminate NAs from the dataset.
  r <- r[!is.na(r)]
  # Return one sample from this row
  # Not sure what happens if the row is all NAs. Don't do that.
  r[sample.int(length(r),1)]
}

N <- 1000
# for N times,
# for each row select 1 non-NA valued column,
# take the mean of all rows
replicate(N, mean(adply(df1, 1, samplerow, .expand = FALSE)$V1))
#...redacted...
N <- 5
set.seed(1)
replicate(N, mean(adply(df1, 1, samplerow, .expand = FALSE)$V1))
[1] 6.0 6.2 6.2 7.0 7.1
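
The comment in samplerow leaves the all-NA case open. As a minimal sketch of one way to handle it in base R, assuming it is acceptable to return NA for such a row and to skip it when averaging: the helper sample_non_na and the toy table below are made up for illustration and are not part of the answer.

```r
# Hypothetical helper (not from the answer): draw one non-NA value from a row,
# or return NA when the row has no usable value.
sample_non_na <- function(x) {
  ok <- which(!is.na(x))                  # positions of the non-NA entries
  if (length(ok) == 0L) return(NA_real_)  # all-NA row: nothing to draw from
  x[[ok[sample.int(length(ok), 1L)]]]     # pick one position, return its value
}

# Made-up table; the third row is entirely NA.
toy <- data.frame(a = c(6, NA, NA), b = c(7, 3, NA), c = c(NA, 5, NA))

set.seed(1)
draws <- apply(toy, 1, sample_non_na)  # one value per row, NA for the all-NA row
mean(draws, na.rm = TRUE)              # mean over the rows that produced a value
```

Whether an all-NA row should be skipped or treated as an error depends on the analysis; na.rm = TRUE silently reduces the number of rows that contribute to the mean.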

Answer 1: (score: 2)

Using sapply():

sapply(as.data.frame(t(df1)), function(x) sample(na.omit(x), 1))
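
One caveat with calling sample() directly on the values: when its first argument is a single number x >= 1, sample() draws from 1:x instead of returning x. A row with only one non-NA value could therefore yield a number that is not in the row. The sketch below illustrates this on a made-up row and shows an index-based draw that avoids it; the variable names are hypothetical.

```r
# Made-up row with exactly one non-NA value.
x <- c(NA, 6, NA)
vals <- x[!is.na(x)]  # length-1 numeric vector containing 6

set.seed(1)
# sample(6, 1) samples from 1:6, so values other than 6 can appear.
naive <- replicate(200, sample(vals, 1))

# Sampling an index instead always returns an element of vals.
safe <- replicate(200, vals[sample.int(length(vals), 1L)])

range(naive)  # spans more than the single value in the row
unique(safe)  # only the value actually present in the row
```

If every row is guaranteed to have at least two non-NA values, the one-liner behaves as intended and this guard is unnecessary.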

Answer 2: (score: 1)

We can use apply to loop over the rows, keep the non-NA elements, and draw a sample from them:

n <- 1000
lst <- replicate(n, apply(df1, 1, function(x) sample(x[!is.na(x)], 1)),
               simplify = FALSE)
Reduce(`+`, lst)/n

Or with pmap and rowMeans:

library(tidyverse)
rowMeans(replicate(n, pmap_int(df1, ~
                          c(...) %>% 
                          na.omit %>%
                          sample(., 1))))
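
Both snippets above average across the replicates and so return one value per row (80 values). If the question is read as asking for the mean of each of the 1000 samples, the averaging runs over the other dimension. A minimal base R sketch under that reading follows; draw_one is a hypothetical helper, and df1 is rebuilt with the same construction as in the Data block.

```r
set.seed(24)
# Same construction as the Data block of this answer.
df1 <- as.data.frame(matrix(sample(c(1:9, NA), 80 * 7, replace = TRUE), 80, 7))

n <- 1000

# Hypothetical helper: one non-NA value per row, NA if the row is all NA.
draw_one <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) return(NA_integer_)
  x[sample.int(length(x), 1L)]
}

# 80 x n matrix: each column is one replicate (one draw per row).
draws <- replicate(n, apply(df1, 1, draw_one))

sample_means <- colMeans(draws, na.rm = TRUE)  # n means, one per replicate
row_means <- rowMeans(draws, na.rm = TRUE)     # 80 means, one per row
```

Here sample_means has one entry per replicate, while row_means corresponds to the per-row averages computed by the snippets above.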

Data

set.seed(24)
df1 <- as.data.frame(matrix(sample(c(1:9, NA), 80 * 7, replace = TRUE), 80, 7))