Joint occurrence of variables in R

Time: 2016-06-28 14:12:25

Tags: r data-manipulation

I want to count the individual and combined occurrences of variables (1 represents presence, 0 represents absence). This can be obtained through repeated calls to one function (as in my MWE). Is there a more efficient way to get the required output given below?


Required output

I need the following:

table

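3 answers:

Answer 0: (score: 1)

Extending Sumedh's answer, you can also do this dynamically, without having to specify the filter each time. This is useful when you have more than 3 columns to combine.

You can do something like this: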

library(dplyr)

lapply(seq_len(ncol(df)), function(i){
  # Generate all the combinations of i elements over all columns
  tmp_i = utils::combn(names(df), i)
  # Each column of tmp_i holds the elements of one combination
  apply(tmp_i, 2, function(x){
    # Build e.g. ~A == 1 & B == 1 from the column names in x
    dynamic_formula = as.formula(paste("~", paste(x, "== 1", collapse = " & ")))
    # Note: filter_() is deprecated in current dplyr releases
    df %>% 
      filter_(.dots = dynamic_formula) %>% 
      summarize(Count = n()) %>% 
      mutate(type = paste0(sort(x), collapse = ""))
  }) %>% 
    bind_rows()
}) %>% 
  bind_rows()

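This will:

1) Generate all combinations of the columns of df: first the combinations of one element (A, B, C), then those of two elements (AB, AC, BC), and so on. This is the outer lapply.

2) Create a dynamic formula for each combination. For AB, for example, the formula is A == 1 & B == 1, as Sumedh suggested. This is the dynamic_formula bit.

3) Filter the data frame with the dynamically generated formula and count the rows.

4) Bind everything together (the two bind_rows calls).

Output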
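The four steps above can also be sketched without filter_(), which current dplyr releases deprecate. This is only an illustration, not the answerer's code: it assumes dplyr 1.0.4 or later (for if_all()), and the toy data frame and the helper count_combo are made up for the example.

```r
library(dplyr)

# Hypothetical toy data: 1 = present, 0 = absent
toy <- data.frame(A = c(1, 1, 0, 1), B = c(1, 0, 0, 1), C = c(0, 1, 1, 1))

# Hypothetical helper: count the rows where every column in `vars` equals 1
count_combo <- function(data, vars) {
  data %>%
    filter(if_all(all_of(vars), ~ .x == 1)) %>%
    summarise(Count = n()) %>%
    mutate(type = paste0(sort(vars), collapse = ""))
}

# Steps 1 and 2: every combination of 1, 2, ... columns, as a flat list
combos <- unlist(
  lapply(seq_along(names(toy)),
         function(i) combn(names(toy), i, simplify = FALSE)),
  recursive = FALSE
)

# Steps 3 and 4: count each combination and stack the one-row results
result <- bind_rows(lapply(combos, count_combo, data = toy))
result
```

Because the column names are passed as a character vector, no formula or expression has to be pasted together as text.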

  Count type
1    52    A
2    47    B
3    66    C
4    24   AB
5    30   AC
6    34   BC
7    15  ABC

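Answer 1: (score: 1)

Edited to add: I see now that you do not want exclusive counts (i.e. A and AB should both include all As).

I got more than a little nerd-sniped by this today, particularly because I wanted to solve it in base R with no packages. The following should do it.

There is a very easy (in principle) solution that uses only xtabs(), which I have illustrated below. However, generalizing it to any potential number of dimensions, and then applying it to the various combinations, is actually harder. I strove to avoid the dreaded eval(parse()).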

set.seed(12345)
A <- rbinom(n = 100, size = 1, prob = 0.5)
B <- rbinom(n = 100, size = 1, prob = 0.6)
C <- rbinom(n = 100, size = 1, prob = 0.7)
df <- data.frame(A, B, C)

# Turn strings off
options(stringsAsFactors = FALSE)

# Obtain the n-way frequency table
# This table can be directly subset using []
# It is a little tricky to pass the arguments
# I'm trying to avoid eval(parse())
# But still give a solution that isn't bound to a specific size
xtab_freq <- xtabs(formula = formula(x = paste("~",paste(names(df),collapse = " + "))),
                   data = df)

# Demonstrating what I mean
# All A
sum(xtab_freq["1",,])
# [1] 52

# AC
sum(xtab_freq["1",,"1"])
# [1] 30

# Using lapply(), we pass names(df) to combn() with m running from 1
#   up to the number of columns (1, 2, and 3 here)
# The output of combn() goes through list(), then is unlisted with recursive FALSE
# This gives us a list of vectors
# Each one being a combination in which we are interested
lst_combs <- unlist(lapply(X = seq_along(df),FUN = combn,x = names(df),list),recursive = FALSE)

# For nice output naming, I just paste the values together
names(lst_combs) <- sapply(X = lst_combs,FUN = paste,collapse = "")

# This is a function I put together
# Generalizes process of extracting values from a crosstab
# It does it in this fashion to avoid eval(parse())
uFunc_GetMargins <- function(crosstab,varvector,success) {

    # xtab_dnn: the dimnames (the level labels within each dimension)
    # xtab_dn: the names of those dimensions (the variable names)
    xtab_dnn <- dimnames(crosstab)
    xtab_dn <- names(xtab_dnn)

    # Use match() to get a numeric vector for the margins
    # This can be used in margin.table()
    tgt_margins <- match(x = varvector,table = xtab_dn)

    # Obtain a margin table
    marginal <- margin.table(x = crosstab,margin = tgt_margins)

    # To extract the value, figure out which marginal cell contains
    #   all variables of interest set to success
    # sapply() goes over all the elements of the dimname names
    # Finds numeric index in that dimension where the name == success
    # We subset the resulting vector by tgt_margins
    #  (to only get the cells in our marginal table)
    # Then, use prod() to multiply them together and get the location
    tgt_cell <- prod(sapply(X = xtab_dnn,
                            FUN = match,
                            x = success)[tgt_margins])

    # Return as named list for ease of stacking
    return(list(count = marginal[tgt_cell]))
}

# Doing a call of mapply() lets us get the results
do.call(what = rbind.data.frame,
        args = mapply(FUN = uFunc_GetMargins,
                      varvector = lst_combs,
                      MoreArgs = list(crosstab = xtab_freq,
                                      success = "1"),
                      SIMPLIFY = FALSE,
                      USE.NAMES = TRUE))
#     count
# A      52
# B      47
# C      66
# AB     24
# AC     30
# BC     34
# ABC    15

I dropped my previous solution, which used aggregate.
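One note on the cell lookup in uFunc_GetMargins: multiplying the per-dimension positions with prod() lands on the right cell for two-level 0/1 tables like this one, but with more levels per variable it generally would not match the array's linear index. A minimal sketch of a lookup that does not rely on that coincidence is below. The toy data and the name get_joint_count are hypothetical; it indexes the marginal table with a one-row matrix instead.

```r
# Hypothetical toy data: 1 = present, 0 = absent
toy <- data.frame(A = c(1, 1, 0, 1), B = c(1, 0, 0, 1), C = c(0, 1, 1, 1))
tab <- xtabs(~ A + B + C, data = toy)

# Hypothetical helper: joint count for the variables in `varvector`
get_joint_count <- function(crosstab, varvector, success = "1") {
  # Collapse the table onto the variables of interest
  margins <- match(varvector, names(dimnames(crosstab)))
  marginal <- margin.table(crosstab, margin = margins)
  # Position of `success` within each remaining dimension
  idx <- vapply(dimnames(marginal),
                function(lv) match(success, lv),
                integer(1))
  # A one-row index matrix addresses exactly one cell of the array
  unname(marginal[matrix(idx, nrow = 1)])
}

get_joint_count(tab, "A")            # 3
get_joint_count(tab, c("A", "C"))    # 2
```

The function can be dropped into the same mapply() call in place of uFunc_GetMargins, apart from returning a bare number rather than a named list.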

Answer 2: (score: 0)

Using dplyr
Occurrence of A only:

library(dplyr)
df %>% filter(A == 1) %>% summarise(Total = n())

Occurrence of A and B:

df %>% filter(A == 1, B == 1) %>% summarise(Total = n())

Occurrence of A, B and C:

df %>% filter(A == 1, B == 1, C == 1) %>% summarise(Total = n())
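If only the single and pairwise counts are needed, they can be read from one matrix product instead of one filter per combination. This is a sketch under the assumption that every column is strictly 0/1. The toy data is made up, and the approach does not extend to triples such as ABC.

```r
# Hypothetical toy data: 1 = present, 0 = absent
toy <- data.frame(A = c(1, 1, 0, 1), B = c(1, 0, 0, 1), C = c(0, 1, 1, 1))

# t(m) %*% m: entry [i, j] sums the row-wise products of columns i and j,
# which for 0/1 data is the number of rows where both are 1
pair_counts <- crossprod(as.matrix(toy))

pair_counts["A", "A"]  # 3: rows with A present (diagonal = single counts)
pair_counts["A", "B"]  # 2: rows with both A and B present
```

The diagonal holds the individual counts and the off-diagonal entries hold the pairwise co-occurrences, so the matrix is symmetric.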