Question

I have the following data.frame in R:

  Introvert      Extrovert      Nature       Presence
     0              -1            3             Yes     
     1               3            2             No
     2               5            4             Yes
     1              -2            0             No

Now I want to recode the responses in the following manner:

    3,4 <- Positives
    0,1,2 <- Neutral
    < 0 <- Negatives

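Then I want the counts of Positives, Negatives and Neutrals for each value of Presence (Yes and No).
I have 20 response columns like the ones above. How can I do this with simple R code?

At the moment I apply ifelse to each column and then group_by.

My desired data frame for the sample would be: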

         Introvert_Positive      Introvert_Negative     Introvert_Neutral

  Yes        0                         0                      2
  No         0                         0                      2

Answer 1

How about this?

library(tidyverse)
df %>%
    gather(key, value, -Presence) %>%
    mutate(bin = cut(
        value,
        breaks = c(-Inf, -1, 2.5, Inf),
        labels = c("Negatives", "Neutral", "Positives"))) %>%
    select(-value) %>%
    unite(col, key, bin, sep = "_") %>%
    count(Presence, col) %>%
    spread(col, n)
## A tibble: 2 x 6
#  Presence Extrovert_Negativ… Extrovert_Positi… Introvert_Neutr… Nature_Neutral
#  <fct>                 <int>             <int>            <int>          <int>
#1 No                        1                 1                2              2
#2 Yes                       1                 1                2             NA
## ... with 1 more variable: Nature_Positives <int>

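Explanation: we recode the responses using cut with labels; the rest is a matter of reshaping to long format (gather), combining the relevant columns (unite), counting occurrences (count) and reshaping from long to wide (spread).

Sample data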
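To see how the breaks map values to labels, here is a minimal base R sketch. The vector x is made up for illustration. By default cut uses right-closed intervals, so -1 falls into "Negatives" and 2 into "Neutral"; this matches the question's scheme for integer responses, but a non-integer such as -0.5 would be binned as "Neutral".

```r
# Hypothetical response values, chosen to cover every bin boundary
x <- c(-2, -1, 0, 1, 2, 3, 4, 5)

# Intervals are (-Inf, -1], (-1, 2.5], (2.5, Inf] because right = TRUE by default
bins <- cut(
  x,
  breaks = c(-Inf, -1, 2.5, Inf),
  labels = c("Negatives", "Neutral", "Positives"))

# Show each value next to the label it received
data.frame(x, bins)
```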

df <- read.table(text =
    "Introvert      Extrovert      Nature       Presence
     0              -1            3             Yes
     1               3            2             No
     2               5            4             Yes
     1              -2            0             No", header = TRUE)
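The tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same workflow with the newer functions, not part of the original answer. It assumes a reasonably recent tidyr, and the use of values_fill = 0 (zeros instead of NA for absent combinations) is my own choice.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer
df <- read.table(text =
    "Introvert      Extrovert      Nature       Presence
     0              -1            3             Yes
     1               3            2             No
     2               5            4             Yes
     1              -2            0             No", header = TRUE)

res <- df %>%
  # Long format: one row per (Presence, question, response)
  pivot_longer(-Presence, names_to = "key", values_to = "value") %>%
  # Recode each response into one of three bins
  mutate(bin = cut(
    value,
    breaks = c(-Inf, -1, 2.5, Inf),
    labels = c("Negatives", "Neutral", "Positives"))) %>%
  # Count occurrences per group, question and bin
  count(Presence, key, bin) %>%
  # Wide format: column names are built as key_bin; absent combinations become 0
  pivot_wider(names_from = c(key, bin), values_from = n, values_fill = 0)

res
```

Because pivot_wider() can build the column names from several columns at once, the separate unite() step is not needed here.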

Answer 2

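For fun and practice, I wrote a data.table version that follows the workflow of @MauritsEvers's answer. In the benchmark below it takes roughly 60% less time than the dplyr approach (median about 3.8 ms versus 10.3 ms).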

data.table

You can skip the unite of the key and bin columns, because dcast handles it in the same step as the cast.

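library(data.table)
library(magrittr)  # provides %>%

df %>% 
  setDT() %>%                      # converts df to a data.table by reference
  melt( id.vars = "Presence" ) %>% # long format; Presence is column 4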
  .[, bin := cut( value, 
                  breaks = c(-Inf, -1, 2.5, Inf),
                  labels = c("Negatives", "Neutral", "Positives") )] %>%
  .[, value := NULL] %>%
  .[, .N, by = c("Presence", "variable", "bin")] %>% 
  dcast( Presence ~ variable + bin, value.var = "N")



   Presence Introvert_Neutral Extrovert_Negatives Extrovert_Positives Nature_Neutral Nature_Positives
1:       No                 2                   1                   1              2               NA
2:      Yes                 2                   1                   1             NA                2
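As a possible variation (a sketch, not the answerer's code): dcast can do the counting itself via fun.aggregate = length, which removes the separate .N step and yields 0 rather than NA for combinations that never occur. I also use as.data.table() here instead of setDT(), since setDT() converts df in place; the names dt, long and wide are only for illustration.

```r
library(data.table)

# Same sample data as in the first answer
df <- read.table(text =
    "Introvert      Extrovert      Nature       Presence
     0              -1            3             Yes
     1               3            2             No
     2               5            4             Yes
     1              -2            0             No", header = TRUE)

# as.data.table() makes a copy, so df itself stays a plain data.frame
dt <- as.data.table(df)

# Long format: one row per (Presence, variable, value)
long <- melt(dt, id.vars = "Presence")

# Recode the responses into three bins
long[, bin := cut(value,
                  breaks = c(-Inf, -1, 2.5, Inf),
                  labels = c("Negatives", "Neutral", "Positives"))]

# Cast to wide and count in one step; empty cells are filled with length(0-length vector) = 0
wide <- dcast(long, Presence ~ variable + bin,
              fun.aggregate = length, value.var = "value")

wide
```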

Benchmark

library(microbenchmark)
microbenchmark(
  dplyr = {
    df %>%
      gather(key, value, -Presence) %>%
      mutate(bin = cut(
        value,
        breaks = c(-Inf, -1, 2.5, Inf),
        labels = c("Negatives", "Neutral", "Positives"))) %>%
      select(-value) %>%
      unite(col, key, bin, sep = "_") %>%
      count(Presence, col) %>%
      spread(col, n)
  },
  data.table = {
    df %>% 
      setDT() %>%
      melt( id = 4 ) %>%
      .[, bin := cut( value, 
                      breaks = c(-Inf, -1, 2.5, Inf),
                      labels = c("Negatives", "Neutral", "Positives") )] %>%
      .[, value := NULL] %>%
      .[, .N, by = c("Presence", "variable", "bin")] %>% 
      dcast( Presence ~ variable + bin, value.var = "N")
  },
  times = 1000
)

Unit: milliseconds
       expr      min        lq     mean    median        uq      max neval
      dplyr 9.636224 10.083903 10.59597 10.267371 10.458524 26.38649  1000
 data.table 3.458208  3.647401  3.92219  3.835239  3.949568 15.05596  1000
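To put the "about 60%" figure in context, the arithmetic on the two medians above can be sketched as follows. The variable names are mine, and the numbers only describe this tiny four-row sample on the answerer's machine.

```r
# Median timings in milliseconds, copied from the benchmark output above
median_dplyr <- 10.267371
median_dt    <- 3.835239

# How many times faster the data.table version ran
speedup <- median_dplyr / median_dt

# Share of the dplyr run time that the data.table version saves
reduction <- 1 - median_dt / median_dplyr

round(speedup, 2)
round(100 * reduction, 1)
```

So "60% faster" is best read as "takes about 60% less time", which corresponds to a speedup of a bit under three times.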

How do I group by all columns in a data.frame?
