How to cross tabulate (xtabs) with multiple vars but the same breakdown

Time: 2017-12-18 04:55:57

Tags: r dataframe crosstab

I have a data frame looking like this:

  SubjectID Activity        V1          V2          V3
1         2        S 0.2571778 -0.02328523 -0.01465376
2         2        W 0.2860267 -0.01316336 -0.11908252
3         3        R 0.2754848 -0.02605042 -0.11815167
4         3        W 0.2702982 -0.03261387 -0.11752018
5         4        A 0.2748330 -0.02784779 -0.12952716
6         4        S 0.2792199 -0.01862040 -0.11390197
...

(There are actually many more Vn variables, but this illustrates the problem.)

I would like to use xtabs() to look at all Vn vars, but keep SubjectID and Activity constant - something like

xtabs(c(V1, V2, V3) ~ SubjectID + Activity, data = DF)

or

lapply(c(V1, V2, V3), function(x) xtabs(x ~ SubjectID + Activity, data = DF))

but of course those don't work. What is the right method here?


Edit: What I want is the outputs of

xtabs(V1 ~ SubjectID + Activity, data = DF)
xtabs(V2 ~ SubjectID + Activity, data = DF)
xtabs(V3 ~ SubjectID + Activity, data = DF)
...

2 answers:

Answer 0: (score: 2)

You should be able to just use get() inside the formula after supplying a character vector of the column names of interest.

lapply(c("V1", "V2", "V3"), function(x) xtabs(get(x) ~ SubjectID + Activity, data = DF))
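If you would rather not call get() inside the formula, one alternative is to build each formula from the column name with reformulate() from base R. This is only a sketch: the data frame below re-creates the six sample rows from the question, and the regular expression used to find the Vn columns is an assumption about how the real columns are named.

```r
# Sample data copied from the question (first six rows only).
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

# Assumption: the value columns are exactly those named V1, V2, ...
value_cols <- grep("^V[0-9]+$", names(DF), value = TRUE)

# Build one formula per column (e.g. V1 ~ SubjectID + Activity)
# and keep the column names as the names of the resulting list.
tabs <- lapply(setNames(value_cols, value_cols), function(v) {
  xtabs(reformulate(c("SubjectID", "Activity"), response = v), data = DF)
})

tabs$V1  # the same table as xtabs(V1 ~ SubjectID + Activity, data = DF)
```

Because the list is named, each table can be pulled out by column name (tabs$V2, tabs[["V3"]]) rather than by position.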

Try it out with the "airquality" dataset:

setNames(lapply(names(airquality)[1:4], 
                function(x) xtabs(get(x) ~ Month + Day, airquality)), 
         names(airquality)[1:4])
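It may also be worth knowing that xtabs() accepts a matrix on the left-hand side of the formula, so the first attempt in the question is close: cbind() instead of c() should give a single three-dimensional table with one slice per value column. A minimal sketch, again using only the six sample rows from the question:

```r
# Sample data copied from the question (first six rows only).
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

# A matrix response: each column of cbind(...) becomes one level
# of an extra (third) dimension in the result.
tab3d <- xtabs(cbind(V1, V2, V3) ~ SubjectID + Activity, data = DF)

dim(tab3d)       # SubjectID x Activity x value column
tab3d[, , "V2"]  # the slice for V2
```

This still requires typing the column names, so with many Vn variables the lapply() approach is probably more convenient.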

Based on your comments, I'd recommend looking at "data.table" and dcast() if you need a wide dataset. Note that the example below aggregates with mean(), whereas xtabs() sums the values in each cell.

Here's an example:

set.seed(1)
DF <- cbind(warpbreaks, V2 = sample(100, nrow(warpbreaks)), V3 = sample(100, nrow(warpbreaks)))
library(data.table)
setDT(DF)
lapply(c("breaks", "V2", "V3"), function(x) {
  dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, value.var = x) 
})
# [[1]]
#    wool        L        M        H
# 1:    A 44.55556 24.00000 24.55556
# 2:    B 28.22222 28.77778 18.77778
# 
# [[2]]
#    wool        L        M        H
# 1:    A 59.22222 46.33333 33.22222
# 2:    B 49.44444 44.77778 43.22222
# 
# [[3]]
#    wool  L        M        H
# 1:    A 40 68.11111 74.22222
# 2:    B 48 40.11111 37.77778

Alternatively, you can just have a totally wide "data.table", like this:

dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, 
      value.var = c("breaks", "V2", "V3"))
#    wool breaks_L breaks_M breaks_H     V2_L     V2_M     V2_H V3_L     V3_M     V3_H
# 1:    A 44.55556 24.00000 24.55556 59.22222 46.33333 33.22222   40 68.11111 74.22222
# 2:    B 28.22222 28.77778 18.77778 49.44444 44.77778 43.22222   48 40.11111 37.77778

Answer 1: (score: 1)

Using a tidy approach, this is how I would tackle the problem:

library(tidyr)
library(dplyr)
library(purrr)

df <- tribble(
  ~SubjectID, ~Activity,       ~V1,         ~V2,         ~V3,
           2,       "S", 0.2571778, -0.02328523, -0.01465376,
           2,       "W", 0.2860267, -0.01316336, -0.11908252,
           3,       "R", 0.2754848, -0.02605042, -0.11815167,
           3,       "W", 0.2702982, -0.03261387, -0.11752018,
           4,       "A", 0.2748330, -0.02784779, -0.12952716,
           4,       "S", 0.2792199, -0.01862040, -0.11390197
)

df %>%
  select(starts_with("V")) %>%
  map(~{
    as_tibble(xtabs(.x ~ SubjectID + Activity, data = df))
  }) %>%
  bind_rows(.id = "var") %>%
  spread(Activity, n)

# # A tibble: 9 x 6
#     var SubjectID           A           R           S           W
# * <chr>     <chr>       <dbl>       <dbl>       <dbl>       <dbl>
# 1    V1         2  0.00000000  0.00000000  0.25717780  0.28602670
# 2    V1         3  0.00000000  0.27548480  0.00000000  0.27029820
# 3    V1         4  0.27483300  0.00000000  0.27921990  0.00000000
# 4    V2         2  0.00000000  0.00000000 -0.02328523 -0.01316336
# 5    V2         3  0.00000000 -0.02605042  0.00000000 -0.03261387
# 6    V2         4 -0.02784779  0.00000000 -0.01862040  0.00000000
# 7    V3         2  0.00000000  0.00000000 -0.01465376 -0.11908252
# 8    V3         3  0.00000000 -0.11815167  0.00000000 -0.11752018
# 9    V3         4 -0.12952716  0.00000000 -0.11390197  0.00000000
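One thing to keep in mind with the output above: xtabs() reports 0 for SubjectID/Activity combinations that never occur, which cannot be told apart from a genuine sum of zero. If that distinction matters, a base-R sketch with tapply() leaves those cells as NA instead. The data frame is a plain data.frame copy of the sample rows above, and sums_na is just an illustrative name.

```r
# Sample data copied from the question (first six rows only).
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

# Sum V1 within each SubjectID/Activity cell; combinations with
# no rows stay NA rather than being filled with 0.
sums_na <- tapply(DF$V1, DF[c("SubjectID", "Activity")], sum)

sums_na
is.na(sums_na["2", "A"])  # subject 2 has no "A" row in the sample data
```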