I have a data frame looking like this:
SubjectID Activity V1 V2 V3
1 2 S 0.2571778 -0.02328523 -0.01465376
2 2 W 0.2860267 -0.01316336 -0.11908252
3 3 R 0.2754848 -0.02605042 -0.11815167
4 3 W 0.2702982 -0.03261387 -0.11752018
5 4 A 0.2748330 -0.02784779 -0.12952716
6 4 S 0.2792199 -0.01862040 -0.11390197
...
(There are actually many more Vn variables, but this illustrates the problem.)
I would like to use xtabs() to look at all Vn vars, but keep SubjectID and Activity constant - something like
xtabs(c(V1, V2, V3) ~ SubjectID + Activity, data = DF)
or
lapply(c(V1, V2, V3), function(x) xtabs(x ~ SubjectID + Activity, data = DF))
but of course those don't work. What is the right method here?
Edit: What I want is the outputs of
xtabs(V1 ~ SubjectID + Activity, data = DF)
xtabs(V2 ~ SubjectID + Activity, data = DF)
xtabs(V3 ~ SubjectID + Activity, data = DF)
...
Answer 0 (score: 2)
You should be able to just use get() after supplying a character vector of the column names of interest.
lapply(c("V1", "V2", "V3"), function(x) xtabs(get(x) ~ SubjectID + Activity, data = DF))
Try it out with the "airquality" dataset:
setNames(lapply(names(airquality)[1:4],
                function(x) xtabs(get(x) ~ Month + Day, airquality)),
         names(airquality)[1:4])
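If you'd rather not rely on get() inside the formula, another option is to build each formula from the column name with reformulate(). This is only a sketch on the six rows shown in the question; the "^V[0-9]+$" pattern is my assumption about how your value columns are named, so adjust it to your real data.

```r
# Toy data copied from the question (only the six rows shown there).
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

# Assumption: the value columns are exactly those named V1, V2, ...
vars <- grep("^V[0-9]+$", names(DF), value = TRUE)

# Build e.g. V1 ~ SubjectID + Activity for each column, then tabulate.
# Naming the input vector gives a named list of tables.
tabs <- lapply(setNames(vars, vars), function(v) {
  f <- reformulate(c("SubjectID", "Activity"), response = v)
  xtabs(f, data = DF)
})

tabs$V1  # same values as xtabs(V1 ~ SubjectID + Activity, data = DF)
```

The result is a named list, so each table can be pulled out by column name (tabs$V2, tabs[["V3"]], and so on).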
Based on your comments, I'd recommend looking at "data.table" and dcast() if you need a wide dataset.
Here's an example:
set.seed(1)
DF <- cbind(warpbreaks, V2 = sample(100, nrow(warpbreaks)), V3 = sample(100, nrow(warpbreaks)))
library(data.table)
setDT(DF)
lapply(c("breaks", "V2", "V3"), function(x) {
  dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, value.var = x)
})
# [[1]]
# wool L M H
# 1: A 44.55556 24.00000 24.55556
# 2: B 28.22222 28.77778 18.77778
#
# [[2]]
# wool L M H
# 1: A 59.22222 46.33333 33.22222
# 2: B 49.44444 44.77778 43.22222
#
# [[3]]
# wool L M H
# 1: A 40 68.11111 74.22222
# 2: B 48 40.11111 37.77778
Alternatively, you can just have a totally wide "data.table", like this:
dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension,
value.var = c("breaks", "V2", "V3"))
# wool breaks_L breaks_M breaks_H V2_L V2_M V2_H V3_L V3_M V3_H
# 1: A 44.55556 24.00000 24.55556 59.22222 46.33333 33.22222 40 68.11111 74.22222
# 2: B 28.22222 28.77778 18.77778 49.44444 44.77778 43.22222 48 40.11111 37.77778
Answer 1 (score: 1)
Using a tidy approach, this is how I would tackle the problem:
library(tidyr)
library(dplyr)
library(purrr)
df <- tribble(
~SubjectID, ~Activity, ~V1, ~V2, ~V3,
2, "S", 0.2571778, -0.02328523, -0.01465376,
2, "W", 0.2860267, -0.01316336, -0.11908252,
3, "R", 0.2754848, -0.02605042, -0.11815167,
3, "W", 0.2702982, -0.03261387, -0.11752018,
4, "A", 0.2748330, -0.02784779, -0.12952716,
4, "S", 0.2792199, -0.01862040, -0.11390197
)
df %>%
  select(starts_with("V")) %>%
  map(~{
    as_tibble(xtabs(.x ~ SubjectID + Activity, data = df))
  }) %>%
  bind_rows(.id = "var") %>%
  spread(Activity, n)
# # A tibble: 9 x 6
# var SubjectID A R S W
# * <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 V1 2 0.00000000 0.00000000 0.25717780 0.28602670
# 2 V1 3 0.00000000 0.27548480 0.00000000 0.27029820
# 3 V1 4 0.27483300 0.00000000 0.27921990 0.00000000
# 4 V2 2 0.00000000 0.00000000 -0.02328523 -0.01316336
# 5 V2 3 0.00000000 -0.02605042 0.00000000 -0.03261387
# 6 V2 4 -0.02784779 0.00000000 -0.01862040 0.00000000
# 7 V3 2 0.00000000 0.00000000 -0.01465376 -0.11908252
# 8 V3 3 0.00000000 -0.11815167 0.00000000 -0.11752018
# 9 V3 4 -0.12952716 0.00000000 -0.11390197 0.00000000
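For comparison, base R can get all the tables from a single call: xtabs() accepts a matrix on the left-hand side of the formula, and each column of that matrix becomes a slice along an extra dimension of the result. A minimal sketch on the same six rows, with the columns passed through cbind() (the name tab3 is just for illustration):

```r
# Same toy data as in the question, as a plain data.frame.
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

# A matrix on the left-hand side yields a 3-d table:
# SubjectID x Activity x (one slice per column of the matrix).
tab3 <- xtabs(cbind(V1, V2, V3) ~ SubjectID + Activity, data = DF)

dim(tab3)        # 3 subjects, 4 activities, 3 variables
tab3[, , "V2"]   # the slice matching xtabs(V2 ~ SubjectID + Activity, data = DF)
```

Indexing the third dimension by column name returns the same values as the corresponding one-variable xtabs() call, so this is just a more compact way of holding the per-variable tables.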