I have the following data.frame in R:
Introvert Extrovert Nature Presence
0 -1 3 Yes
1 3 2 No
2 5 4 Yes
1 -2 0 No
Now I want to recode the responses as follows:
3,4 <- Positives
0,1,2 <- Neutral
< 0 <- Negatives
Then I want to get the counts of Positives, Neutral and Negatives for Yes and for No.
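I have 20 response columns like the ones above. How can I do this with simple R code?
Currently I use ifelse on each column and then group_by.
The desired output for my sample would be: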
Introvert_Positive Introvert_Negative Introvert_Neutral
Yes 0 0 2
No 0 0 2
Answer 0 (score: 2)
How about this?
library(tidyverse)
df %>%
    gather(key, value, -Presence) %>%
    mutate(bin = cut(
        value,
        breaks = c(-Inf, -1, 2.5, Inf),
        labels = c("Negatives", "Neutral", "Positives"))) %>%
    select(-value) %>%
    unite(col, key, bin, sep = "_") %>%
    count(Presence, col) %>%
    spread(col, n)
## A tibble: 2 x 6
# Presence Extrovert_Negativ… Extrovert_Positi… Introvert_Neutr… Nature_Neutral
# <fct> <int> <int> <int> <int>
#1 No 1 1 2 2
#2 Yes 1 1 2 NA
## ... with 1 more variable: Nature_Positives <int>
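Explanation: We recode the responses using cut with labels; the rest is a matter of reshaping to long format with gather, joining the relevant columns with unite, counting occurrences with count, and spreading from long to wide with spread.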
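The breaks c(-Inf, -1, 2.5, Inf) reproduce the question's groups because all responses are whole numbers: (-Inf, -1] catches every negative integer, (-1, 2.5] catches 0, 1 and 2, and (2.5, Inf] catches 3 and above. If fractional responses were possible, a value such as -0.5 would land in Neutral. A minimal sketch of a variant that encodes "< 0" literally using left-closed intervals (right = FALSE); recode_response is a hypothetical helper name, and treating values between 2 and 3 as Neutral is an assumption the question does not settle:

```r
# Hypothetical helper (not from either answer): the question's rules taken literally.
recode_response <- function(x) {
  cut(x,
      breaks = c(-Inf, 0, 3, Inf),
      right  = FALSE,  # left-closed intervals: [-Inf, 0), [0, 3), [3, Inf)
      labels = c("Negatives", "Neutral", "Positives"))
}

x <- c(-2, -1, -0.5, 0, 1, 2, 3, 4, 5)
comparison <- data.frame(
  x,
  # breaks used in the answer
  answer_breaks = cut(x, breaks = c(-Inf, -1, 2.5, Inf),
                      labels = c("Negatives", "Neutral", "Positives")),
  # literal "< 0 / 0 to 2 / 3 and above" rule
  literal_rule  = recode_response(x)
)
comparison

# Rows where the two versions disagree (only the fractional value -0.5 here)
comparison[as.character(comparison$answer_breaks) != as.character(comparison$literal_rule), ]
```

For integer data both versions give identical labels, so the answer's breaks are fine for the sample.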
Sample data:
df <- read.table(text =
"Introvert Extrovert Nature Presence
0 -1 3 Yes
1 3 2 No
2 5 4 Yes
1 -2 0 No", header = TRUE)
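Two differences from the desired output in the question: combinations that never occur appear as NA rather than 0, and gather()/spread() are marked as superseded in current tidyr in favour of pivot_longer()/pivot_wider(). A sketch of the same pipeline with the newer verbs, assuming tidyr 1.1 or later (where values_fill accepts a single value); the unite() step is folded into pivot_wider() through names_from = c(key, bin). With the original code, spread(col, n, fill = 0) also gives zeros.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, built without read.table
df <- data.frame(
  Introvert = c(0L, 1L, 2L, 1L),
  Extrovert = c(-1L, 3L, 5L, -2L),
  Nature    = c(3L, 2L, 4L, 0L),
  Presence  = c("Yes", "No", "Yes", "No")
)

res <- df %>%
  # long format: one row per (Presence, question, response)
  pivot_longer(-Presence, names_to = "key", values_to = "value") %>%
  # same recoding as in the answer
  mutate(bin = cut(value,
                   breaks = c(-Inf, -1, 2.5, Inf),
                   labels = c("Negatives", "Neutral", "Positives"))) %>%
  # tally each (Presence, key, bin) combination into column n
  count(Presence, key, bin) %>%
  # wide format: names are key and bin joined by "_", absent combinations become 0
  pivot_wider(names_from = c(key, bin), values_from = n, values_fill = 0)

res
```

Because the first argument of pivot_longer() is -Presence, the same code handles 20 response columns without listing them.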
Answer 1 (score: 1)
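For fun/practice, I created a data.table approach based on the workflow of @MauritsEvers' answer. It takes roughly 60% less time than the dplyr approach (median 3.8 ms vs. 10.3 ms; see the benchmark).
You can skip unite-ing the key and bin columns, because dcast can combine them in the same step as the casting.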
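Why the unite() step can be dropped: when the right-hand side of the dcast() formula has several variables, data.table joins their values with sep = "_" (the default) to build the new column names, which is what unite(col, key, bin, sep = "_") did in the dplyr version. A sketch on a tiny hand-made long table, assuming the data.table package; the rows are illustrative only, not computed from the question's data:

```r
library(data.table)

# Hand-made long table: one count N per (Presence, variable, bin); values are illustrative
long <- data.table(
  Presence = c("Yes", "Yes", "No"),
  variable = c("Introvert", "Extrovert", "Extrovert"),
  bin      = c("Neutral", "Negatives", "Positives"),
  N        = c(2L, 1L, 1L)
)

# 1) Cast directly on two right-hand-side variables
wide_direct <- dcast(long, Presence ~ variable + bin, value.var = "N")

# 2) Paste the two columns first (what unite() does), then cast on the single column
long[, col := paste(variable, bin, sep = "_")]
wide_united <- dcast(long, Presence ~ col, value.var = "N")

wide_direct
wide_united
```

Both casts produce the same set of column names, so the extra paste/unite step adds nothing.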
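library(data.table)
library(magrittr)  # provides %>%

df %>%
    setDT() %>%                       # convert to a data.table (setDT works by reference)
    melt(id.vars = "Presence") %>%    # same as id = 4: Presence is the 4th column
    .[, bin := cut(value,
                   breaks = c(-Inf, -1, 2.5, Inf),
                   labels = c("Negatives", "Neutral", "Positives"))] %>%
    .[, value := NULL] %>%            # raw value is no longer needed
    .[, .N, by = c("Presence", "variable", "bin")] %>%   # count per group
    dcast(Presence ~ variable + bin, value.var = "N")    # variable + bin become the column names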
Presence Introvert_Neutral Extrovert_Negatives Extrovert_Positives Nature_Neutral Nature_Positives
1: No 2 1 1 2 NA
2: Yes 2 1 1 NA 2
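The missing combinations come out as NA here too, whereas the question asks for 0; dcast() takes a fill argument for that. A sketch on the same sample data, assuming the data.table package. It uses as.data.table() (which makes a copy) instead of setDT() so that df itself is not converted, and names the id column rather than using its position; both are choices made for this sketch, not part of the answer.

```r
library(data.table)

df <- data.frame(
  Introvert = c(0L, 1L, 2L, 1L),
  Extrovert = c(-1L, 3L, 5L, -2L),
  Nature    = c(3L, 2L, 4L, 0L),
  Presence  = c("Yes", "No", "Yes", "No")
)

dt   <- as.data.table(df)                 # copy, so df keeps its data.frame class
long <- melt(dt, id.vars = "Presence")    # columns: Presence, variable, value

# Same recoding as in the answer
long[, bin := cut(value,
                  breaks = c(-Inf, -1, 2.5, Inf),
                  labels = c("Negatives", "Neutral", "Positives"))]

counts <- long[, .N, by = .(Presence, variable, bin)]

# fill = 0L writes 0 instead of NA for combinations that never occur
wide <- dcast(counts, Presence ~ variable + bin, value.var = "N", fill = 0L)
wide
```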
library(microbenchmark)
microbenchmark(
    dplyr = {
        df %>%
            gather(key, value, -Presence) %>%
            mutate(bin = cut(
                value,
                breaks = c(-Inf, -1, 2.5, Inf),
                labels = c("Negatives", "Neutral", "Positives"))) %>%
            select(-value) %>%
            unite(col, key, bin, sep = "_") %>%
            count(Presence, col) %>%
            spread(col, n)
    },
    data.table = {
        df %>%
            setDT() %>%
            melt(id = 4) %>%
            .[, bin := cut(value,
                           breaks = c(-Inf, -1, 2.5, Inf),
                           labels = c("Negatives", "Neutral", "Positives"))] %>%
            .[, value := NULL] %>%
            .[, .N, by = c("Presence", "variable", "bin")] %>%
            dcast(Presence ~ variable + bin, value.var = "N")
    },
    times = 1000
)
Unit: milliseconds
expr min lq mean median uq max neval
dplyr 9.636224 10.083903 10.59597 10.267371 10.458524 26.38649 1000
data.table 3.458208 3.647401 3.92219 3.835239 3.949568 15.05596 1000
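The headline percentage can be checked against the medians in the table: data.table needs about 37% of the dplyr time, i.e. about 63% less time, which is roughly a 2.7x speed-up. The arithmetic, with the medians copied by hand from the output above:

```r
# Median timings in milliseconds, copied from the microbenchmark table above
median_ms <- c(dplyr = 10.267371, data.table = 3.835239)

time_ratio <- median_ms[["data.table"]] / median_ms[["dplyr"]]  # share of the dplyr time
time_saved <- 1 - time_ratio                                     # fraction of time saved
speedup    <- median_ms[["dplyr"]] / median_ms[["data.table"]]   # how many times faster

round(c(time_ratio = time_ratio, time_saved = time_saved, speedup = speedup), 2)
```

The means (3.92 ms vs. 10.60 ms) give almost the same ratio, so the conclusion does not hinge on using the median.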