I'm trying to create dummy variables from the following table:
df1 <- structure(list(Value1 = c(9.330154398, 32.43881489, 54.77178387, 54.77178387),
Value2 = c(1, 2, 3, 8),
var1 = c("HomeATL", "AwaySDN", "AwayLAN", "AwayLAN"),
var2 = c("AwayHOU", "HomeATL", "HomeATL", "HomeATL"),
var3 = c("HomeEast", "HomeWest", "AwayEast", "AwayWest"),
var3values = c(1,2,3,4),
var4 = c("AwayWest", "AwayWest", "HomeSame", "HomeEast"),
var4values = c(5,6,7,8)),
class = "data.frame", row.names = c(NA,-4L))
The result should look like this:
Value1      Value2 HomeEast HomeWest AwayEast AwayWest HomeSame HomeATL AwayHOU AwaySDN AwayLAN
9.330154398      1        1        0        0        5        0       1       1       0       0
32.43881489      2        0        2        0        6        0       1       0       1       0
54.77178387      3        0        0        3        0        7       1       0       0       1
54.77178387      8        8        0        0        4        0       1       0       0       1
I asked a similar question before, and the approach I used there was:
library(tidyverse)
rownames_to_column(df1, 'rn') %>%
gather(key, val, var1:var4) %>%
count(rn, val) %>%
spread(val, n, fill = 0) %>%
select(-rn) %>%
bind_cols(df1[1:2], .)
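However, it returns 1/0 dummy values instead of the values taken from certain predefined columns (var3values and var4values).
What should I do?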
Answer 0 (score: 0)
This is what I would do:
# 0/1 indicator columns for the team codes in var1 and var2
one <- df1 %>%
  select(var1:var2) %>%
  rownames_to_column('rn') %>%
  gather(key, val, var1:var2) %>%
  mutate(key = 1) %>%               # every code present in a row is flagged with 1
  spread(val, key, fill = 0) %>%
  select(-rn)

# value columns: each var3/var4 category becomes a column holding var3values/var4values
two <- df1 %>%
  select(var3:var3values) %>%
  rownames_to_column('rn') %>%
  rename(var = var3, values = var3values) %>%
  bind_rows(df1 %>%
              select(var4:var4values) %>%
              rownames_to_column('rn') %>%
              rename(var = var4, values = var4values)) %>%
  spread(var, values, fill = 0) %>%
  select(-rn)

three <- df1 %>% select(1, 2)       # Value1, Value2
cbind(three, two, one)
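gather() and spread() still work, but tidyr 1.0.0 superseded them with pivot_longer() and pivot_wider(). Below is a minimal sketch of the same three-part approach using the newer verbs. It assumes tidyr >= 1.1.0 (so values_fill can be a single value) plus dplyr and tibble, and it re-creates df1 so it runs on its own; the names df_rn and res are illustrative. Because pivot_wider() orders new columns by first appearance rather than alphabetically, for this data the result comes out in the same column order as the expected output.

```r
library(dplyr)
library(tidyr)
library(tibble)

df1 <- structure(list(Value1 = c(9.330154398, 32.43881489, 54.77178387, 54.77178387),
                      Value2 = c(1, 2, 3, 8),
                      var1 = c("HomeATL", "AwaySDN", "AwayLAN", "AwayLAN"),
                      var2 = c("AwayHOU", "HomeATL", "HomeATL", "HomeATL"),
                      var3 = c("HomeEast", "HomeWest", "AwayEast", "AwayWest"),
                      var3values = c(1, 2, 3, 4),
                      var4 = c("AwayWest", "AwayWest", "HomeSame", "HomeEast"),
                      var4values = c(5, 6, 7, 8)),
                 class = "data.frame", row.names = c(NA, -4L))

# row id so the long data can be widened back to one row per original row
df_rn <- rownames_to_column(df1, 'rn')

# 0/1 indicator columns for the team codes in var1 and var2
one <- df_rn %>%
  select(rn, var1, var2) %>%
  pivot_longer(-rn, values_to = 'val') %>%
  mutate(flag = 1) %>%
  select(-name) %>%
  pivot_wider(names_from = val, values_from = flag, values_fill = 0)

# value columns: stack (var3, var3values) and (var4, var4values) as (var, values) pairs
two <- bind_rows(
  select(df_rn, rn, var = var3, values = var3values),
  select(df_rn, rn, var = var4, values = var4values)
) %>%
  pivot_wider(names_from = var, values_from = values, values_fill = 0)

# original numeric columns, then the value columns, then the indicator columns
res <- bind_cols(select(df1, Value1, Value2),
                 select(two, -rn),
                 select(one, -rn))
res
```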
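Answer 1 (score: 0)
One option is to gather the columns whose names start with 'var' followed by one or more digits (\\d+) up to the end of the string ($), selected with matches(). Then, grouped by the row number and the 'val' column, create 'n' according to the conditions in case_when(): if 'key' is 'var3', take the corresponding 'var3values'; if it is 'var4', take 'var4values'; if it is neither, take the frequency count (n()). Finally, spread it to 'wide' format and keep only the columns of interest.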
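The code for this answer did not survive extraction. Below is a minimal sketch of the steps just described, not the answerer's original code. It assumes dplyr, tidyr and tibble are installed and re-creates df1 so it runs on its own. The name res and the explicit final column order (copied from the expected output in the question) are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(tibble)

df1 <- structure(list(Value1 = c(9.330154398, 32.43881489, 54.77178387, 54.77178387),
                      Value2 = c(1, 2, 3, 8),
                      var1 = c("HomeATL", "AwaySDN", "AwayLAN", "AwayLAN"),
                      var2 = c("AwayHOU", "HomeATL", "HomeATL", "HomeATL"),
                      var3 = c("HomeEast", "HomeWest", "AwayEast", "AwayWest"),
                      var3values = c(1, 2, 3, 4),
                      var4 = c("AwayWest", "AwayWest", "HomeSame", "HomeEast"),
                      var4values = c(5, 6, 7, 8)),
                 class = "data.frame", row.names = c(NA, -4L))

res <- df1 %>%
  rownames_to_column('rn') %>%
  # long format over var1..var4 only; "^var\\d+$" does not match var3values/var4values
  gather(key, val, matches("^var\\d+$")) %>%
  group_by(rn, val) %>%
  # var3/var4 entries take their paired value; var1/var2 entries get a count (1 here)
  mutate(n = case_when(key == 'var3' ~ var3values,
                       key == 'var4' ~ var4values,
                       TRUE ~ as.numeric(n()))) %>%
  ungroup() %>%
  # drop helper columns so spread() identifies rows by rn, Value1, Value2 only
  select(rn, Value1, Value2, val, n) %>%
  spread(val, n, fill = 0) %>%
  # keep only the columns of interest, in the order of the expected output
  select(Value1, Value2, HomeEast, HomeWest, AwayEast, AwayWest,
         HomeSame, HomeATL, AwayHOU, AwaySDN, AwayLAN)
res
```

as.numeric(n()) keeps all case_when() branches the same type (double), which older dplyr versions require.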