I am trying to format my data in R so that I can use it properly in different general linear models.
The data looks like this:
> str(data)
'data.frame': 1978 obs. of 7 variables:
$ country : Factor w/ 22 levels "AT","BE","CH",..: 8 8 8 8 8 8 8 8 8 8 ...
$ age : num 65 77 36 28 23 15 75 20 44 73 ...
$ gender : Factor w/ 2 levels "male","female": 2 1 1 1 2 2 1 2 2 1 ...
$ education_level : Factor w/ 6 levels "less_than_lower_sec",..: 5 1 3 5 5 2 1 3 3 5 ...
$ good_citizen_importance: Factor w/ 11 levels "00","01","02",..: 11 9 9 9 10 10 7 10 10 9 ...
$ trade : Factor w/ 7 levels "none_apply","member",..: 2 4 4 2 2 4 4 4 2 4 ...
$ relig : Factor w/ 7 levels "none_apply","member",..: 2 2 4 4 4 4 2 5 4 4 ...
A snippet of the data itself:
> head(data)
country age gender education_level good_citizen_importance trade relig
13711 FI 65 female tertiary 10 member member
13712 FI 77 male less_than_lower_sec 08 donated member
13713 FI 36 male upper_sec 08 donated donated
13714 FI 28 male tertiary 08 member donated
13715 FI 23 female tertiary 09 member donated
13716 FI 15 female lower_sec 09 donated donated
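I have managed to produce the frequency count below, so I am almost there. What I still want is to get every level of the "good_citizen_importance" variable, together with its count, into its own column.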
> counts <- count(data, c("good_citizen_importance", "trade", "relig", "gender"))
> head(counts)
good_citizen_importance trade relig gender freq
1 00 donated member male 1
2 00 donated donated male 1
3 01 member donated female 1
4 01 donated donated male 2
5 01 donated donated female 1
6 02 member member female 1
This is how I would like the data to look:
> head(counts)
trade relig gender "00" "01" "02" ...
1 donated member male 1 5 7 ...
2 donated donated male 12 2 3 ...
3 member donated female 11 3 1 ...
4 donated donated male 25 1 4 ...
5 donated donated female 12 1 1 ...
6 member member female 11 1 1 ...
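So I want the frequencies of all levels of one variable laid out against the combinations of the other variables. In other words, one frequency column for each of the 11 levels of the "good_citizen_importance" variable.
I'm sure this is not a very hard problem, but I have been at it for several hours and I feel I have exhausted my R and Google skills.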
Answer 0 (score: 1)
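This can be done by reshaping the data. In base R you can use the function reshape, but its syntax is clunky (I used to use it often and still had to look up the syntax every time). spread from the tidyverse is a better solution (specifically, it lives in the tidyr package):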
library(tidyr) # or library(tidyverse)
counts_wide <- counts %>%
spread(good_citizen_importance, freq, fill = 0)
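To see what the call does, here is a minimal sketch on a made-up stand-in for counts. The data frame counts_toy and its values are invented for illustration and are not the asker's data; only the spread call mirrors the one above.

```r
library(tidyr)  # provides spread() and re-exports %>%

# Hypothetical toy version of `counts`; values are invented for illustration.
counts_toy <- data.frame(
  good_citizen_importance = c("00", "01", "01", "02"),
  trade  = c("donated", "donated", "member", "member"),
  relig  = c("member", "member", "donated", "donated"),
  gender = c("male", "male", "female", "female"),
  freq   = c(1, 5, 3, 1),
  stringsAsFactors = FALSE
)

# Each distinct value of good_citizen_importance becomes a column holding freq;
# combinations that never occur are filled with 0 instead of NA.
wide_toy <- counts_toy %>%
  spread(good_citizen_importance, freq, fill = 0)

wide_toy
```

The result has one row per trade/relig/gender combination and one column per level ("00", "01", "02"). In newer tidyr releases spread is superseded by pivot_wider, which takes names_from, values_from and values_fill arguments for the same job; spread still works.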
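If you are not familiar with the pipe operator (%>%), it takes the output of the preceding function and passes it as the first argument of the next function. It makes code easier to read by removing a lot of nested function calls.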
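As a small illustration of that equivalence, the sketch below writes the same computation once with nested calls and once with the pipe. The vector x is an arbitrary example, and magrittr is loaded directly here only because it is the package that defines %>% (tidyr and dplyr re-export it).

```r
library(magrittr)  # defines %>%

x <- c(3, 1, 2, 5, 4)  # arbitrary example values

# Nested form: read from the inside out.
nested <- head(sort(x, decreasing = TRUE), 2)

# Piped form: read top to bottom; each result becomes the first argument
# of the next call.
piped <- x %>%
  sort(decreasing = TRUE) %>%
  head(2)

identical(nested, piped)  # TRUE
```

By the same rule, counts %>% spread(good_citizen_importance, freq, fill = 0) is just another way of writing spread(counts, good_citizen_importance, freq, fill = 0).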