I have a dataset that I am editing in R with the dplyr package. My code is:
hiphop %>%
  mutate(
    sex = case_when(
      sex == 1 ~ "female",
      sex == 0 ~ "male"
    )
  ) %>%
  group_by(sex) %>%
  summarise_at(vars(intl, vocal, classical, folk, rock, country, pop, alternative, hiphop, unclassifiable), funs(mean)) %>%
  pivot_longer(c(intl, vocal, classical, folk, rock, country, pop, alternative, hiphop, unclassifiable), names_to = "genre") %>%
  spread(sex, value) %>%
  mutate(genredifference = abs(female - male)) %>%
  arrange(genredifference) %>%
  top_n(3)
which gives me this output:
Selecting by genredifference
# A tibble: 3 x 4
genre female male genredifference
<chr> <dbl> <dbl> <dbl>
1 country 0.786 0.392 0.394
2 vocal 0.880 1.57 0.688
3 rock 1.93 3.06 1.13
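I would like to get the same output, but by replacing the spread() function with pivot_wider() (which I believe is the function to use). However, I can't figure out how to do it.
Thanks!
P.S.: In case you are interested, here is my dataset: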
hiphop <- read_csv("https://www.dropbox.com/s/5d8fwxrj3jtua1z/hiphop.csv?dl=1")
Answer 0 (score: 1)
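Going by the Dropbox input data, some of these steps are already done (the code below uses the Female and Male labels directly, without the case_when() recoding). We can use the select helpers to make some of the steps more compact: when there is a contiguous range of columns to select, use :, and likewise in pivot_longer() we can specify the columns that should not be selected with -. When using pivot_wider(), make sure to name the arguments (names_from, values_from), because the function has other arguments as well, and unnamed arguments are matched in the order in which the parameters appear.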
library(dplyr)
library(tidyr)
hiphop %>%
  group_by(sex) %>%
  summarise_at(vars(intl:unclassifiable), mean) %>%
  pivot_longer(cols = -sex) %>%
  pivot_wider(names_from = sex, values_from = value) %>%
  mutate(genredifference = abs(Female - Male)) %>%
  arrange(genredifference) %>%
  top_n(3)
# A tibble: 3 x 4
# name Female Male genredifference
# <chr> <dbl> <dbl> <dbl>
#1 country 0.786 0.392 0.394
#2 vocal 0.880 1.57 0.688
#3 rock 1.93 3.06 1.13
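To see the spread() to pivot_wider() swap in isolation, here is a minimal sketch on a small made-up table. The toy tibble and its values are invented for illustration and are not taken from hiphop.csv; the column names rock, pop and folk are only stand-ins for the genre columns. It assumes the sex column already contains Female and Male, as in the answer above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for hiphop.csv (values are invented)
toy <- tibble(
  sex  = c("Female", "Female", "Male", "Male"),
  rock = c(1, 3, 4, 2),
  pop  = c(2, 2, 1, 3),
  folk = c(0, 2, 5, 5)
)

# Mean per sex, then one row per (sex, genre) pair
long <- toy %>%
  group_by(sex) %>%
  summarise_at(vars(rock:folk), mean) %>%
  pivot_longer(cols = -sex, names_to = "genre", values_to = "value")

# Older interface: key and value are matched by position
wide_spread <- long %>%
  spread(sex, value)

# Newer interface: name the arguments explicitly
wide_pivot <- long %>%
  pivot_wider(names_from = sex, values_from = value)

# Same downstream step as in the answer
result <- wide_pivot %>%
  mutate(genredifference = abs(Female - Male)) %>%
  arrange(genredifference)
```

Both wide tables carry the same genre, Female and Male columns. The row order can differ, since spread() sorts by the remaining columns while pivot_wider() keeps the order of first appearance, so sort both (for example with arrange(genre)) before comparing them.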