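I'm trying to reshape a data frame with four columns. One of the columns (dem_profile_description) holds about 20 distinct values that I'd like to turn into columns. I've installed the reshape2 package.
The first few rows of my data frame are: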
dem_profile_field dem_profile_description dem_profile_data Community
dpsf0010042 Female 10 to 14 years(1) 4 Gnar
dpsf0010043 Female 15 to 19 years(2) 20 Yoke
dpsf0010044 Female 20 to 24 years(3) 22 Law
dpsf0010045 Female 25 to 29 years(4) 23 Law
dpsf0010046 Female 30 to 34 years(5) 24 Ark
dpsf0010047 Female 35 to 39 years(6) 30 Riverland
I want this:
dem_profile_field Community (1) (2) (3) (4) (5) (6)
dpsf0010042 Gnar 4
dpsf0010043 Yoke 20
dpsf0010044 Law 5 5
dpsf0010046 Ark 24
dpsf0010047 Riverland 30
My code is:
library(reshape2)
census3 <- dcast(census2, "dem_profile_field" + "Community" ~
"dem_profile_description", value.var = "dem_profile_data" )
But I end up with this:
dem_profile_field Community dem_profile_description
1 Community 2
Answer 0 (score: 2)
You're basically there: you just need to drop the quotes in the formula of your dcast call (value.var still needs them):
census3 <- dcast(census2, dem_profile_field + Community ~
dem_profile_description, value.var = "dem_profile_data" )
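As a minimal, self-contained sketch of the unquoted-formula call, the snippet below rebuilds a few of the rows shown in the question. The small `census2` here is a hypothetical stand-in for the real data, not the asker's actual object.

```r
library(reshape2)

# Hypothetical stand-in for census2, built from rows shown in the question
census2 <- data.frame(
  dem_profile_field = c("dpsf0010042", "dpsf0010043", "dpsf0010044"),
  dem_profile_description = c("Female 10 to 14 years(1)",
                              "Female 15 to 19 years(2)",
                              "Female 20 to 24 years(3)"),
  dem_profile_data = c(4, 20, 22),
  Community = c("Gnar", "Yoke", "Law"),
  stringsAsFactors = FALSE
)

# Bare column names in the formula; value.var stays a quoted string
census3 <- dcast(census2,
                 dem_profile_field + Community ~ dem_profile_description,
                 value.var = "dem_profile_data")

# One row per (dem_profile_field, Community) pair, one column per description
print(dim(census3))
print(names(census3))
```

Cells with no matching row in the input come back as NA, which corresponds to the blank cells in the desired output.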
To get the column names you want, you can also do:
library(stringr)  # provides str_extract()
names_to_replace <- grepl("(\\(.*\\))", names(census3))
names(census3)[names_to_replace] <- str_extract(names(census3)[names_to_replace], "\\(.*\\)")
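If you'd rather not load stringr, the same renaming can probably be done in base R with `sub()`. This is an alternative sketch, not the approach used above, and the `cols` vector is a hypothetical example of what the column names look like after casting.

```r
# Hypothetical column names as they might look after the dcast call
cols <- c("dem_profile_field", "Community",
          "Female 10 to 14 years(1)", "Female 15 to 19 years(2)")

# Find the names that contain a parenthesised code
has_code <- grepl("\\(.*\\)", cols)

# Keep only the trailing "(n)" part of those names
cols[has_code] <- sub(".*(\\(.*\\))$", "\\1", cols[has_code])

print(cols)
```

Names without parentheses, such as dem_profile_field and Community, are left unchanged because they are excluded by the `has_code` mask.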
Answer 1 (score: 0)
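If you're just getting started with a new package for reshaping data, you may want to look at tidyr. The syntax is more straightforward, and it combines well with the other data manipulation packages in the "tidyverse".
Your example would work like this: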
library(tidyr)
df <- data.frame(dem_profile_field =
c("dpsf0010042",
"dpsf0010043",
"dpsf0010044",
"dpsf0010045",
"dpsf0010046",
"dpsf0010047"),
dem_profile_description =
c("Female 10 to 14 years(1)",
"Female 15 to 19 years(2)",
"Female 20 to 24 years(3)",
"Female 25 to 29 years(4)",
"Female 30 to 34 years(5)",
"Female 35 to 39 years(6)"),
dem_profile_data =
c(4,
20,
22,
23,
24,
30),
Community =
c("Gnar",
"Yoke",
"Law",
"Law",
"Ark",
"Riverland"),
stringsAsFactors = FALSE)
df_transposed <- df %>%
spread(dem_profile_description, dem_profile_data)
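In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A rough equivalent of the call above might look like the sketch below; the trimmed-down `df` is a hypothetical three-row version of the example data.

```r
library(tidyr)

# Hypothetical three-row version of the example data
df <- data.frame(
  dem_profile_field = c("dpsf0010042", "dpsf0010043", "dpsf0010044"),
  dem_profile_description = c("Female 10 to 14 years(1)",
                              "Female 15 to 19 years(2)",
                              "Female 20 to 24 years(3)"),
  dem_profile_data = c(4, 20, 22),
  Community = c("Gnar", "Yoke", "Law"),
  stringsAsFactors = FALSE
)

# names_from supplies the new column names, values_from fills the cells
df_wide <- pivot_wider(df,
                       names_from = dem_profile_description,
                       values_from = dem_profile_data)

print(dim(df_wide))
```

As with `spread()`, combinations that do not occur in the input are filled with NA.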