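I still can't use pivot_longer to gather several sets of wide columns into multiple long columns in one step, rather than gathering them one column at a time (this follows on from "Using pivot_longer to gather wide columns into multiple long columns").
For example, the columns hf_1, hf_2, hf_3, hf_4, hf_5, hf_6 need to be pivoted into 2 columns: hf_com (holding the suffixes 1, 2, 3, 4, 5, 6 taken from the wide hf column names) and hf_com_freq (holding the value 1).
The same needs to happen for the columns ac_1, ac_2, ac_3, ac_4, ac_5, ac_6. They need to be split into 2 columns: ac_com (holding the suffixes 1, 2, 3, 4, 5, 6 taken from the wide ac column names) and ac_com_freq (holding the value 1).
I have tried looking at:
Gather multiple sets of columns
and:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
as well as the examples using the who dataset at:
https://tidyr.tidyverse.org/articles/pivot.html
But I can't get the values into the multiple long columns I need.
Here is the input data for a toy dataset: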
library(tidyr)  # provides tribble(), pivot_longer(), %>% and starts_with()
df1 <- tribble(
~"np_id", ~"np_city_size", ~"cc_hf_1", ~"cc_hf_2", ~"cc_hf_3", ~"cc_hf_4", ~"cc_hf_5", ~"cc_hf_6", ~"cc_ac_1", ~"cc_ac_2", ~"cc_ac_3", ~"cc_ac_4", ~"cc_ac_5", ~"cc_ac_6",
"81", "village", NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA,
"82", "village", 1L, NA, NA, NA, 1L, NA, NA, NA, NA, 1L, NA, NA,
"83", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
"85", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
"87", "more than 500k inhabitants", NA, 1L, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA,
"89", "village", 1L, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA, NA,
"90", "village", 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA,
"91", "village", 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA,
"92", "village", NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, 1L
)
Here is my current code:
df_longer <- df1 %>% pivot_longer(
cols = -(starts_with("np_")),
names_to = c("hf_com", "ac_com"),
names_pattern = "cc_?(.*)_(.*)",
values_to = c("hf_com_freq", "ac_com_freq")
)
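However, I know I need to extract the last character of each column header (e.g. the 1 in hf_1, the 2 in hf_2) and pass the rest as the .value for each column, but I'm having trouble with the regular expression and the pivot_longer arguments (e.g. names_pattern). I feel I'm very close to a solution, but I can't see the wood for the trees!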
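To make the ".value" idea above concrete, here is a minimal sketch on a cut-down version of the toy data. It is only one possible approach, not a confirmed answer: the names toy, toy_long and com are made up for illustration, and duplicating the suffix into both hf_com and ac_com is an assumption based on the expected result shown further down.

```r
library(tidyr)
library(dplyr)

# Hypothetical cut-down version of df1: two ids, two suffixes per set.
toy <- tribble(
  ~np_id, ~cc_hf_1, ~cc_hf_2, ~cc_ac_1, ~cc_ac_2,
  "81",   NA,       1L,       NA,       NA,
  "82",   1L,       NA,       NA,       1L
)

toy_long <- toy %>%
  pivot_longer(
    cols = starts_with("cc_"),
    # First capture group ("hf"/"ac") becomes the output column name via
    # ".value"; second capture group (the digit) goes into `com`.
    names_to = c(".value", "com"),
    names_pattern = "^cc_(.+)_(\\d+)$"
  ) %>%
  # Assumption: both *_com columns carry the same suffix, as in the
  # expected result; the value columns are renamed to *_com_freq.
  transmute(
    np_id,
    hf_com = com,
    ac_com = com,
    hf_com_freq = hf,
    ac_com_freq = ac
  )

toy_long
```

With this layout each id gets one row per suffix, and the hf and ac values for that suffix sit side by side in the same row.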
The actual result is as follows:
df_longer <- structure(list(np_id = c("81", "81", "81", "81", "81", "81"),
np_city_size = c("village", "village", "village", "village",
"village", "village"), hf_com = c("hf", "hf", "hf", "hf",
"hf", "hf"), ac_com = c("1", "2", "3", "4", "5", "6"), hf_com_freq = c(NA,
NA, 1L, NA, NA, NA), ac_com_freq = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
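The "hf" values in hf_com above appear consistent with how the pattern splits each header: the first capture group picks up the set name and only the second picks up the digit. A small base R check of what "cc_?(.*)_(.*)" captures (the variable names here are illustrative, and perl = TRUE is used only to get predictable greedy matching):

```r
cols <- c("cc_hf_1", "cc_ac_6")
pattern <- "cc_?(.*)_(.*)"

# regexec() returns the full match plus one entry per capture group.
captures <- regmatches(cols, regexec(pattern, cols, perl = TRUE))

first_group  <- vapply(captures, function(m) m[2], character(1))
second_group <- vapply(captures, function(m) m[3], character(1))

first_group   # set names: "hf" "ac"
second_group  # suffixes:  "1"  "6"
```

So with names_to = c("hf_com", "ac_com"), the set name lands in hf_com and the suffix lands in ac_com, which is not the intended mapping.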
Here is the expected result:
df_longer <- structure(list(np_id = c("81", "81", "81", "81", "81", "81"),
np_city_size = c("village", "village", "village", "village",
"village", "village"), hf_com = c("1", "2", "3", "4",
"5", "6"), ac_com = c("1", "2", "3", "4", "5", "6"), hf_com_freq = c(NA,
NA, 1L, NA, NA, NA), ac_com_freq = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))