I have a data frame like this
ID <- c("ID21","ID22","ID23","ID24")
STR_PL_CAN_EVOLVE_PROCESS <- c("CCP_A,CCP_B","CCQ_A,CCQ_B,CCQ_C","IOT_A,CCP_B","CCQ_B,IOT_B")
Average <- c(7.5,6.5,7.1,6.6)
STR_VD_CAN_MEASURE_PROCESS <- c("Length,Breadth","Breadth,Width","Height,Length,Width","Width,Length")
Passfail <- c("Pass","Pass","Fail","Fail")
df <- data.frame(ID,STR_PL_CAN_EVOLVE_PROCESS,Average,STR_VD_CAN_MEASURE_PROCESS,Passfail,stringsAsFactors=FALSE)
I am trying to use the tidyverse to split the values in the columns ending in "PROCESS" into several columns, and this is how I am doing it.
library(tidyverse)
df1 <- df %>%
  separate(STR_PL_CAN_EVOLVE_PROCESS,
           paste0("ST_PL_CA_EV_PR", "_Path", 1:10),
           sep = ",") %>%
  separate(STR_VD_CAN_MEASURE_PROCESS,
           paste0("ST_VD_CA_ME_PR", "_Path", 1:10),
           sep = ",")
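This works, but I am doing a lot by hand here (typing the column names and the new column names). Here is one of the things I would like to achieve:
STR_PL_CAN_EVOLVE_PROCESS
becomes ST_PL_CA_EV_PR
My desired output is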
ID ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 Average ST_VD_CA_ME_PR_Path1 ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3 Passfail
ID21 CCP_A CCP_B <NA> 7.5 Length Breadth <NA> Pass
ID22 CCQ_A CCQ_B CCQ_C 6.5 Breadth Width <NA> Pass
ID23 IOT_A CCP_B <NA> 7.1 Height Length Width Fail
ID24 CCQ_B IOT_B <NA> 6.6 Width Length <NA> Fail
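My actual dataset has about 35 columns ending in "PROCESS". Can someone point me in the right direction?
Answer 0 (score: 1)
Here is an option with cSplit from splitstackshape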
library(splitstackshape)
library(dplyr)
df %>%
  cSplit(c("STR_PL_CAN_EVOLVE_PROCESS", "STR_VD_CAN_MEASURE_PROCESS"),
         ',', drop = TRUE)
#ID Average Passfail STR_PL_CAN_EVOLVE_PROCESS_1 STR_PL_CAN_EVOLVE_PROCESS_2 STR_PL_CAN_EVOLVE_PROCESS_3
#1: ID21 7.5 Pass CCP_A CCP_B <NA>
#2: ID22 6.5 Pass CCQ_A CCQ_B CCQ_C
#3: ID23 7.1 Fail IOT_A CCP_B <NA>
#4: ID24 6.6 Fail CCQ_B IOT_B <NA>
# STR_VD_CAN_MEASURE_PROCESS_1 STR_VD_CAN_MEASURE_PROCESS_2 STR_VD_CAN_MEASURE_PROCESS_3
#1: Length Breadth <NA>
#2: Breadth Width <NA>
#3: Height Length Width
#4: Width Length <NA>
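The cSplit output keeps the long column names, while the question asks for shortened ones such as ST_PL_CA_EV_PR_Path1. Below is a minimal base R sketch of one way to map the names. `rename_split_cols` is a hypothetical helper (not part of splitstackshape), and it assumes every split column is named `<original name>_<number>`, as in the output above.

```r
# Column names as they appear in the cSplit() output above
old_names <- c("ID", "Average", "Passfail",
               paste0("STR_PL_CAN_EVOLVE_PROCESS_", 1:3),
               paste0("STR_VD_CAN_MEASURE_PROCESS_", 1:3))

# Hypothetical helper: turn "<long name>_<n>" into "<short name>_Path<n>"
rename_split_cols <- function(nms) {
  # Only touch names of the form ...PROCESS_<number>
  is_split <- grepl("PROCESS_[0-9]+$", nms)
  base  <- sub("_[0-9]+$", "", nms[is_split])           # drop the trailing index
  index <- sub("^.*_([0-9]+)$", "\\1", nms[is_split])   # keep the index only
  # Keep the first two letters of every run of letters
  short <- gsub("([A-Za-z]{2})[A-Za-z]*", "\\1", base)
  nms[is_split] <- paste0(short, "_Path", index)
  nms
}

new_names <- rename_split_cols(old_names)
print(new_names)
```

With a real result stored as, say, `res`, something like `names(res) <- rename_split_cols(names(res))` should apply the new names. To avoid typing all 35 column names, `grep("PROCESS$", names(df), value = TRUE)` could be passed as the first argument to cSplit instead of the hand-written vector.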
Answer 1 (score: 0)
A pure tidyverse version. There are a lot of steps, but I believe it gets you what you are after.
How about this?
df %>%
  # Grab all columns ending with "PROCESS"
  gather(key, val, ends_with("PROCESS")) %>%
  # Separate the former column names on "_"
  separate(key, paste0("Pat", 1:5)) %>%
  # Shorten every column starting with "Pat" to its first two characters
  mutate_at(vars(starts_with("Pat")), substr, 1, 2) %>%
  # Separate cell values on commas only, so values such as "CCP_A" stay intact
  separate(val, paste0("Path", 1:3), sep = ",", fill = "right") %>%
  # Gather all Path columns into a key and value pair
  gather(Path, val, starts_with("Path")) %>%
  # Unite all columns starting with "Pat" (this also matches "Path") into one
  unite(key, starts_with("Pat")) %>% na.omit() %>%
  # Spread the data using "key" as columns and "val" as cell values
  spread(key, val)
ID Average Passfail ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 ST_VD_CA_ME_PR_Path1
1 ID21 7.5 Pass CCP_A CCP_B <NA> Length
2 ID22 6.5 Pass CCQ_A CCQ_B CCQ_C Breadth
3 ID23 7.1 Fail IOT_A CCP_B <NA> Height
4 ID24 6.6 Fail CCQ_B IOT_B <NA> Width
ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3
1 Breadth <NA>
2 Width <NA>
3 Length Width
4 Length <NA>
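If the gather/spread round trip feels heavy for 35 columns, a plain loop over the matching columns may be easier to follow. The following is only a sketch in base R (no extra packages), not a tidyverse solution. `shorten` and `split_process_cols` are hypothetical helper names, and it assumes the values are separated by bare commas with no surrounding spaces.

```r
df <- data.frame(
  ID = c("ID21", "ID22", "ID23", "ID24"),
  STR_PL_CAN_EVOLVE_PROCESS = c("CCP_A,CCP_B", "CCQ_A,CCQ_B,CCQ_C",
                                "IOT_A,CCP_B", "CCQ_B,IOT_B"),
  Average = c(7.5, 6.5, 7.1, 6.6),
  STR_VD_CAN_MEASURE_PROCESS = c("Length,Breadth", "Breadth,Width",
                                 "Height,Length,Width", "Width,Length"),
  Passfail = c("Pass", "Pass", "Fail", "Fail"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first two characters of each "_"-separated part
shorten <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)
  vapply(parts, function(p) paste(substr(p, 1, 2), collapse = "_"), character(1))
}

# Hypothetical helper: replace every column matching `pattern` with
# <short name>_Path1..n columns, keeping the original column order
split_process_cols <- function(data, pattern = "PROCESS$", sep = ",") {
  pieces <- lapply(names(data), function(col) {
    # Columns that do not match are passed through unchanged
    if (!grepl(pattern, col)) return(data[col])
    parts <- strsplit(data[[col]], sep, fixed = TRUE)
    n <- max(lengths(parts))
    # Indexing past the end pads shorter rows with NA
    mat <- do.call(rbind, lapply(parts, function(p) p[seq_len(n)]))
    out <- as.data.frame(mat, stringsAsFactors = FALSE)
    names(out) <- paste0(shorten(col), "_Path", seq_len(n))
    out
  })
  do.call(cbind, pieces)
}

df1 <- split_process_cols(df)
print(df1)
```

Because the number of Path columns is taken from the longest entry in each column, nothing has to be hard-coded per column, and the new columns stay in the position of the column they replace, as in the desired output.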