I'm trying to reshape a data frame using tidyr. Below is the data frame:
data <- data.frame(class_name=c("date","date","educational","qualif","date","date", "educational","qualif"),
text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
"Master of Science, Computer Science","1996","2000",
"MAHARASHTRA INSTITUTE OF TECHNOLOGY",
"Bachelor of Science, Mechanical Engineering"))
I would like the data to look like the image below:
Answer 0 (score: 3)
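Here is an idea using the tidyverse. We group every 4 rows and spread them into columns. However, we first need to make the names in class_name unique within each group, because spread() cannot handle the two duplicate date keys in the same row, i.e.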
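library(tidyverse)

data %>%
  # assign one group id to every block of 4 consecutive rows
  group_by(grp = rep(seq(n()/4), each = 4)) %>%
  # within each group, the second "date" becomes "date.1"
  mutate(class_name = make.unique(as.character(class_name))) %>%
  spread(class_name, text_val) %>%
  ungroup() %>%
  select(educational, qualif, date, date.1)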
which gives:
# A tibble: 2 x 4
                          educational                                      qualif   date date.1
*                              <fctr>                                      <fctr> <fctr> <fctr>
1    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science   2000   2003
2 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering   1996   2000
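The two helper steps in that pipeline can be checked on their own in base R. This is a minimal sketch that only uses the class_name values from the question; ave() is used here as a stand-in for the grouped mutate() and is not part of the original answer:

```r
# Standalone check of the two helper steps used in the pipeline above
class_name <- c("date", "date", "educational", "qualif",
                "date", "date", "educational", "qualif")

# Group index: one id per block of 4 consecutive rows, as in group_by(grp = ...)
grp <- rep(seq(length(class_name) / 4), each = 4)
print(grp)

# make.unique() applied within each group, mirroring the mutate() step:
# the second "date" in every group is renamed to "date.1"
unique_names <- ave(class_name, grp, FUN = make.unique)
print(unique_names)
```

Because the names are only made unique inside each group, both records end up with the same set of keys (date, date.1, educational, qualif), which is what lets spread() line them up as columns.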
Answer 1 (score: 1)
Another solution using base R's reshape() (not as elegant as Sotos' solution):
data <- data.frame(class_name=c("date","date","educational","qualif","date","date", "educational","qualif"),
text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
"Master of Science, Computer Science","1996","2000",
"MAHARASHTRA INSTITUTE OF TECHNOLOGY",
"Bachelor of Science, Mechanical Engineering"))
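nrec <- 4                                                  # rows per record
data$id   <- rep(seq_len(nrow(data) / nrec), each = nrec)  # record id
data$time <- rep(seq_len(nrec), nrow(data) / nrec)         # position within a record
# timevar defaults to "time"; [, -1] drops the leftover class_name column
df <- reshape(data, v.names = "text_val", idvar = "id", direction = "wide")[, -1]
names(df) <- c("id", "date1", "date2", "educational", "qualif")
df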
# id date1 date2 educational qualif
# 1 1 2000 2003 ILLINOIS INSTITUTE OF TECHNOLOGY Master of Science, Computer Science
# 5 2 1996 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering
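Since the reshape() approach already relies on every record being exactly nrec consecutive rows in a fixed order (date, date, educational, qualif), the same assumption allows an even shorter base-R sketch that fills a matrix row by row. This matrix() variant is a swapped-in alternative, not taken from the answers, and it breaks silently if any record has a missing or reordered row:

```r
# Alternative sketch: assumes every record is exactly nrec consecutive rows
# in the order date, date, educational, qualif
text_val <- c("2000", "2003", "ILLINOIS INSTITUTE OF TECHNOLOGY",
              "Master of Science, Computer Science", "1996", "2000",
              "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
              "Bachelor of Science, Mechanical Engineering")
nrec <- 4

m <- matrix(text_val, ncol = nrec, byrow = TRUE)  # one row per record
wide <- as.data.frame(m, stringsAsFactors = FALSE)
names(wide) <- c("date1", "date2", "educational", "qualif")
wide
```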
Answer 2 (score: 0)
For completeness, here is also a solution using dcast() from the data.table package:
library(data.table)
# rn %/% 4L maps rows 1-4 to record 1, rows 5-8 to record 2, and so on
setDT(data)[, rn := .I + 3L][
  , dcast(.SD, rn %/% 4L ~ class_name, toString, value.var = "text_val")]
   rn       date                         educational                                      qualif
1:  1 2000, 2003    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science
2:  2 1996, 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering
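Note that toString() is used as the aggregation function so that the duplicate dates are concatenated into one column. This is because the two date columns in the OP's expected output share the same name, which may indicate that the expected output is for display only and that the date values need no further processing.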
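The two tricks in the dcast() call can be seen in isolation: shifting the row number by 3 before integer division by 4 yields one record id per block of 4 rows, and toString() is what collapses the two dates that fall into the same cell. A minimal base-R illustration using the 8 rows from the question:

```r
# Row numbers 1..8 shifted by 3, then integer-divided by 4 (as rn %/% 4L does)
rn <- seq_len(8L) + 3L
rec <- rn %/% 4L
print(rec)

# toString() joins the values that land in the same cell with ", "
dates <- toString(c("2000", "2003"))
print(dates)
```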
If the column order matters and rn is not needed, the output can be tidied up to match the OP's expected result more closely:
lvl <- c("educational", "qualif", "date")
# the factor levels of class_name determine the column order produced by dcast()
setDT(data)[, rn := .I + 3L][, class_name := factor(class_name, levels = lvl)][
  , dcast(.SD, rn %/% 4L ~ class_name, toString, value.var = "text_val")][, rn := NULL][]
                           educational                                      qualif       date
1:    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science 2000, 2003
2: MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 1996, 2000
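If the dates do need further processing after all, the concatenated strings can be split back apart. This is a small base-R sketch that assumes the ", " separator toString() uses and exactly two dates per record; the column names date1 and date2 are illustrative:

```r
# Concatenated dates as produced by toString() in the dcast() output
date <- c("2000, 2003", "1996, 2000")

# Split on the ", " separator and bind the pieces into a 2-column matrix
parts <- do.call(rbind, strsplit(date, ", ", fixed = TRUE))
colnames(parts) <- c("date1", "date2")  # hypothetical column names
parts
```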