My data frame looks like this:
c1 c2 c3
1 A 1 D
2 A 2 D
3 A 3 D
4 X 4 D
5 A 5 D
6 X 6 D
7 X 7 D
8 A 8 D
I need to transform it so that every row with "X" in c1 is merged into c3 of the row above it, like this:
c1 c2 c3
1 A 1 D
2 A 2 D
3 A 3 DX4D
4 A 5 DX6DX7D
5 A 8 D
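Any ideas?
Answer 0 (score: 1)
Since you did not provide the data structure, it is unclear whether c3 is a factor or a character string. Just in case, I convert it to character before processing.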
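A minimal base R sketch of that idea follows. It is an illustration rather than the answerer's own code: the data frame name `df`, the helper vectors `grp` and `piece`, and the choice of `tapply()` are all assumptions, and it presumes the first row is never an "X" row.

```r
# Hypothetical example data mirroring the question; c3 is deliberately a factor.
df <- data.frame(
  c1 = c("A", "A", "A", "X", "A", "X", "X", "A"),
  c2 = 1:8,
  c3 = factor(rep("D", 8)),
  stringsAsFactors = FALSE
)

# Convert c3 to character so the steps below work on the labels
# rather than on the integer factor codes.
df$c3 <- as.character(df$c3)

# Each non-X row starts a new group; X rows join the group of the row above.
grp <- cumsum(df$c1 != "X")

# Text contributed by each row: c3 for normal rows, c1 + c2 + c3 for X rows.
piece <- ifelse(df$c1 == "X", paste0(df$c1, df$c2, df$c3), df$c3)

# Keep the non-X rows and replace c3 with the concatenated text of each group.
result <- df[df$c1 != "X", ]
result$c3 <- as.vector(tapply(piece, grp, paste0, collapse = ""))
rownames(result) <- NULL
result
```

The `as.character()` call is harmless when c3 is already a character column, so the same sketch covers both cases.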
Answer 1 (score: 1)
df <- read.table(text = " c1 c2 c3
1 A 1 D
2 A 2 D
3 A 3 D
4 X 4 D
5 A 5 D
6 X 6 D
7 X 7 D
8 A 8 D", stringsAsFactors = FALSE)
desired_output <- read.table(text = " c1 c2 c3
1 A 1 D
2 A 2 D
3 A 3 DX4D
4 A 5 DX6DX7D
5 A 8 D", stringsAsFactors = FALSE)
rownames(desired_output) <- NULL
library(dplyr)
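output <-
  df %>%
  # X rows contribute c1, c2 and c3 pasted together; other rows contribute c3 only
  mutate(to_paste = ifelse(c1 == "X", paste0(c1, c2, c3), c3)) %>%
  # every "A" row starts a new group, so X rows fall into the group of the row above
  group_by(grp = cumsum(c1 == "A")) %>%
  summarise(c1 = first(c1), c2 = first(c2), c3 = paste0(to_paste, collapse = "")) %>%
  select(-grp) %>%
  as.data.frame()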
identical(output, desired_output)
# [1] TRUE
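The grouping step is the heart of this approach, so it may help to look at it in isolation. The snippet below uses a stand-alone vector `c1` (copied from the question's data, not taken from the pipeline above) to show the group ids that `cumsum(c1 == "A")` produces:

```r
# c1 column from the question's example data
c1 <- c("A", "A", "A", "X", "A", "X", "X", "A")

# TRUE counts as 1 and FALSE as 0, so the running sum increases at every "A"
# and stays flat on "X" rows, which therefore share the id of the "A" above.
grp <- cumsum(c1 == "A")
grp
# [1] 1 2 3 3 4 4 4 5
```

Rows 3 and 4 share group 3, and rows 5 to 7 share group 4, which is exactly the set of rows that ends up collapsed into one output row.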
Answer 2 (score: 1)
Although this has already been answered, I would like to explain my approach step by step.
I used different data for this:
# c1 c2 c3
# A 1 D
# X 2 D
# A 3 D
# X 4 D
# A 5 D
# X 6 D
# X 7 D
# X 8 D
y = which(df1$c1 == "X")          # row indices where c1 is "X"
z = cumsum(c(0, diff(y)) != 1)    # run id: consecutive X rows share the same id
# for each run of consecutive X rows, paste all of their column values together
str <- sapply(unique(z), function(i) paste0(unlist(t(df1[y[z == i], ])), collapse = ""))
# the row just before the first X of each run
prev = y[!duplicated(z)] - 1
# append the pasted string to c3 of those preceding rows
df1$c3[prev] = paste0(df1$c3[prev], str)
# keep only the non-X rows
df1[df1$c1 != "X", ]
# c1 c2 c3
#1: A 1 DX2D
#2: A 3 DX4D
#3: A 5 DX6DX7DX8D
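To make the run-detection step easier to follow, here is a small stand-alone sketch of the intermediate values on this answer's data. The vector `c1` and the name `prev` are illustrative, and using `!duplicated(z)` to pick the first X of each run is one possible choice, not necessarily the only one:

```r
# c1 column from this answer's example data
c1 <- c("A", "X", "A", "X", "A", "X", "X", "X")

# positions of the X rows
y <- which(c1 == "X")
y
# [1] 2 4 6 7 8

# a gap other than 1 between neighbouring positions starts a new run
z <- cumsum(c(0, diff(y)) != 1)
z
# [1] 1 2 3 3 3

# first X row of each run, then the row just above it
prev <- y[!duplicated(z)] - 1
prev
# [1] 1 3 5
```

Rows 1, 3 and 5 are therefore the rows whose c3 receives the pasted text of the X rows below them.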