Question

My data frame looks like this:

   c1  c2  c3
1   A   1   D
2   A   2   D
3   A   3   D
4   X   4   D
5   A   5   D
6   X   6   D
7   X   7   D
8   A   8   D

I need every row that has "X" in c1 to be merged into c3 of the row above it, like this:

   c1  c2        c3
1   A   1         D
2   A   2         D
3   A   3      DX4D
4   A   5   DX6DX7D
5   A   8         D

Any ideas?

Answer 1

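Since you did not provide the structure of your data, it is unclear whether c3 is a factor or a character string. Just in case, I convert it to character before processing.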
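
A minimal base R sketch of that idea, offered as an illustration and not necessarily the code this answer originally used. The name `df`, the way the data is rebuilt, and the helper names `grp`, `piece` and `result` are all assumptions. It also assumes the first row is never an "X" row.

```r
# Assumed reconstruction of the question's data; `df` is a hypothetical name.
df <- data.frame(
  c1 = c("A", "A", "A", "X", "A", "X", "X", "A"),
  c2 = 1:8,
  c3 = "D",
  stringsAsFactors = TRUE  # simulate the case where c1 and c3 arrive as factors
)

# Convert to character so the values can be pasted and reassigned safely.
df$c1 <- as.character(df$c1)
df$c3 <- as.character(df$c3)

# Each non-"X" row starts a new group; "X" rows join the group of the row above.
grp <- cumsum(df$c1 != "X")

# Text contributed by each row: "X" rows contribute c1, c2 and c3 pasted together.
piece <- ifelse(df$c1 == "X", paste0(df$c1, df$c2, df$c3), df$c3)

# Collapse the pieces within each group and keep only the non-"X" rows.
result <- df[df$c1 != "X", ]
result$c3 <- as.vector(tapply(piece, grp, paste0, collapse = ""))
rownames(result) <- NULL
result
```

Grouping with `cumsum()` avoids an explicit loop and handles any number of consecutive "X" rows.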


Answer 2

df <- read.table(text = "    c1  c2  c3
1   A   1   D
2   A   2   D
3   A   3   D
4   X   4   D
5   A   5   D
6   X   6   D
7   X   7   D
8   A   8   D", stringsAsFactors = FALSE)

desired_output <- read.table(text = "    c1  c2  c3
1   A   1   D
2   A   2   D
3   A   3   DX4D
4   A   5   DX6DX7D
5   A   8   D", stringsAsFactors = FALSE)
rownames(desired_output) <- NULL

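library(dplyr)
output <- 
df %>% 
  # "X" rows contribute c1, c2 and c3 pasted together; other rows contribute c3 only
  mutate(to_paste = ifelse(c1 == "X", paste0(c1, c2, c3), c3)) %>% 
  # each "A" row starts a new group, so "X" rows fall into the group of the "A" above
  group_by(grp = cumsum(c1 == "A")) %>% 
  # keep c1 and c2 of the "A" row and concatenate the group's pieces into c3
  summarise(c1 = first(c1), c2 = first(c2), c3 = paste0(to_paste, collapse = "")) %>% 
  select(-grp) %>%
  as.data.frame()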

identical(output, desired_output) 
# [1] TRUE

Answer 3

Although this has already been answered, I would like to explain my approach step by step:

I used different data for this:

# c1  c2  c3
#  A   1   D
#  X   2   D
#  A   3   D
#  X   4   D
#  A   5   D
#  X   6   D
#  X   7   D
#  X   8   D
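
The steps that follow operate on a data frame named `df1`, which the answer does not construct. One way to build it from the table above, assuming character columns for c1 and c3 and an integer c2 (the column types are a guess):

```r
# Assumed construction of `df1` from the commented table above.
df1 <- data.frame(
  c1 = c("A", "X", "A", "X", "A", "X", "X", "X"),
  c2 = 1:8,
  c3 = "D",                  # recycled to all 8 rows
  stringsAsFactors = FALSE   # keep c1 and c3 as character, not factor
)
```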

y = which(df1$c1=="X")      # row numbers of the rows that have "X"
z = cumsum(c(0,diff(y))!=1) # run id: consecutive "X" rows share the same id

# for each run of consecutive rows, paste all the columns' data together
str <- sapply(unique(z), function(i) paste0(unlist(t(df1[y[z == i], ])),collapse = ""))

# the row just before each run: the first "X" row of the run, minus 1
prev = y[!duplicated(z)]-1

# append the "pasted together" string to c3 of the rows just prior to the X's
df1$c3[prev] = paste(df1$c3[prev],str,sep="")

# keep only the non-X rows
df1[df1$c1!="X",]

#   c1 c2         c3
#1:  A  1       DX2D
#2:  A  3       DX4D
#3:  A  5 DX6DX7DX8D
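
To see how the run detection works, it helps to print the intermediate vectors for this data. The sketch below is self-contained; `run_id` and `first_x` are hypothetical names, with `run_id` standing in for what the code above first stores in `z`.

```r
c1 <- c("A", "X", "A", "X", "A", "X", "X", "X")

# Row numbers of the "X" rows.
y <- which(c1 == "X")
y        # 2 4 6 7 8

# A gap other than 1 between neighbouring "X" rows starts a new run.
run_id <- cumsum(c(0, diff(y)) != 1)
run_id   # 1 2 3 3 3

# First "X" row of each run; the row before it receives the pasted text.
first_x <- y[!duplicated(run_id)]
first_x - 1   # 1 3 5
```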

Move some data from the first column into the last column of the previous row
