I have the following df:
name name..2 IGD
1 yaaA recF 16
2 recF yaaB 18
3 yaaD yaaE 22
4 dck dgk -3
5 dnaX yaaK 24
6 yaaK recR 15
7 recR yaaL 18
8 xpaC yaaN 19
9 yaaO tmk -3
10 yaaQ yaaR 13
11 yaaR holB 12
12 holB yaaT 3
13 yaaT yabA 15
14 yabB yazA -13
15 yazA yabC -25
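I am trying to find a way to paste together the values in name and name..2 wherever name..2 matches name in the next row, and put the result into a new df, which should look like this: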
1 yaaA recF
2 yaaD
3 dck
4 dnaX yaaK recR
5 xpaC
6 yaaO
7 yaaQ yaaR holB yaaT
8 yabB yazA
Is there an R function that can do this? I have searched SO but have not found a solution to this problem. Thanks in advance for your help.
Answer 0 (score: 3)
The logic here is similar to @Wen-Ben's answer; this is a dplyr implementation.
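library(dplyr)
df %>%
  # start a new group whenever name differs from the previous row's name2;
  # default = "" pads the first row with a character value, matching name2's type
  group_by(group = cumsum(name != lag(name2, default = ""))) %>%
  summarise(name = toString(name))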
# group name
# <int> <chr>
#1 1 yaaA, recF
#2 2 yaaD
#3 3 dck
#4 4 dnaX, yaaK, recR
#5 5 xpaC
#6 6 yaaO
#7 7 yaaQ, yaaR, holB, yaaT
#8 8 yabB, yazA
The main idea is to create a grouping variable that increments every time name differs from the name2 of the previous row.
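To make the grouping step concrete, here is a minimal base R sketch that builds the same key one piece at a time. It uses only the first four rows of the question's data, and the helper names prev_name2 and starts_new_group are made up for illustration.

```r
# A small subset of the question's data (rows 1-4), rebuilt here so the
# snippet runs on its own.
df <- data.frame(
  name  = c("yaaA", "recF", "yaaD", "dck"),
  name2 = c("recF", "yaaB", "yaaE", "dgk"),
  stringsAsFactors = FALSE
)

# name2 from the previous row; the first row has no predecessor, so pad with "".
prev_name2 <- c("", head(df$name2, -1))

# TRUE wherever a row does NOT continue the chain from the row above.
starts_new_group <- df$name != prev_name2

# Each TRUE bumps the running total, so rows in one chain share a group id.
group <- cumsum(starts_new_group)

# Show the intermediate columns side by side.
data.frame(df, prev_name2, starts_new_group, group)
```

Rows 1 and 2 end up in the same group because recF in name matches recF in the previous row's name2; every other row starts a new group.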
Answer 1 (score: 2)
In base R, we use tail, head, and cumsum to create the group key, and then use aggregate.
df$id <- cumsum(c(TRUE, tail(df$name, -1) != head(df$name2, -1)))
output <- aggregate(name ~ id, data = df, toString)
output
id name
1 1 yaaA, recF
2 2 yaaD
3 3 dck
4 4 dnaX, yaaK, recR
5 5 xpaC
6 6 yaaO
7 7 yaaQ, yaaR, holB, yaaT
8 8 yabB, yazA
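The question's desired output separates the names with spaces rather than commas. One possible variation, sketched here on a six-row subset of the data, swaps toString for paste with collapse = " "; aggregate forwards the extra argument to the function it applies.

```r
# Subset of the question's data, rebuilt so the snippet runs on its own.
df <- data.frame(
  name  = c("yaaA", "recF", "yaaD", "dnaX", "yaaK", "recR"),
  name2 = c("recF", "yaaB", "yaaE", "yaaK", "recR", "yaaL"),
  stringsAsFactors = FALSE
)

# Same group key as above: a new id starts whenever name does not match
# the name2 of the previous row.
df$id <- cumsum(c(TRUE, tail(df$name, -1) != head(df$name2, -1)))

# paste(..., collapse = " ") joins each group's names with single spaces.
output <- aggregate(name ~ id, data = df, FUN = paste, collapse = " ")
output
```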
Answer 2 (score: 2)
Here is another option, which identifies clusters within an igraph graph.
library(igraph)
library(tidyverse)
df %>%
select(-IGD) %>%
graph_from_data_frame() %>%
clusters() %>%
magrittr::extract2(1) %>%
split(., .) %>%
map_dfr(~tibble(x = toString(names(.x)[-length(.x)])))
## A tibble: 8 x 1
# x
# <chr>
#1 yaaA, recF
#2 yaaD
#3 dck
#4 dnaX, yaaK, recR
#5 xpaC
#6 yaaO
#7 yaaQ, yaaR, holB, yaaT
#8 yabB, yazA
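The idea is to construct an igraph from df[c("name", "name..2")] and then identify the clusters of connected nodes. The clusters are the groups; all that remains is to drop the last element (node) of each cluster.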
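The same idea can be sketched more directly with igraph's components(), which returns the cluster membership of each vertex. This is an illustrative variant on three rows of the data, not the answer's exact pipeline. It assumes, as the answer does, that the vertex appearing only in name..2 comes last within each cluster, which follows from graph_from_data_frame listing first-column vertices before second-column ones.

```r
library(igraph)

# Three rows of the question's data, rebuilt so the snippet runs on its own.
df <- data.frame(
  name    = c("yaaA", "recF", "yaaD"),
  name..2 = c("recF", "yaaB", "yaaE"),
  stringsAsFactors = FALSE
)

# Each row becomes an edge name -> name..2.
g <- graph_from_data_frame(df[c("name", "name..2")])

# Named vector: vertex name -> id of its (weakly) connected component.
memb <- components(g)$membership

# Collect the vertex names of each component.
clusters <- split(names(memb), memb)

# Drop the trailing vertex (the one seen only in name..2) and join the rest.
groups <- vapply(clusters, function(v) toString(head(v, -1)), character(1))
unname(groups)
```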
df <- read.table(text =
" name name..2 IGD
1 yaaA recF 16
2 recF yaaB 18
3 yaaD yaaE 22
4 dck dgk -3
5 dnaX yaaK 24
6 yaaK recR 15
7 recR yaaL 18
8 xpaC yaaN 19
9 yaaO tmk -3
10 yaaQ yaaR 13
11 yaaR holB 12
12 holB yaaT 3
13 yaaT yabA 15
14 yabB yazA -13
15 yazA yabC -25", header = T)
Answer 3 (score: 0)
We can also do this in data.table.
library(data.table)
setDT(df)[, .(name = toString(name)),
.(group = cumsum(name != shift(name2, fill = TRUE)))]
# group name
#1: 1 yaaA, recF
#2: 2 yaaD
#3: 3 dck
#4: 4 dnaX, yaaK, recR
#5: 5 xpaC
#6: 6 yaaO
#7: 7 yaaQ, yaaR, holB, yaaT
#8: 8 yabB, yazA
df <- structure(list(name = c("yaaA", "recF", "yaaD", "dck", "dnaX",
"yaaK", "recR", "xpaC", "yaaO", "yaaQ", "yaaR", "holB", "yaaT",
"yabB", "yazA"), name2 = c("recF", "yaaB", "yaaE", "dgk", "yaaK",
"recR", "yaaL", "yaaN", "tmk", "yaaR", "holB", "yaaT", "yabA",
"yazA", "yabC"), IGD = c(16L, 18L, 22L, -3L, 24L, 15L, 18L, 19L,
-3L, 13L, 12L, 3L, 15L, -13L, -25L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"))