I have a data frame with many columns; part of it is below:
df<- structure(list(Mine = structure(c(2L, 3L, 1L, 2L, 3L, 1L, 3L,
1L, 1L, 3L, 1L, 1L, 2L, 2L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 3L,
1L), .Label = c("IsMineBro", "IsMineBroCanMeriMerate", "None"
), class = "factor"), IMers = c(103L, 123L, 123L, 123L, 162L,
170L, 170L, 284L, 308L, 320L, 444L, 558L, 801L, 801L, 814L, 814L,
1009L, 1009L, 1015L, 1032L, 1032L, 1032L, 1032L, 1122L), namet = structure(c(2L,
1L, 24L, 13L, 16L, 10L, 7L, 9L, 3L, 19L, 15L, 4L, 11L, 14L, 8L,
12L, 21L, 6L, 17L, 5L, 20L, 23L, 22L, 18L), .Label = c("A0A0J9YU05",
"Bir22227", "Bir50516-1", "Bir50518", "Bir60930", "Bir60931",
"Bir61078", "Bir62523", "Bir62814", "Bir70315", "Bir71V06", "Bir7TBirE2",
"Bir80ZI9", "Bir810K5", "Bir8BH43", "Bir921J0", "Bir99KC8", "Bir9Z1G3",
"F2Z471", "G3UX26", "J3BirMG3", "Mer3YUN8", "Mer3YZT5", "O88342"
), class = "factor"), data1 = c(59.2, 10.7, 10.7, 10.7, 52.3,
16.7, 16.7, 40.5, 32.2, 116.6, 120.6, 35.6, 23.3, 23.3, 66.3,
66.3, 50, 50, 132.3, 102.3, 102.3, 102.3, 102.3, 11), data2 = c(70.7,
13.3, 13.3, 13.3, 55.8, 21.1, 21.1, 42.5, 28.6, 124.9, 104.9,
32.1, 25.3, 25.3, 79.3, 79.3, 55.5, 55.5, 164, 20, 20, 20, 20,
10), data3 = c(59.5, 15.8, 15.8, 15.8, 66.5, 14.9, 14.9, 28.9,
26.2, 117.6, 117.6, 33.7, 23.8, 23.8, 81.7, 81.7, 44.1, 44.1,
159.3, 159.3, 159.3, 159.3, 159.3, 20)), .Names = c("Mine", "IMers",
"namet", "data1", "data2", "data3"), class = "data.frame", row.names = c(NA,
-24L))
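I want to combine the rows that have the same IMers value and join their names, separated by ;. Then, in another column, I want to put their corresponding Mine values.
The other columns are identical for all rows that share an IMers value, so I will fill the remaining columns from the IsMineBro row of each unique IMers.
The expected output looks like this: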
output <- structure(list(ID = structure(c(1L, 15L, 11L, 8L, 7L, 2L, 14L,
10L, 3L, 9L, 6L, 5L, 12L, 4L, 13L), .Label = c("Bir22227", "Bir50516-1",
"Bir50518", "Bir60930;G3UX26;Mer3YZT5;Mer3YUN8", "Bir60931;J3BirMG3",
"Bir62523;Bir7TBirE2", "Bir62814", "Bir70315;Bir61078", "Bir71V06;Bir810K5",
"Bir8BH43", "Bir921J0", "Bir99KC8", "Bir9Z1G3", "F2Z471", "O88342;Bir80ZI9;A0A0J9YU05"
), class = "factor"), Lebel = structure(c(2L, 3L, 6L, 4L, 1L,
1L, 6L, 1L, 1L, 4L, 4L, 4L, 1L, 5L, 1L), .Label = c("IsMineBro",
"IsMineBroCanMeriMerate", "IsMineBro;IsMineBroCanMeriMerate;None",
"IsMineBro;None", "IsMineBro;None;None;None", "None"), class = "factor"),
data1 = c(59.2, 10.7, 52.3, 16.7, 40.5, 32.2, 116.6, 120.6,
35.6, 23.3, 66.3, 50, 132.3, 102.3, 11), data2 = c(70.7,
13.3, 55.8, 21.1, 42.5, 28.6, 124.9, 104.9, 32.1, 25.3, 79.3,
55.5, 164, 20, 10), data3 = c(59.5, 15.8, 66.5, 14.9, 28.9,
26.2, 117.6, 117.6, 33.7, 23.8, 81.7, 44.1, 159.3, 159.3,
20)), .Names = c("ID", "Lebel", "data1", "data2", "data3"
), class = "data.frame", row.names = c(NA, -15L))
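Let me give an example so the problem is easier to understand.
Let's look at the first row of df.
I look at IMers and see that it is 103 and that it is unique (there is no other 103), so I keep the row as it is, as shown in the output.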
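To see which IMers values are unique, and therefore pass through unchanged, you can count the rows per value. This is a minimal base-R sketch using the IMers column copied from df above. `counts` and `singletons` are illustrative names only:

```r
# IMers column copied from the question's df
IMers <- c(103L, 123L, 123L, 123L, 162L, 170L, 170L, 284L, 308L, 320L, 444L, 558L,
           801L, 801L, 814L, 814L, 1009L, 1009L, 1015L, 1032L, 1032L, 1032L, 1032L, 1122L)

# illustrative names: counts, singletons
counts <- table(IMers)                               # number of rows per IMers value
singletons <- as.integer(names(counts)[counts == 1]) # values that occur exactly once
singletons      # 103 162 284 308 320 444 558 1015 1122
length(counts)  # 15 distinct IMers values, i.e. 15 output rows
```

The 15 distinct values match the 15 rows of the expected output.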
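I look at the second row and see that there are three rows with 123. Then I look at the Mine column and see IsMineBro, IsMineBroCanMeriMerate and None.
In another column I collect their names (namet) in order (IsMineBro first, then IsMineBroCanMeriMerate, then None), so it becomes O88342;Bir80ZI9;A0A0J9YU05
I put the corresponding Mine values into another column in the same way: IsMineBro;IsMineBroCanMeriMerate;None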
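The ordering and joining for the 123 group can be sketched in base R. The sketch assumes that the intended order is the level order of the Mine factor, which is what the expected output suggests. `grp` is a hypothetical three-row data frame that holds rows 2 to 4 of df:

```r
# Hypothetical data frame: the three IMers == 123 rows of df, in their original order
mine_levels <- c("IsMineBro", "IsMineBroCanMeriMerate", "None")
grp <- data.frame(
  Mine  = factor(c("None", "IsMineBro", "IsMineBroCanMeriMerate"), levels = mine_levels),
  namet = c("A0A0J9YU05", "O88342", "Bir80ZI9"),
  stringsAsFactors = FALSE
)

# order() on a factor sorts by level position, so IsMineBro comes first
grp <- grp[order(grp$Mine), ]

ID    <- paste(grp$namet, collapse = ";")
Lebel <- paste(grp$Mine,  collapse = ";")  # paste() uses the factor labels, not the codes
ID     # "O88342;Bir80ZI9;A0A0J9YU05"
Lebel  # "IsMineBro;IsMineBroCanMeriMerate;None"
```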
For data1, data2, etc., I take only the values from the IsMineBro row.
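The three rules can be combined in a base-R sketch, offered as one possible alternative to dplyr. It splits by IMers, sorts each group by Mine, joins namet and Mine, and takes the data columns from the first row after sorting. That row is the IsMineBro row whenever the group has one. The question does not say what should happen when a group has no IsMineBro row, so falling back to the first sorted row is an assumption. `df_small` and `collapse_group()` are hypothetical names. The rows are copied from df for IMers 103, 123 and 801, and 801 has no IsMineBro row:

```r
# Hypothetical subset of df: IMers 103, 123 and 801
mine_levels <- c("IsMineBro", "IsMineBroCanMeriMerate", "None")
df_small <- data.frame(
  Mine  = factor(c("IsMineBroCanMeriMerate", "None", "IsMineBro",
                   "IsMineBroCanMeriMerate", "IsMineBroCanMeriMerate",
                   "IsMineBroCanMeriMerate"),
                 levels = mine_levels),
  IMers = c(103L, 123L, 123L, 123L, 801L, 801L),
  namet = c("Bir22227", "A0A0J9YU05", "O88342", "Bir80ZI9", "Bir71V06", "Bir810K5"),
  data1 = c(59.2, 10.7, 10.7, 10.7, 23.3, 23.3),
  data2 = c(70.7, 13.3, 13.3, 13.3, 25.3, 25.3),
  data3 = c(59.5, 15.8, 15.8, 15.8, 23.8, 23.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one IMers group into a single row
collapse_group <- function(g) {
  g <- g[order(g$Mine), ]  # level order: IsMineBro, IsMineBroCanMeriMerate, None
  data.frame(
    ID    = paste(g$namet, collapse = ";"),
    Lebel = paste(g$Mine,  collapse = ";"),
    # after sorting, row 1 is the IsMineBro row if the group has one;
    # otherwise this falls back to the first remaining row (an assumption)
    g[1, c("data1", "data2", "data3")],
    stringsAsFactors = FALSE
  )
}

out <- do.call(rbind, lapply(split(df_small, df_small$IMers), collapse_group))
rownames(out) <- NULL
out
```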
Answer
Using dplyr, we can get the solution:
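library(dplyr)

df %>%
  arrange(Mine) %>%                                # sort by factor level: IsMineBro, IsMineBroCanMeriMerate, None
  group_by(IMers) %>%                              # one group per IMers value
  summarise(ID    = paste(namet, collapse = ';'),  # join the names with ';'
            Lebel = paste(Mine, collapse = ';'),   # join the matching Mine labels
            # per the question, data columns are identical within an IMers group,
            # so max() simply returns that shared value
            data1 = max(data1),
            data2 = max(data2),
            data3 = max(data3))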
# A tibble: 15 x 6
IMers ID Lebel data1 data2 data3
<int> <chr> <chr> <dbl> <dbl> <dbl>
1 103 Bir22227 IsMineBroCanMeriMerate 59.2 70.7 59.5
2 123 O88342;Bir80ZI9;A0A0J9YU05 IsMineBro;IsMineBroCanMeriMerate;None 10.7 13.3 15.8
3 162 Bir921J0 None 52.3 55.8 66.5
4 170 Bir70315;Bir61078 IsMineBro;None 16.7 21.1 14.9
5 284 Bir62814 IsMineBro 40.5 42.5 28.9
6 308 Bir50516-1 IsMineBro 32.2 28.6 26.2
7 320 F2Z471 None 116.6 124.9 117.6
8 444 Bir8BH43 IsMineBro 120.6 104.9 117.6
9 558 Bir50518 IsMineBro 35.6 32.1 33.7
10 801 Bir71V06;Bir810K5 IsMineBroCanMeriMerate;IsMineBroCanMeriMerate 23.3 25.3 23.8
11 814 Bir62523;Bir7TBirE2 IsMineBro;None 66.3 79.3 81.7
12 1009 Bir60931;J3BirMG3 IsMineBro;None 50.0 55.5 44.1
13 1015 Bir99KC8 IsMineBro 132.3 164.0 159.3
14 1032 Bir60930;G3UX26;Mer3YZT5;Mer3YUN8 IsMineBro;None;None;None 102.3 20.0 159.3
15 1122 Bir9Z1G3 IsMineBro 11.0 10.0 20.0
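The answer's max() works only because the data columns are identical within each IMers group. If that stops holding, you can take the values from the first row after arrange(Mine) instead. That row is the IsMineBro row when one exists, which is closer to the question's rule. This sketch assumes dplyr is installed. `df_small` is a hypothetical subset of df (IMers 123, 170 and 1009), and select(-IMers) drops the grouping column so that the columns match the expected output:

```r
library(dplyr)

# Hypothetical subset of df: IMers 123, 170 and 1009
mine_levels <- c("IsMineBro", "IsMineBroCanMeriMerate", "None")
df_small <- data.frame(
  Mine  = factor(c("None", "IsMineBro", "IsMineBroCanMeriMerate",
                   "IsMineBro", "None", "None", "IsMineBro"),
                 levels = mine_levels),
  IMers = c(123L, 123L, 123L, 170L, 170L, 1009L, 1009L),
  namet = c("A0A0J9YU05", "O88342", "Bir80ZI9", "Bir70315", "Bir61078",
            "J3BirMG3", "Bir60931"),
  data1 = c(10.7, 10.7, 10.7, 16.7, 16.7, 50, 50),
  data2 = c(13.3, 13.3, 13.3, 21.1, 21.1, 55.5, 55.5),
  data3 = c(15.8, 15.8, 15.8, 14.9, 14.9, 44.1, 44.1),
  stringsAsFactors = FALSE
)

out <- df_small %>%
  arrange(Mine) %>%                      # factor level order puts IsMineBro rows first
  group_by(IMers) %>%
  summarise(ID    = paste(namet, collapse = ";"),
            Lebel = paste(Mine,  collapse = ";"),
            data1 = first(data1),        # value from the IsMineBro row when one exists
            data2 = first(data2),
            data3 = first(data3)) %>%
  select(-IMers)                         # keep only the expected output's columns
out
```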