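I have two separate tables with different primary keys that I need to join. One table provides the results and the other provides the IDs of the people involved. Because there are multiple results, and several people may share the same result, my join is not lining up correctly. I may be missing a step or not thinking about this logically, but any advice would be greatly appreciated.
I initially tried to join the tables using this function that I found: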
rbind.all.columns <- function(x, y) {
  x.diff <- setdiff(colnames(x), colnames(y))
  y.diff <- setdiff(colnames(y), colnames(x))
  x[, c(as.character(y.diff))] <- NA
  y[, c(as.character(x.diff))] <- NA
  return(rbind(x, y))
}
However, all it gave me was a list of results and IDs.
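To illustrate why a row-binding helper behaves this way, here is a minimal sketch on cut-down toy data (the two-row `df1` and one-row `df2` below are assumptions made for brevity, not the full tables). Padding missing columns with NA and calling `rbind` stacks the rows of both tables, whereas `merge` matches rows on the shared EVENT key.

```r
# Toy data with the same columns as the question's tables (cut down for brevity).
df1 <- data.frame(EVENT = c(145754L, 145754L), ID = c(1738L, 1756L))
df2 <- data.frame(EVENT = 145754L, RESULT = "Was given xxx med")

# Row-binding after padding the missing columns: rows are stacked, not matched.
stacked <- rbind(
  transform(df1, RESULT = NA),
  transform(df2, ID = NA)[, c("EVENT", "ID", "RESULT")]
)
nrow(stacked)  # 3 rows: 2 from df1 plus 1 from df2

# Key-based join: each df1 row picks up the RESULT for its EVENT.
joined <- merge(df1, df2, by = "EVENT")
nrow(joined)   # 2 rows, each with both ID and RESULT filled in
```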
Table 1:
EVENT ID
145754 1738
145754 1756
145639 1738
145639 1756
df1 <- structure(list(EVENT = c(145754L, 145754L, 145639L, 145639L), ID = c(1738L, 1756L, 1738L, 1756L)), class = "data.frame", row.names = c(NA, -4L))
Table 2:
ENTRY EVENT RESULT
DEL 145754 Was given xxx med
INS 145754
DEL 145639 Reported stomachache
INS 145639
df2 <- structure(list(ENTRY = c("DEL", "INS", "DEL", "INS"), EVENT = c(145754L, 145754L, 145639L, 145639L), RESULT = c("Was given xxx med", "", "Reported stomachache", "")), class = "data.frame", row.names = c(NA, -4L))
Desired table:
ID EVENT RESULT
1738, 1756 145754 Was given xxx med
1738, 1756 145639 Reported stomachache
Answer 0 (score: 3)
What we need to do is use paste to concatenate the IDs for the same event into a single comma-separated list:
library(tidyverse)
df1_concat <- df1 %>%
group_by(EVENT) %>%
summarise(IDs = paste(ID, collapse = ', '))
# A tibble: 2 x 2
EVENT IDs
<int> <chr>
1 145639 1738, 1756
2 145754 1738, 1756
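As a small aside on the paste call, the sketch below (using a standalone vector named `ids`, which is only an illustration) shows why `collapse` rather than `sep` is the argument that matters here: `collapse` folds a vector into one string, while `sep` only separates multiple arguments element by element.

```r
ids <- c(1738L, 1756L)

# collapse joins the elements of one vector into a single string.
collapsed <- paste(ids, collapse = ", ")

# sep only applies between separate arguments, so one vector stays a vector.
not_collapsed <- paste(ids, sep = ", ")

# toString() is a base R shortcut for paste(x, collapse = ", ").
via_tostring <- toString(ids)

length(collapsed)      # 1
length(not_collapsed)  # 2
```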
Then we can do a *_join on the EVENT column:
left_join(df2, df1_concat, by = 'EVENT')
ENTRY EVENT RESULT IDs
1 DEL 145754 Was given xxx med 1738, 1756
2 INS 145754 1738, 1756
3 DEL 145639 Reported stomachache 1738, 1756
4 INS 145639 1738, 1756
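It isn't clear to me why the rows where ENTRY == 'INS' should be dropped, but depending on the logic behind that, there are many ways to filter them out. I show 2 below:
# Keep rows where ENTRY == 'DEL' (this drops the 'INS' rows)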
left_join(df1_concat, df2, by = 'EVENT') %>%
filter(ENTRY == 'DEL')
# A tibble: 2 x 4
EVENT IDs ENTRY RESULT
<int> <chr> <fct> <fct>
1 145639 1738, 1756 DEL Reported stomachache
2 145754 1738, 1756 DEL Was given xxx med
# Remove rows with no value for RESULT
left_join(df1_concat, df2, by = 'EVENT') %>%
filter(RESULT != '')
# A tibble: 2 x 4
EVENT IDs ENTRY RESULT
<int> <chr> <fct> <fct>
1 145639 1738, 1756 DEL Reported stomachache
2 145754 1738, 1756 DEL Was given xxx med
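Putting the pieces together, the following is one possible sketch of a single pipeline that ends in the column layout of the desired table. It assumes that "rows with an empty RESULT" is the right rule for dropping the INS rows, names the collapsed column `ID` to match the desired output, and loads only dplyr rather than the whole tidyverse; the name `desired` is just illustrative.

```r
library(dplyr)

df1 <- data.frame(EVENT = c(145754L, 145754L, 145639L, 145639L),
                  ID = c(1738L, 1756L, 1738L, 1756L))
df2 <- data.frame(ENTRY = c("DEL", "INS", "DEL", "INS"),
                  EVENT = c(145754L, 145754L, 145639L, 145639L),
                  RESULT = c("Was given xxx med", "", "Reported stomachache", ""),
                  stringsAsFactors = FALSE)

desired <- df1 %>%
  group_by(EVENT) %>%
  summarise(ID = paste(ID, collapse = ", ")) %>%          # one row per EVENT
  inner_join(filter(df2, RESULT != ""), by = "EVENT") %>% # attach non-empty results
  select(ID, EVENT, RESULT)                               # desired column order

desired
```

Filtering df2 before the join keeps the intermediate result small; filtering afterwards, as in the two variants above, gives the same rows.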
Answer 1 (score: 0)
In base R, we can actually do this in one line using aggregate, merge, and toString (for the IDs).
d <- aggregate(ID ~ EVENT + RESULT, merge(df1, df2[which(df2$RESULT != ""), ]), toString)
d
# EVENT RESULT ID
# 1 145639 Reported stomachache 1738, 1756
# 2 145754 Was given xxx med 1738, 1756
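For readers who find the one-liner dense, here is the same idea unpacked into steps. The intermediate names `with_result` and `merged` are only illustrative. Plain logical indexing is used in place of `which()`; the two are equivalent here because RESULT contains no NA values, and `which()` would matter only if it did.

```r
df1 <- data.frame(EVENT = c(145754L, 145754L, 145639L, 145639L),
                  ID = c(1738L, 1756L, 1738L, 1756L))
df2 <- data.frame(ENTRY = c("DEL", "INS", "DEL", "INS"),
                  EVENT = c(145754L, 145754L, 145639L, 145639L),
                  RESULT = c("Was given xxx med", "", "Reported stomachache", ""),
                  stringsAsFactors = FALSE)

# Step 1: keep only the rows of df2 that carry a result.
with_result <- df2[df2$RESULT != "", ]

# Step 2: merge() joins on the column names the tables share, here EVENT.
merged <- merge(df1, with_result)

# Step 3: collapse the IDs of each EVENT/RESULT pair into one string.
d <- aggregate(ID ~ EVENT + RESULT, data = merged, FUN = toString)
d
```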