I am working with a data frame that looks like this:
Df.1:
“Ind. Name” “Ind. ID” “Inst” “Inst. ID”
J. Smith 12345 A 532
K. Kapplan 12346 A 532
A. Lindt 12347 A 532
B. Johnson 12348 B 761
E. Pitt 12349 B 761
S. Mathews 12351 C 890
P. Rawles 12351 C 890
P. Right 12352 C 890
O. Stray 12353 C 890
I need to create a function that gives me a result like this:
Df.Result:
“Ind. Name” “Ind.ID” “Linked Ind.” “Linked Ind.ID” “Inst” “Inst.ID”
J. Smith 12345 K. Kapplan 12346 A 532
J. Smith 12345 A. Lindt 12347 A 532
K. Kapplan 12346 J. Smith 12345 A 532
K. Kapplan 12346 A. Lindt 12347 A 532
A. Lindt 12347 J. Smith 12345 A 532
A. Lindt 12347 K. Kapplan 12346 A 532
B. Johnson 12348 E. Pitt 12349 B 761
E. Pitt 12349 B. Johnson 12348 B 761
S. Mathews 12351 P. Rawles 12351 C 890
S. Mathews 12351 P. Right 12352 C 890
S. Mathews 12351 O. Stray 12353 C 890
P. Rawles 12351 P. Right 12352 C 890
P. Rawles 12351 S. Mathews 12351 C 890
P. Rawles 12351 O. Stray 12353 C 890
P. Right 12352 O. Stray 12353 C 890
P. Right 12352 P. Rawles 12351 C 890
P. Right 12352 S. Mathews 12351 C 890
O. Stray 12353 P. Right 12352 C 890
O. Stray 12353 P. Rawles 12351 C 890
O. Stray 12353 S. Mathews 12351 C 890
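Basically, I need a data frame that reflects how the "Ind. Name" values are linked through a shared "Inst". I am new to R and have tried several approaches, including splitting Df.1 into a separate data frame per "Inst" and then applying the following function: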
My_function <- function(y){
  # Keep the rows of Df.1 whose Inst matches y ("A", "B" or "C")
  Inst <- subset(Df.1, grepl(y, Df.1$Inst))
  return(Inst)
}
my_list <- c("A", "B", "C")
for(i in my_list){
Inst <- My_function(i)
assign(paste("Inst", i, sep = "."), Inst)
}
Then I get the links with the following:
My_function2 <- function(x){
  Df.C <- data.frame("Ind. Name"= C$`Ind`[x], "Linked Ind.Id"= C$`Linked Ind.Id*`[-(x)], "Linked Ind."= C$`Linked Ind.`[-(x)], "Inst"="C","Inst.ID*"=890)
  return(Df.C)
}
So using the loop
for(i in 1:4){
  Network <- My_function2(i)
  assign(paste("Network", i, sep = "."), Network)
}
results in four data frames:
Network.1:
“Ind. Name” “Ind.ID” “Linked Ind.” “Linked Ind.ID” “Inst” “Inst.ID”
S. Mathews 12351 P. Rawles 12351 C 890
S. Mathews 12351 P. Right 12352 C 890
S. Mathews 12351 O. Stray 12353 C 890
Network.2:
“Ind. Name” “Ind.ID” “Linked Ind.” “Linked Ind.ID” “Inst” “Inst.ID”
P. Rawles 12351 P. Right 12352 C 890
P. Rawles 12351 S. Mathews 12351 C 890
P. Rawles 12351 O. Stray 12353 C 890
Network.3:
“Ind. Name” “Ind.ID” “Linked Ind.” “Linked Ind.ID” “Inst” “Inst.ID”
P. Right 12352 O. Stray 12353 C 890
P. Right 12352 P. Rawles 12351 C 890
P. Right 12352 S. Mathews 12351 C 890
Network.4:
“Ind. Name” “Ind.ID” “Linked Ind.” “Linked Ind.ID” “Inst” “Inst.ID”
O. Stray 12353 P. Right 12352 C 890
O. Stray 12353 P. Rawles 12351 C 890
O. Stray 12353 S. Mathews 12351 C 890
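Given that I have 4,000 different "Inst" values and 8,000 "Ind." values, this is of course very inefficient, so I would appreciate any help or tips on how to do this in a functional way.
Thanks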
Answer 0 (score: 0)
I assume your main goal is to reproduce Df.Result. You can achieve this with a simple outer join using dplyr::full_join.
library(tidyverse);
full_join(df, df, by = "Inst") %>%
filter(Ind..Name.x != Ind..Name.y)
# Ind..Name.x Ind..ID.x Inst Inst..ID.x Ind..Name.y Ind..ID.y Inst..ID.y
#1 J. Smith 12345 A 532 K. Kapplan 12346 532
#2 J. Smith 12345 A 532 A. Lindt 12347 532
#3 K. Kapplan 12346 A 532 J. Smith 12345 532
#4 K. Kapplan 12346 A 532 A. Lindt 12347 532
#5 A. Lindt 12347 A 532 J. Smith 12345 532
#6 A. Lindt 12347 A 532 K. Kapplan 12346 532
#7 B. Johnson 12348 B 761 E. Pitt 12349 761
#8 E. Pitt 12349 B 761 B. Johnson 12348 761
#9 S. Mathews 12351 C 890 P. Rawles 12351 890
#10 S. Mathews 12351 C 890 P. Right 12352 890
#11 S. Mathews 12351 C 890 O. Stray 12353 890
#12 P. Rawles 12351 C 890 S. Mathews 12351 890
#13 P. Rawles 12351 C 890 P. Right 12352 890
#14 P. Rawles 12351 C 890 O. Stray 12353 890
#15 P. Right 12352 C 890 S. Mathews 12351 890
#16 P. Right 12352 C 890 P. Rawles 12351 890
#17 P. Right 12352 C 890 O. Stray 12353 890
#18 O. Stray 12353 C 890 S. Mathews 12351 890
#19 O. Stray 12353 C 890 P. Rawles 12351 890
#20 O. Stray 12353 C 890 P. Right 12352 890
Explanation: perform an outer join of df with itself by Inst, then remove the rows where both sides have the same Ind.Name.
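To get from the joined output to the column layout of Df.Result, the join can be followed by a select that renames and reorders the columns. The following is only a sketch under some assumptions: the output names (Ind.Name, Linked.Ind and so on) are illustrative choices rather than anything required, and joining on both Inst and Inst..ID is a small variation on the call above that avoids a duplicated Inst..ID column. Recent dplyr versions may warn about a many-to-many relationship here, which is expected for a self-join of this kind.

```r
library(dplyr)

# Same sample data as used in this answer; read.table() turns the header
# 'Ind. Name' into the syntactic name Ind..Name, and so on.
df <- read.table(text =
"'Ind. Name' 'Ind. ID' 'Inst' 'Inst. ID'
'J. Smith' 12345 A 532
'K. Kapplan' 12346 A 532
'A. Lindt' 12347 A 532
'B. Johnson' 12348 B 761
'E. Pitt' 12349 B 761
'S. Mathews' 12351 C 890
'P. Rawles' 12351 C 890
'P. Right' 12352 C 890
'O. Stray' 12353 C 890", header = TRUE)

links <- full_join(df, df, by = c("Inst", "Inst..ID")) %>%
  # Drop the rows that pair an individual with themselves. The filter uses
  # the name because two individuals in the sample share ID 12351.
  filter(Ind..Name.x != Ind..Name.y) %>%
  # Rename and reorder (output names are illustrative assumptions).
  select(Ind.Name = Ind..Name.x, Ind.ID = Ind..ID.x,
         Linked.Ind = Ind..Name.y, Linked.Ind.ID = Ind..ID.y,
         Inst, Inst.ID = Inst..ID)

nrow(links)  # 20 rows for the sample data
```

Filtering on the name rather than the ID matters for this sample: S. Mathews and P. Rawles have the same ID, so an ID-based filter would drop their link.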
Sample data:
df <- read.table(text =
"'Ind. Name' 'Ind. ID' 'Inst' 'Inst. ID'
'J. Smith' 12345 A 532
'K. Kapplan' 12346 A 532
'A. Lindt' 12347 A 532
'B. Johnson' 12348 B 761
'E. Pitt' 12349 B 761
'S. Mathews' 12351 C 890
'P. Rawles' 12351 C 890
'P. Right' 12352 C 890
'O. Stray' 12353 C 890", header = TRUE)
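If you would rather not depend on dplyr, the same self-join can be sketched in base R with merge(). This is a minimal sketch, not a tuned solution: the data frame below rebuilds the sample data with simplified column names (Ind.Name, Ind.ID, Inst, Inst.ID), which is an assumption made for readability. It also includes a sanity check: an institution with n individuals should contribute n * (n - 1) directed links.

```r
# Sample data with simplified column names (an assumption for this sketch).
df <- data.frame(
  Ind.Name = c("J. Smith", "K. Kapplan", "A. Lindt", "B. Johnson", "E. Pitt",
               "S. Mathews", "P. Rawles", "P. Right", "O. Stray"),
  Ind.ID   = c(12345, 12346, 12347, 12348, 12349, 12351, 12351, 12352, 12353),
  Inst     = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Inst.ID  = c(532, 532, 532, 761, 761, 890, 890, 890, 890),
  stringsAsFactors = FALSE
)

# Self-join on the institution columns; merge() adds the suffixes .x and .y
# to the remaining columns by default.
pairs <- merge(df, df, by = c("Inst", "Inst.ID"))

# Remove the rows that pair an individual with themselves.
pairs <- pairs[pairs$Ind.Name.x != pairs$Ind.Name.y, ]

# Sanity check: each institution with n members yields n * (n - 1) links.
group_sizes <- table(df$Inst)
expected_rows <- sum(group_sizes * (group_sizes - 1))
stopifnot(nrow(pairs) == expected_rows)
```

The same check gives a rough idea of the output size for the full data: the result grows with the square of each institution's size, so a few very large institutions would dominate the row count.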