Using Reduce with intersect (from dplyr) on a list of data frames to create another data frame

Asked: 2018-09-19 15:02:12

Tags: r dplyr

Here is my list of data frames:

    df<-list(structure(list(A = structure(1:6, .Label = c("A~B", "B~C", 
"C~D", "D~C", "E~F", "F~G"), class = "factor"), V2 = structure(1:6, .Label = c("1", 
"2", "3", "4", "5", "6"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L)), structure(list(A = structure(c(1L, 4L, 5L, 6L, 2L, 3L), .Label = c("A~B", 
"E~F", "H~G", "M~C", "N~D", "P~C"), class = "factor"), V2 = structure(c(3L, 
4L, 5L, 6L, 1L, 2L), .Label = c("10", "12", "2", "4", "6", "8"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L)), structure(list(A = structure(c(1L, 3L, 5L, 4L, 6L, 2L), .Label = c("A~B", 
"H~G", "M~C", "T~C", "U~D", "W~S"), class = "factor"), V2 = structure(c(4L, 
5L, 6L, 1L, 2L, 3L), .Label = c("12", "15", "18", "3", "6", "9"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L)))

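With the command below, I want to select the pair or pairs that appear in all 3 data frames of the list. In this case, the result should be only the pair A~B: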

the_best_pairs=Reduce(f = dplyr::intersect, x = df)

I then get these messages:

Warning messages:
1: Column `A` joining factors with different levels, coercing to character vector 
2: Column `V2` joining factors with different levels, coercing to character vector 
3: Column `A` joining character vector and factor, coercing into character vector 
4: Column `V2` joining character vector and factor, coercing into character vector 
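
The warnings above say that the factor columns in the three data frames have different sets of levels, so they are coerced to character before being compared. A minimal sketch of doing that conversion explicitly, using made-up data and a hypothetical helper named `to_character` (neither is from the original post). This should only avoid the coercion warnings in the dplyr version that produced them; it does not change which rows match:

```r
# Two small data frames whose factor columns have different level sets
# (made-up data mirroring the structure in the question).
d1 <- data.frame(A = factor(c("A~B", "B~C")), V2 = factor(c("1", "2")))
d2 <- data.frame(A = factor(c("A~B", "M~C")), V2 = factor(c("2", "4")))
dfs <- list(d1, d2)

# Hypothetical helper: convert every factor column to character,
# leaving all other columns untouched.
to_character <- function(d) {
  d[] <- lapply(d, function(col) if (is.factor(col)) as.character(col) else col)
  d
}

dfs_chr <- lapply(dfs, to_character)

# Inspect the column classes of the first converted data frame.
sapply(dfs_chr[[1]], class)
```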

My original list of data frames is larger and has the following structure:

List of 3
 $ :'data.frame':   685 obs. of  2 variables:
  ..$ Var1         : Factor w/ 4828 levels "ABEV3~AEDU3",..: 1016 43 37 1022 1992 1034 4004 989 986 36 ...
  ..$ Dickey_Fuller: num [1:685] -5.15 -5.06 -5.05 -5.03 -5.03 ...
 $ :'data.frame':   650 obs. of  2 variables:
  ..$ Var1         : Factor w/ 4828 levels "ABEV3~AEDU3",..: 1016 2126 995 2746 2125 1034 1936 996 970 1992 ...
  ..$ Dickey_Fuller: num [1:650] -5.37 -5.26 -5.17 -5.08 -5.05 ...
 $ :'data.frame':   711 obs. of  2 variables:
  ..$ Var1         : Factor w/ 4828 levels "ABEV3~AEDU3",..: 43 37 36 4065 2058 3961 975 2966 2126 66 ...
  ..$ Dickey_Fuller: num [1:711] -5.38 -5.2 -5.08 -4.83 -4.81 ...

After running the_best_pairs=Reduce(f = dplyr::intersect, x = dflist) on the original data frames, I get no error or warning message, but the result is an empty data frame.

What am I doing wrong?

Am I using the command the_best_pairs=Reduce(f = dplyr::intersect, x = dflist) correctly?

Any help?

2 answers:

Answer 0: (score: 2)

As already stated in the comments, you are only interested in the pairs, not in the corresponding values in column V2.

So, to intersect only the pairs, you can use:

Reduce(f = dplyr::intersect, x = lapply(df, "[[", "A"))
# [1] "A~B"

lapply(df, "[[", "A") extracts column A from each data.frame in the list and returns a list of vectors, so Reduce then works as you expect.
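
If the V2 values for the common pairs are also needed, the vector of common pairs can be used to filter each data frame afterwards. A minimal sketch in base R, using made-up data with the same layout as in the question; the names `dfs`, `common`, `rows`, `result` and the `source` column are illustrative assumptions, not part of the original post:

```r
# Made-up data frames with the same layout as in the question (character columns).
dfs <- list(
  data.frame(A = c("A~B", "B~C", "E~F"), V2 = c(1, 2, 5),  stringsAsFactors = FALSE),
  data.frame(A = c("A~B", "M~C", "E~F"), V2 = c(2, 4, 10), stringsAsFactors = FALSE),
  data.frame(A = c("A~B", "M~C", "H~G"), V2 = c(3, 6, 18), stringsAsFactors = FALSE)
)

# Pairs present in every data frame (base intersect on character vectors).
common <- Reduce(intersect, lapply(dfs, function(d) as.character(d$A)))

# Keep only the rows for those pairs in each data frame.
rows <- lapply(dfs, function(d) d[d$A %in% common, , drop = FALSE])

# Stack the rows, tagging each one with the position of its source data frame.
result <- do.call(
  rbind,
  Map(function(d, i) cbind(source = i, d), rows, seq_along(rows))
)
result
```

For a list shaped like the original one, the same idea would apply with Var1 in place of A, so that the numeric Dickey_Fuller column takes no part in the comparison.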

Answer 1: (score: 0)

It seems to me that you get an empty result because your data frames have no rows in common once A and V2 are taken together.

If you run:

paste0(df[[1]]$A," ",df[[1]]$V2 )
 "A~B 1" "B~C 2" "C~D 3" "D~C 4" "E~F 5" "F~G 6"
paste0(df[[2]]$A," ",df[[2]]$V2 )
 "A~B 2"  "M~C 4"  "N~D 6"  "P~C 8"  "E~F 10" "H~G 12"
paste0(df[[3]]$A," ",df[[3]]$V2 )
 "A~B 3"  "M~C 6"  "U~D 9"  "T~C 12" "W~S 15" "H~G 18"

you can see that the three data frames share no complete row.

You get the same result with:

l1<-list(paste0(df[[1]]$A," ",df[[1]]$V2 ))
l2<-list(paste0(df[[2]]$A," ",df[[2]]$V2 ))
l3<-list(paste0(df[[3]]$A," ",df[[3]]$V2 ))

li<-list(l1,l2,l3)

Reduce(dplyr::intersect,li )

It also gives me an empty list.
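
The underlying point is that intersecting data frames compares whole rows, so a pair counts as common only if every other column matches as well. A minimal sketch with two made-up data frames (`a` and `b` are illustrative names, and dplyr is assumed to be installed):

```r
library(dplyr)

# Made-up data: both data frames contain the pair "A~B", but with different V2 values.
a <- data.frame(A = c("A~B", "B~C"), V2 = c(1, 2), stringsAsFactors = FALSE)
b <- data.frame(A = c("A~B", "M~C"), V2 = c(2, 4), stringsAsFactors = FALSE)

# Whole rows are compared: no row of a is identical to a row of b.
whole_rows <- dplyr::intersect(a, b)
nrow(whole_rows)

# Restricting both sides to column A compares the pairs only.
pairs_only <- dplyr::intersect(a["A"], b["A"])
pairs_only
```

The same reasoning would explain the empty result on the original list: a numeric column such as Dickey_Fuller is unlikely to hold exactly the same value for a pair in all three data frames.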