Let me give an example. Suppose we have 3 tables (focus on column N):
Table 1           Table 2           Table 3
-------------     -------------     -------------
N     Values      N     Values      N     Values
-------------     -------------     -------------
5     1           5     -1          5     1
10    2           6     -2          6     21
15    3           10    -3          10    5
                  15    -4          12    6
                                    15    3
I want to remove the extra rows so that all tables contain the same values in column N.
Result:
Table 1           Table 2           Table 3
-------------     -------------     -------------
N     Values      N     Values      N     Values
-------------     -------------     -------------
5     1           5     -1          5     1
10    2           10    -3          10    5
15    3           15    -4          15    3
I believe there is a simple way to do this in R, but I am an absolute newbie. I would really appreciate your help!
Table1 <- structure(list(N = c(5L, 10L, 15L), Values = 1:3), .Names = c("N",
"Values"), row.names = c(NA, 3L), class = "data.frame")
Table2 <- structure(list(N = c(5L, 6L, 10L, 15L), Values = c(-1L, -2L,
-3L, -4L)), .Names = c("N", "Values"), row.names = c(NA, 4L), class = "data.frame")
Table3 <- structure(list(N = c(5L, 6L, 10L, 12L, 15L), Values = c(1L, 21L,
5L, 6L, 3L)), .Names = c("N", "Values"), row.names = c(NA, 5L
), class = "data.frame")
Answer 0 (score: 1)
Use set intersection to find the values of N that are common to all tables:
> t1 <-data.frame(N=c(5,10,15),Values=c(1,2,3))
> t2 <-data.frame(N=c(5,6,10,15),Values=c(-1,-2,-3,-4))
> t3 <-data.frame(N=c(5,6,10,12,15),Values=c(1,21,5,6,3))
> common<-intersect(intersect(t1$N,t2$N),t3$N)
> common
[1] 5 10 15
Then simply subset each table to keep the rows with those common values:
> newt1<-t1[t1$N %in% common,]
> newt2<-t2[t2$N %in% common,]
> newt3<-t3[t3$N %in% common,]
> newt3
N Values
1 5 1
3 10 5
5 15 3
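This approach should scale, so you can write a function and pass in the data frames and the column name(s). It can return a list of the new data frames.
I have used data frames here. The same approach works for matrices.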
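The function suggested above could look roughly like the following sketch. The name `keep_common` and its arguments are hypothetical, it uses base R only, and it assumes a single key column rather than a vector of column names.

```r
# Hypothetical helper: keep only the rows whose key value occurs in every table.
# `tables` is a list of data frames, `col` the name of the shared key column.
keep_common <- function(tables, col) {
  # intersect the key column across all tables, pairwise from left to right
  common <- Reduce(intersect, lapply(tables, function(tbl) tbl[[col]]))
  # subset each table to the rows with a common key value
  lapply(tables, function(tbl) tbl[tbl[[col]] %in% common, , drop = FALSE])
}

t1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
t2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
t3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))

res <- keep_common(list(t1 = t1, t2 = t2, t3 = t3), "N")
res$t3
```

Because `Reduce` folds `intersect` over the list, the same call works for any number of tables.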
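Answer 1 (score: 1)
I would like to propose a general approach that works for an arbitrary number of data frames as well as for multiple id columns.
The data frames may have different structures, i.e., different numbers and types of columns. The only requirement is that all data frames share the id columns, with the same names and types. In addition, the approach detects the case where there are no common combinations of id values among the data frames.
Suppose we have a list of data frames dfl and a vector of column names cn which should be checked for common combinations of values across all data frames in the list: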
# name the list elements so that the result is labelled by table
dfl <- list(Table1 = Table1, Table2 = Table2, Table3 = Table3)
cn <- "N"
library(data.table)
# determine common combinations of id values
common <- rbindlist(lapply(dfl, function(x) setDT(x)[, .SD, .SDcols = cn]))[
, .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
# stop if there are no common id values
stopifnot(nrow(common) > 0L)
# join with all data tables in dfl, keeping only rows which have common id values
result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])
result
$Table1
    N Values
1:  5      1
2: 10      2
3: 15      3

$Table2
    N Values
1:  5     -1
2: 10     -3
3: 15     -4

$Table3
    N Values
1:  5      1
2: 10      5
3: 15      3
dfl <- structure(list(Table1 = structure(list(N = c(5L, 10L, 15L), Values = 1:3), .Names = c("N",
"Values"), row.names = c(NA, 3L), class = "data.frame"), Table2 = structure(list(
N = c(5L, 6L, 10L, 15L), Values = c(-1L, -2L, -3L, -4L)), .Names = c("N",
"Values"), row.names = c(NA, 4L), class = "data.frame"), Table3 = structure(list(
N = c(5L, 6L, 10L, 12L, 15L), Values = c(1L, 21L, 5L, 6L,
3L)), .Names = c("N", "Values"), row.names = c(NA, 5L), class = "data.frame")), .Names = c("Table1",
"Table2", "Table3"))
# create sample data: 5 dataframes with 100 rows each and 3 id columns
set.seed(123L)
ndf <- 5L
dfl <- lapply(seq_len(ndf), function(i) {
  nr <- 100L
  nseq <- 1:6
  data.frame(A = sample(LETTERS[nseq], nr, replace = TRUE),
             b = sample(letters[nseq], nr, replace = TRUE),
             i = sample(nseq, nr, replace = TRUE),
             val = sample.int(nr, nr))
})
dfl <- setNames(dfl, paste0("df", seq_along(dfl)))
str(dfl)
List of 5
 $ df1:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 2 5 3 6 6 1 4 6 4 3 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 2 3 6 3 6 6 4 3 1 ...
  ..$ i  : int [1:100] 2 6 4 4 3 6 3 2 2 2 ...
  ..$ val: int [1:100] 79 1 77 71 61 46 15 99 42 45 ...
 $ df2:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 1 6 4 3 3 5 1 3 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 3 3 2 1 3 2 4 4 6 3 ...
  ..$ i  : int [1:100] 2 5 2 2 2 5 1 5 2 3 ...
  ..$ val: int [1:100] 85 26 3 84 33 61 52 36 18 40 ...
 $ df3:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 3 3 1 1 2 6 3 3 5 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 4 6 4 5 4 5 6 5 1 ...
  ..$ i  : int [1:100] 2 4 1 6 6 3 5 2 1 3 ...
  ..$ val: int [1:100] 81 73 22 99 84 51 57 88 93 61 ...
 $ df4:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 3 5 3 6 1 1 5 4 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 1 3 4 6 5 4 1 1 5 1 ...
  ..$ i  : int [1:100] 2 2 1 3 2 5 4 6 1 6 ...
  ..$ val: int [1:100] 94 98 45 23 67 53 55 41 40 100 ...
 $ df5:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 4 1 2 5 5 1 6 1 4 3 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 5 1 3 6 6 5 1 4 6 4 ...
  ..$ i  : int [1:100] 1 6 2 5 4 1 6 4 6 4 ...
  ..$ val: int [1:100] 45 28 16 85 54 53 56 68 59 94 ...
# define id columns
cn <- c("i", "A", "b")
common <- rbindlist(lapply(dfl, function(x) setDT(x)[, .SD, .SDcols = cn]))[
, .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
stopifnot(nrow(common) > 0L)
result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])
str(result)
List of 5
 $ df1:Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 6 6 6 4 2 1 5
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 4 4 6 6 3 2 3 4 2
  ..$ i  : int [1:10] 2 2 2 3 3 6 5 6 4 1
  ..$ val: int [1:10] 99 85 4 36 83 70 12 52 53 58
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df2:Classes ‘data.table’ and 'data.frame': 11 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 4 4 2 1 5 5 4 1 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 3 2 2 3 4 4 4 1 1 ...
  ..$ i  : int [1:11] 2 6 5 5 6 4 1 1 5 3 ...
  ..$ val: int [1:11] 11 1 58 14 5 71 52 39 81 88 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df3:Classes ‘data.table’ and 'data.frame': 14 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 4 2 1 1 5 5 5 5 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 2 3 4 4 2 2 4 4 4 ...
  ..$ i  : int [1:14] 3 5 6 4 4 1 1 1 1 1 ...
  ..$ val: int [1:14] 25 60 18 78 59 26 32 39 77 28 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df4:Classes ‘data.table’ and 'data.frame': 14 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 4 2 2 5 5 4 4 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 3 3 2 3 3 2 2 1 1 ...
  ..$ i  : int [1:14] 3 6 6 5 6 6 1 1 5 5 ...
  ..$ val: int [1:14] 56 86 34 70 31 12 72 1 5 64 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df5:Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 1 1 2
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 6 3 4 1 4
  ..$ i  : int [1:6] 2 3 6 4 3 4
  ..$ val: int [1:6] 11 48 1 68 32 46
  ..- attr(*, ".internal.selfref")=<externalptr>
In each data frame, only the few rows that share the common combinations of id values remain:
unlist(lapply(result, nrow))
df1 df2 df3 df4 df5
 10  11  14  14   6
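One caveat about the counting step above: `.cnt == length(dfl)` identifies the common combinations only if each combination of id values occurs at most once per data frame. With duplicated ids, a combination that is missing from one data frame can still reach the required count, and a combination present everywhere can exceed it. A minimal sketch that deduplicates the ids per data frame before counting is shown below; the helper name `common_ids` and the toy data are assumptions for illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical helper: id combinations that occur in every data frame of `dfl`,
# counting each combination at most once per data frame.
common_ids <- function(dfl, cn) {
  # distinct id combinations of each data frame, stacked into one table
  ids <- rbindlist(lapply(dfl, function(x) {
    unique(as.data.table(x)[, cn, with = FALSE])
  }))
  # a combination is common if it was seen in as many tables as there are tables
  counts <- ids[, .(n_tables = .N), by = cn]
  counts[n_tables == length(dfl), cn, with = FALSE]
}

# toy data: the combination (1, "a") is duplicated in d1 and missing from d3
d1 <- data.frame(i = c(1L, 1L, 2L), A = c("a", "a", "b"), val = 1:3)
d2 <- data.frame(i = c(1L, 2L), A = c("a", "b"), val = 4:5)
d3 <- data.frame(i = 2L, A = "b", val = 6L)

common <- common_ids(list(d1, d2, d3), c("i", "A"))
common
```

The resulting `common` can then be used in the same join as before, e.g. `x[common, on = cn, nomatch = 0L]` for each data.table `x`.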
Answer 2 (score: 0)
Once you have found the "common denominator" (here Table 1), you can do this:
Table2 <- Table2[Table2$N %in% Table1$N,]
Table3 <- Table3[Table3$N %in% Table1$N,]
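This shortcut relies on every N of Table 1 also occurring in the other tables, which holds for the sample data. A quick way to verify that assumption before subsetting, as a sketch (the variable name `table1_is_common` is made up for illustration):

```r
Table1 <- data.frame(N = c(5L, 10L, 15L), Values = 1:3)
Table2 <- data.frame(N = c(5L, 6L, 10L, 15L), Values = c(-1L, -2L, -3L, -4L))
Table3 <- data.frame(N = c(5L, 6L, 10L, 12L, 15L), Values = c(1L, 21L, 5L, 6L, 3L))

# TRUE only if every N of Table1 is present in both other tables
table1_is_common <- all(Table1$N %in% Table2$N) && all(Table1$N %in% Table3$N)
table1_is_common
```

If the check fails, Table 1 itself would need to be reduced as well, for example with an intersection of the N columns.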
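Answer 3 (score: 0)
Here is a more functional approach that works for any list of tables. First we extract all the 'N' columns, then we take the intersection of all those values. Finally we filter each table.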
library('tidyverse')
tables <- list(Table1, Table2, Table3)
common <- tables %>%
  map('N') %>%
  reduce(intersect)
tables %>%
  map(filter, N %in% common)
# [[1]]
# N Values
# 1 5 1
# 2 10 2
# 3 15 3
#
# [[2]]
# N Values
# 1 5 -1
# 2 10 -3
# 3 15 -4
#
# [[3]]
# N Values
# 1 5 1
# 2 10 5
# 3 15 3
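If the tables have to agree on more than one key column, the same idea can be expressed with joins instead of `intersect`. The following is only a sketch under that assumption: `cn` and `filtered` are illustrative names, and it uses `inner_join` and `semi_join` from dplyr rather than the `filter` call shown above.

```r
library(dplyr)
library(purrr)

Table1 <- data.frame(N = c(5L, 10L, 15L), Values = 1:3)
Table2 <- data.frame(N = c(5L, 6L, 10L, 15L), Values = c(-1L, -2L, -3L, -4L))
Table3 <- data.frame(N = c(5L, 6L, 10L, 12L, 15L), Values = c(1L, 21L, 5L, 6L, 3L))

tables <- list(Table1, Table2, Table3)
cn <- "N"  # could also be a vector of several key column names

# distinct key combinations per table, then an inner join across all tables
common <- tables %>%
  map(~ distinct(select(.x, all_of(cn)))) %>%
  reduce(inner_join, by = cn)

# keep only the rows whose key combination appears in `common`
filtered <- map(tables, semi_join, common, by = cn)
filtered[[3]]
```

`semi_join` only filters rows and adds no columns, so each table keeps its original structure.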