I have a table (1) that looks like this (it is an all-against-all distance matrix transformed into a tab-separated list):
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40
I have another table (2) containing the samples that meet a certain criterion:
sample1
sample2
sample8
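How can I parse the first table (1) to extract only those rows where the values in both column 1 and column 2 can be found in table (2)?
That is, the desired comparisons are only: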
sample1 sample2 405
sample2 sample8 40
sample8 sample1 100
Answer 0 (score: 2)
I tried a similar setup, using a data frame for table (1) and a vector for table (2).
table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
col_2 = c("b", "d", "f", "g"),
col_3 = c(1, 2, 3, 4))
table_two <- c("b", "d")
Once the data is set up this way, something like this should work:
library(tidyverse)
table_one %>% filter(col_1 %in% table_two,
col_2 %in% table_two)
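As a rough sketch of how the same filter might be applied to the data from the question, the example below reads table (1) from tab-separated text. The column names `col_1`, `col_2`, and `col_3` are assumptions carried over from the toy example above, not names from the original file, and only `dplyr` is loaded rather than the whole tidyverse.

```r
library(dplyr)

# Table (1): tab-separated sample pairs with a distance.
# The file has no header, so the column names here are assumed.
table_one <- read.delim(
  text = "sample1\tsample2\t405\nsample3\tsample4\t400\nsample5\tsample6\t1\nsample7\tsample8\t20\nsample1\tsample3\t40",
  header = FALSE,
  col.names = c("col_1", "col_2", "col_3"),
  stringsAsFactors = FALSE
)

# Table (2): the samples that meet the criterion.
table_two <- c("sample1", "sample2", "sample8")

# Keep only rows where both samples appear in table (2).
result <- table_one %>%
  filter(col_1 %in% table_two, col_2 %in% table_two)

print(result)
```

With the five rows shown in the question, only the `sample1`/`sample2` pair has both members in table (2), so a single row is kept.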
Answer 1 (score: 2)
Here is a base R solution:
# Table (1): pairwise distances
rawData1 <- "first second distance
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40"
# Table (2): samples that meet the criterion
rawData2 <- "sample
sample1
sample2
sample8"
a <- read.table(textConnection(rawData1),stringsAsFactors=FALSE,header=TRUE)
b <- read.table(textConnection(rawData2),stringsAsFactors=FALSE,header=TRUE)
# Keep rows where both the first and the second sample are listed in b
a[a$first %in% b$sample & a$second %in% b$sample, ]
...and the output:
    first  second distance
1 sample1 sample2      405
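The logical-index expression in this answer can also be wrapped in a small function so it can be reused on other pair tables. The sketch below is only an illustration: `filter_pairs` is a hypothetical helper name, and it assumes the two sample columns are the first two columns of the data frame.

```r
# Hypothetical helper: keep the rows of `pairs` whose first two columns
# both contain values listed in `keep`.
filter_pairs <- function(pairs, keep) {
  in_first  <- pairs[[1]] %in% keep
  in_second <- pairs[[2]] %in% keep
  # drop = FALSE keeps the result a data frame even for a single column
  pairs[in_first & in_second, , drop = FALSE]
}

a <- data.frame(
  first    = c("sample1", "sample3", "sample5", "sample7", "sample1"),
  second   = c("sample2", "sample4", "sample6", "sample8", "sample3"),
  distance = c(405, 400, 1, 20, 40),
  stringsAsFactors = FALSE
)
b <- c("sample1", "sample2", "sample8")

res <- filter_pairs(a, b)
print(res)
```

Because both columns are tested with `%in%`, the order in which a pair is stored (for example `sample2 sample1` instead of `sample1 sample2`) does not affect whether the row is kept.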
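Answer 2 (score: 1)
The best option is probably to use inner_join twice, once on the first column and once on the second, and then take the intersect of the two result sets.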
library(dplyr)
df1 <- read.table(text = "Samp1 Samp2 Val
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)
> df1
    Samp1   Samp2 Val
1 sample1 sample2 405
2 sample3 sample4 400
3 sample5 sample6   1
4 sample7 sample8  20
5 sample1 sample3  40
df2 <- data.frame(Samp = c("sample1",
"sample2",
"sample8"), stringsAsFactors = FALSE)
> df2
     Samp
1 sample1
2 sample2
3 sample8
# Use inner_join to match Samp1 against Samp, then again to match Samp2 against Samp
intersect(inner_join(df1,df2, by = c("Samp1" = "Samp")),
inner_join(df1,df2, by = c("Samp2" = "Samp")))
The result will be:
    Samp1   Samp2 Val
1 sample1 sample2 405
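A possible alternative to joining twice and intersecting is to chain two `semi_join` calls, which keep the rows of the left table that have a match in the right table without adding any columns. This is a sketch of a swapped-in technique, not the method from the answer above; it reuses the same `df1` and `df2` definitions.

```r
library(dplyr)

df1 <- read.table(text = "Samp1 Samp2 Val
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)

df2 <- data.frame(Samp = c("sample1", "sample2", "sample8"),
                  stringsAsFactors = FALSE)

# First keep rows whose Samp1 is listed in df2, then, of those,
# keep rows whose Samp2 is also listed in df2.
res <- df1 %>%
  semi_join(df2, by = c("Samp1" = "Samp")) %>%
  semi_join(df2, by = c("Samp2" = "Samp"))

print(res)
```

One difference worth keeping in mind: `intersect` treats its inputs as sets of rows, so repeated identical rows should collapse to one, whereas the chained `semi_join` version keeps every matching row of `df1` as it is.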