I have a data.frame like this:
20021 K08975 K09735 0.929
20022 K08979 K09735 0.934
20023 K09140 K09735 0.901
20024 K09142 K09735 0.938
20025 K09152 K09735 0.947
20026 K09482 K09735 0.919
20027 K09716 K09735 0.944
20028 K09723 K09735 0.949
20029 K09726 K09735 0.915
20030 K06875 K09736 0.905
20031 K09149 K09736 0.901
20032 K09721 K09736 0.903
20033 OTU0001 K09738 0.908
20034 OTU0095 K09738 0.906
20035 K00952 K09738 0.904
20036 K01622 K09738 0.907
20037 K06875 K09738 0.912
20038 K06963 K09738 0.923
20039 K07060 K09738 0.934
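It has three columns: var1, var2 and corr. var1 and var2 can take values of the form "KOXXXX" or "OTUXXXX".
I want to keep only the rows where var1 and var2 are of different types, that is, show only the rows of the form KOXXXX OTUXXXX or OTUXXXX KOXXXX.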
Answer 0: (score: 2)
Maybe this is naive, but it may help you:
# keep only the rows where the first two characters of var1 and var2
# are different
df[substr(df$var1,1,2) != substr(df$var2,1,2),]
var1 var2 corr
20033 OTU0001 K09738 0.908
20034 OTU0095 K09738 0.906
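One caveat with comparing only the first two characters: two K identifiers such as "K0..." and "K1..." would count as different. If that can occur in the real data, a possible variant is to compare the whole alphabetic prefix instead. The following is only a sketch on hypothetical toy rows (`toy`, `id_type` and the ID "K10001" are made up for illustration), assuming every identifier is letters followed by digits:

```r
# Hypothetical toy rows in the same shape as the question's df
toy <- data.frame(
  var1 = c("K08975", "OTU0001", "K10001", "OTU0095"),
  var2 = c("K09735", "K09738", "K09738", "OTU0002"),
  corr = c(0.929, 0.908, 0.904, 0.911),
  stringsAsFactors = FALSE
)

# Strip the trailing digits so only the alphabetic prefix ("K" or "OTU") remains
id_type <- function(x) sub("[0-9]+$", "", as.character(x))

# Keep rows whose two identifiers have different prefixes
mixed <- toy[id_type(toy$var1) != id_type(toy$var2), ]
mixed
```

Here the "K10001"/"K09738" row is dropped, whereas a two-character comparison ("K1" vs "K0") would have kept it.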
Answer 1: (score: 1)
Perhaps something like:
subset(df, grepl("^K0", var1) & grepl("^OTU", var2) |
grepl("^OTU", var1) & grepl("^K0", var2))
# var1 var2 corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906
Or using startsWith (which requires character vectors, not factors):
subset(df, startsWith(var1, "K0") & startsWith(var2, "OTU") |
startsWith(var1, "OTU") & startsWith(var2, "K0"))
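The two-sided condition above can probably be shortened with `xor()`, under the assumption that every identifier that is not an OTU is a K identifier. This is a sketch on hypothetical toy rows (`toy`, `is_otu1`, `is_otu2` and `mixed_xor` are illustrative names, not part of the original answer):

```r
# Hypothetical toy rows in the same shape as the question's df
toy <- data.frame(
  var1 = c("K08975", "OTU0001", "K00952", "OTU0095"),
  var2 = c("K09735", "K09738", "OTU0002", "OTU0003"),
  corr = c(0.929, 0.908, 0.904, 0.911),
  stringsAsFactors = FALSE
)

# TRUE where the column holds an OTU identifier;
# as.character() guards against factor columns, which startsWith() rejects
is_otu1 <- startsWith(as.character(toy$var1), "OTU")
is_otu2 <- startsWith(as.character(toy$var2), "OTU")

# xor() keeps rows where exactly one of the two columns is an OTU
mixed_xor <- toy[xor(is_otu1, is_otu2), ]
mixed_xor
```

If the data may contain other identifier families, the explicit "K"/"OTU" tests are the safer choice.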
Or with dplyr, we can use grepl / str_detect together with filter:
library(dplyr)
library(stringr)
df %>%
filter(str_detect(var1, "^K0") & str_detect(var2, "^OTU") |
str_detect(var1, "^OTU") & str_detect(var2, "^K0"))
Data
df <- structure(list(var1 = c("K08975", "K08979", "K09140", "K09142",
"K09152", "K09482", "K09716", "K09723", "K09726", "K06875", "K09149",
"K09721", "OTU0001", "OTU0095", "K00952", "K01622", "K06875",
"K06963", "K07060"), var2 = c("K09735", "K09735", "K09735", "K09735",
"K09735", "K09735", "K09735", "K09735", "K09735", "K09736", "K09736",
"K09736", "K09738", "K09738", "K09738", "K09738", "K09738", "K09738",
"K09738"), corr = c(0.929, 0.934, 0.901, 0.938, 0.947, 0.919,
0.944, 0.949, 0.915, 0.905, 0.901, 0.903, 0.908, 0.906, 0.904,
0.907, 0.912, 0.923, 0.934)), row.names = 20021:20039, class =
"data.frame")
Answer 2: (score: 1)
We can also do this in base R:
df[Reduce(`!=`, lapply(df[1:2], substr, 1, 2)),]
# var1 var2 corr
#20033 OTU0001 K09738 0.908
#20034 OTU0095 K09738 0.906
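For readers unfamiliar with `Reduce`, the one-liner can be unpacked step by step. This is only an illustrative sketch on hypothetical toy rows (`toy`, `prefixes` and `keep` are names introduced here):

```r
# Hypothetical toy rows in the same shape as the question's df
toy <- data.frame(
  var1 = c("K08975", "OTU0001", "OTU0095"),
  var2 = c("K09735", "K09738", "K09738"),
  corr = c(0.929, 0.908, 0.906),
  stringsAsFactors = FALSE
)

# Step 1: take the first two characters of each of the first two columns;
# the result is a list with one character vector per column
prefixes <- lapply(toy[1:2], substr, 1, 2)

# Step 2: with a two-element list, Reduce(`!=`, ...) is the same as
# prefixes[[1]] != prefixes[[2]], giving one logical value per row
keep <- Reduce(`!=`, prefixes)

# Step 3: use the logical vector to select rows
toy[keep, ]
```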