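I have a long (about 700,000 rows) 2-column CSV. One column contains positions, each written as cg followed by 8 digits (e.g. cg12345678), and the other column contains the corresponding r values (numbers between -1 and 1). The other CSV is much smaller (about 20 rows) and contains a single column of cg positions. I only want to print the r values from the large CSV that correspond to the cg positions listed in the small CSV.
Here is an abbreviated example of the 2-column CSV: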
cg07881041 -0.0192398465425986
*cg03513874 -0.339360471677652
cg25458538 0.0451334622844003
*cg09261072 0.208770797055665
cg02404579 -0.0166889943192668
cg22585117 -0.340873841270817
*cg25552317 -0.0372823043801581
Here is an example of the 1-column CSV:
cg08829765
*cg25552317
*cg09261072
cg14370485
*cg03513874
cg10855276
cg12406992
In this example, I have marked the 3 matching positions with asterisks. So I want to print the following matrix:
Matching cg    Corresponding r value
cg03513874 -0.339360471677652
cg09261072 0.208770797055665
cg25552317 -0.0372823043801581
Answer (score: 0)
Here is a dplyr approach:
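library(dplyr)
# Keep every cg from the small table, attach its value from the big table,
# then drop the cgs that had no match (their p_value is NA after the join).
df_1col %>%
  left_join(df_2col) %>%
  filter(!is.na(p_value))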
#Joining, by = "cg"
# cg p_value
#1 cg25552317 -0.0372823
#2 cg09261072 0.2087708
#3 cg03513874 -0.3393605
Source data:
df_2col <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text = "cg p_value
cg07881041 -0.0192398465425986
cg03513874 -0.339360471677652
cg25458538 0.0451334622844003
cg09261072 0.208770797055665
cg02404579 -0.0166889943192668
cg22585117 -0.340873841270817
cg25552317 -0.0372823043801581")

df_1col <- data.frame(cg = c("cg08829765", "cg25552317",
                             "cg09261072", "cg14370485",
                             "cg03513874", "cg10855276",
                             "cg12406992"),
                      stringsAsFactors = FALSE)
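If you would rather not depend on dplyr, a base R subset with `%in%` does the same filtering. Here is a minimal sketch. The file names `big.csv` and `small.csv` are hypothetical, and the commented-out `read.csv` calls assume neither file has a header row. Inline stand-ins built from the question's samples are used so that the snippet runs as-is. Note that subsetting the big table keeps its row order, which matches the order of the desired output in the question, whereas the `left_join` above keeps the small table's order.

```r
# Hypothetical file names: replace them with the paths to your two CSVs.
# Assumes neither file has a header row (as in the question's samples).
# big   <- read.csv("big.csv",   header = FALSE, col.names = c("cg", "r"),
#                   colClasses = c("character", "numeric"))
# small <- read.csv("small.csv", header = FALSE, col.names = "cg",
#                   colClasses = "character")

# Inline stand-ins for the two files, built from the question's samples.
big <- data.frame(
  cg = c("cg07881041", "cg03513874", "cg25458538", "cg09261072",
         "cg02404579", "cg22585117", "cg25552317"),
  r  = c(-0.0192398465425986, -0.339360471677652, 0.0451334622844003,
         0.208770797055665, -0.0166889943192668, -0.340873841270817,
         -0.0372823043801581),
  stringsAsFactors = FALSE
)
small <- data.frame(
  cg = c("cg08829765", "cg25552317", "cg09261072", "cg14370485",
         "cg03513874", "cg10855276", "cg12406992"),
  stringsAsFactors = FALSE
)

# Keep the rows of the big table whose cg appears in the small table.
# %in% is vectorised, so a single call checks all ~700,000 rows.
matches <- big[big$cg %in% small$cg, ]
rownames(matches) <- NULL  # renumber rows 1..n for tidy printing
print(matches)
```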