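I need to replace the IDs in the first column (ID_REF) of the first data frame with the values from the second data frame (column Gene_Symbol), matching the first column of each data frame (ID_REF and IlmnID).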
df1
ID_REF Sample1 Sample2 Sample3
cg00000292 0.2841738 1.212398 0.5326877
cg00002426 -4.7278154 -4.217920 -4.1224573
cg00003994 -5.7353341 -5.966922 -6.2235540
df2
IlmnID NameIlmnStrand AddressA_ID Gene_Symbol
cg00002426 cg00002426 TOP SLMAP
cg00005847 cg00005847 BOT HOXD3
cg00000292 cg00000292 TOP ATP2A1
cg00006414 cg00006414 BOT ZNF398
cg00003994 cg00003994 TOP MEOX2
Desired output:
new_df
Gene_Symbol Sample1 Sample2 Sample3
ATP2A1 0.2841738 1.212398 0.5326877
SLMAP -4.7278154 -4.217920 -4.1224573
MEOX2 -5.7353341 -5.966922 -6.2235540
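To make the question reproducible, here is a sketch that rebuilds the two tables above as R data frames. The values are copied from the printed tables. One assumption: the df2 header row looks misaligned (four names for columns holding IDs, IDs, TOP/BOT and symbols), so the names below simply follow the header as printed, with NameIlmnStrand holding the second column of IDs, as Answer 1 also assumes.

```r
# Question data rebuilt as data frames; values copied from the printed tables.
df1 <- data.frame(
  ID_REF  = c("cg00000292", "cg00002426", "cg00003994"),
  Sample1 = c(0.2841738, -4.7278154, -5.7353341),
  Sample2 = c(1.212398, -4.217920, -5.966922),
  Sample3 = c(0.5326877, -4.1224573, -6.2235540),
  stringsAsFactors = FALSE
)

# Column names follow the printed header (assumed; the header appears misaligned).
df2 <- data.frame(
  IlmnID         = c("cg00002426", "cg00005847", "cg00000292", "cg00006414", "cg00003994"),
  NameIlmnStrand = c("cg00002426", "cg00005847", "cg00000292", "cg00006414", "cg00003994"),
  AddressA_ID    = c("TOP", "BOT", "TOP", "BOT", "TOP"),
  Gene_Symbol    = c("SLMAP", "HOXD3", "ATP2A1", "ZNF398", "MEOX2"),
  stringsAsFactors = FALSE
)

str(df1)
str(df2)
```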
Answer 0 (score: 1)
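This is just a simple inner join. You can use inner_join from the dplyr package, or merge from base R. Note that with inner_join, any row of df1 whose ID_REF has no match in df2 is dropped from the result.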
library(dplyr)

# Match ID_REF in df1 against IlmnID in df2, then keep the gene symbol and sample columns
new_df <- inner_join(df1, df2, by = c("ID_REF" = "IlmnID")) %>%
  select(Gene_Symbol, Sample1, Sample2, Sample3)
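The note about dropped rows can be checked with base R's merge, which this answer mentions as the alternative: its default (all = FALSE) behaves like an inner join, while all.x = TRUE keeps every row of df1 and fills Gene_Symbol with NA where there is no match. In dplyr the corresponding switch is left_join() instead of inner_join(). This is a small sketch; the ID cg99999999 and its Sample1 value are hypothetical, added only to have one unmatched row.

```r
# df1 contains one hypothetical ID ("cg99999999") that does not appear in df2.
df1 <- data.frame(
  ID_REF  = c("cg00000292", "cg00002426", "cg99999999"),
  Sample1 = c(0.2841738, -4.7278154, 0.5),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  IlmnID      = c("cg00002426", "cg00000292"),
  Gene_Symbol = c("SLMAP", "ATP2A1"),
  stringsAsFactors = FALSE
)

# Inner join (merge's default all = FALSE): the unmatched ID is dropped.
inner <- merge(df1, df2, by.x = "ID_REF", by.y = "IlmnID")

# Left join (all.x = TRUE): the unmatched ID is kept, with NA for Gene_Symbol.
left <- merge(df1, df2, by.x = "ID_REF", by.y = "IlmnID", all.x = TRUE)

print(inner[, c("Gene_Symbol", "Sample1")])
print(left[, c("Gene_Symbol", "Sample1")])
```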
Answer 1 (score: 1)
Base R:
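# Take only the ID and gene symbol columns of df2; all.y = TRUE keeps every row of df1,
# and [ , -1] drops the key column from the result
merge(df2[ , c("NameIlmnStrand", "Gene_Symbol")], df1,
      by.x = "NameIlmnStrand", by.y = "ID_REF",
      all.y = TRUE)[ , -1]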
Output:
Gene_Symbol Sample1 Sample2 Sample3
1 ATP2A1 0.2841738 1.212398 0.5326877
2 SLMAP -4.7278154 -4.217920 -4.1224573
3 MEOX2 -5.7353341 -5.966922 -6.2235540
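One caveat with merge: it sorts the result on the key column by default (sort = TRUE), so the output above comes back in df1's order only because df1 is already sorted by ID. If df1's original row order matters, a minimal sketch is to carry a temporary row index through the merge and sort on it afterwards. The helper column name .row and the deliberately shuffled df1 are assumptions for illustration.

```r
# df1 deliberately not sorted by ID, to show the reordering effect.
df1 <- data.frame(
  ID_REF  = c("cg00003994", "cg00000292", "cg00002426"),
  Sample1 = c(-5.7353341, 0.2841738, -4.7278154),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  IlmnID      = c("cg00002426", "cg00000292", "cg00003994"),
  Gene_Symbol = c("SLMAP", "ATP2A1", "MEOX2"),
  stringsAsFactors = FALSE
)

# Remember each row's original position (hypothetical helper column).
df1$.row <- seq_len(nrow(df1))

merged <- merge(df2, df1, by.x = "IlmnID", by.y = "ID_REF", all.y = TRUE)

# Restore df1's order, then keep only the wanted columns.
merged <- merged[order(merged$.row), ]
new_df <- merged[, c("Gene_Symbol", "Sample1")]
rownames(new_df) <- NULL
print(new_df)
```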
Answer 2 (score: 0)
df1 <- data.frame(
  ID_REF = c("cg00000292", "cg00002426", "cg00003994"),
  Sample1 = rnorm(3),
  Sample2 = rnorm(3),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  IlmnID = c("cg00000292", "cg00002426", "cg00003994"),
  Gene_Symbol = c("ATP2A1", "SLMAP", "MEOX2"),
  stringsAsFactors = FALSE
)
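# match() looks up each ID_REF in df2$IlmnID, so the row order of the two
# data frames does not matter; IDs with no match in df2 become NA
df1$ID_REF <- df2$Gene_Symbol[match(df1$ID_REF, df2$IlmnID)]

# Alternatively (instead of the line above), an element-by-element version with sapply
df1$ID_REF <- sapply(df1$ID_REF, function(x) {
  if (x %in% df2$IlmnID) {
    df2$Gene_Symbol[df2$IlmnID == x]
  } else {
    NA
  }
})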
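Another base R idiom for the same lookup is a named character vector: names are the IlmnIDs, values are the gene symbols, and indexing it by df1$ID_REF returns NA for IDs that df2 lacks. This sketch also renames the column to Gene_Symbol to match the desired output. The ID cg99999999 is hypothetical, included only to show the NA case.

```r
df1 <- data.frame(
  ID_REF  = c("cg00000292", "cg00002426", "cg00003994", "cg99999999"),  # last ID hypothetical
  Sample1 = c(0.2841738, -4.7278154, -5.7353341, 0),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  IlmnID      = c("cg00002426", "cg00005847", "cg00000292", "cg00003994"),
  Gene_Symbol = c("SLMAP", "HOXD3", "ATP2A1", "MEOX2"),
  stringsAsFactors = FALSE
)

# Lookup table: names are IlmnIDs, values are gene symbols.
lookup <- setNames(df2$Gene_Symbol, df2$IlmnID)

# Indexing by name returns NA for IDs not in the lookup;
# unname() drops the names that the indexing carries over.
df1$ID_REF <- unname(lookup[df1$ID_REF])

# Rename the column to match the desired output.
names(df1)[names(df1) == "ID_REF"] <- "Gene_Symbol"
print(df1)
```

Like match(), name-based indexing uses the first occurrence if an IlmnID appears more than once in df2.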