Merging values by matching strings

Time: 2015-09-01 09:59:30

Tags: r

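I have two files. I want to combine the logFC values from File2 for the miRNAs that share an ID in File1, matching the two files on the miRNA name. For example, all miRNAs with ID1 should be combined based on their matching strings in File2.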

File1:

ID  miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24    hsa-miR-1179
ID24    hsa-miR-7-2
ID24    hsa-miR-3677
ID25    hsa-miR-940
ID25    hsa-miR-4717

File2: 
miRNA           logFC
hsa-miR-512-1   13
hsa-miR-512-2   123 
hsa-miR-1323    53
hsa-miR-498     4.2
hsa-miR-520e    12
hsa-miR-515-1   1
hsa-miR-519e    56
hsa-miR-520f    113
hsa-miR-495     11
hsa-miR-376c    11
hsa-miR-376a-2  113 
hsa-miR-654     13
hsa-miR-376b    123
hsa-miR-376a-1  567
hsa-miR-300     757
hsa-miR-1185-1   6
hsa-miR-1185-2  35
hsa-miR-1179    2
hsa-miR-7-2     2
hsa-miR-3677    1
hsa-miR-940     134
hsa-miR-4717    566


Output:

ID1     Average logFC for all ID1 miRNA
ID2     Average logFC for all ID2 miRNA
...

1 answer:

Answer 0 (score: 1)

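As @Heroka mentioned at the start, this is a merge job (that is, joining your tables on the correct key column). I'm using a dplyr approach here, but there are many other ways/commands to do this: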

File1 = read.table(text="ID  miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24    hsa-miR-1179
ID24    hsa-miR-7-2
ID24    hsa-miR-3677
ID25    hsa-miR-940
ID25    hsa-miR-4717", header = TRUE, stringsAsFactors = FALSE)

File2 = read.table(text="miRNA           logFC
hsa-miR-512-1   13
hsa-miR-512-2   123 
hsa-miR-1323    53
hsa-miR-498     4.2
hsa-miR-520e    12
hsa-miR-515-1   1
hsa-miR-519e    56
hsa-miR-520f    113
hsa-miR-495     11
hsa-miR-376c    11
hsa-miR-376a-2  113 
hsa-miR-654     13
hsa-miR-376b    123
hsa-miR-376a-1  567
hsa-miR-300     757
hsa-miR-1185-1   6
hsa-miR-1185-2  35
hsa-miR-1179    2
hsa-miR-7-2     2
hsa-miR-3677    1
hsa-miR-940     134
hsa-miR-4717    566", header = TRUE, stringsAsFactors = FALSE)


library(dplyr)

File1 %>% 
  inner_join(File2, by="miRNA") %>%     # join your datasets based on miRNA column
  group_by(ID) %>%                      # group by ID
  summarise(AvgLogFC = mean(logFC))     # calculate average values

#     ID   AvgLogFC
# 1  ID1  46.900000
# 2  ID2 181.777778
# 3 ID24   1.666667
# 4 ID25 350.000000
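
The answer says other approaches would work as well. One possible base R sketch uses merge and aggregate, with no extra packages. The names f1, f2, merged and avg are made up for this illustration, and the data is only the ID24/ID25 subset of the question's tables. By default merge keeps only rows whose miRNA appears in both tables, so it behaves like an inner join.

```r
# Illustrative subset of the question's File1 and File2 (ID24 and ID25 only)
f1 <- data.frame(
  ID    = c("ID24", "ID24", "ID24", "ID25", "ID25"),
  miRNA = c("hsa-miR-1179", "hsa-miR-7-2", "hsa-miR-3677",
            "hsa-miR-940", "hsa-miR-4717"),
  stringsAsFactors = FALSE
)
f2 <- data.frame(
  miRNA = c("hsa-miR-1179", "hsa-miR-7-2", "hsa-miR-3677",
            "hsa-miR-940", "hsa-miR-4717"),
  logFC = c(2, 2, 1, 134, 566),
  stringsAsFactors = FALSE
)

# Join the two tables on the shared miRNA column (all = FALSE by default)
merged <- merge(f1, f2, by = "miRNA")

# Mean logFC per ID; the result has the columns ID and logFC
avg <- aggregate(logFC ~ ID, data = merged, FUN = mean)
print(avg)
```

On this subset the per-ID means should agree with the ID24 and ID25 rows of the dplyr output above.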

Note that the dplyr code uses inner_join, which assumes every miRNA value in File1 is also present in File2.
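
If that assumption might not hold, it can be checked before averaging. The sketch below is one way to do that, not part of the original answer. It uses anti_join to list File1 rows with no match, then left_join to keep them as NA. The data frames f1 and f2 are a small subset of the question's data, and "hsa-miR-XXXX" is a made-up name added only to produce an unmatched row.

```r
library(dplyr)

# Subset of the question's data; "hsa-miR-XXXX" is hypothetical and has
# no entry in f2, so it demonstrates what happens to an unmatched miRNA.
f1 <- data.frame(
  ID    = c("ID24", "ID24", "ID24", "ID25", "ID25", "ID25"),
  miRNA = c("hsa-miR-1179", "hsa-miR-7-2", "hsa-miR-3677",
            "hsa-miR-940", "hsa-miR-4717", "hsa-miR-XXXX"),
  stringsAsFactors = FALSE
)
f2 <- data.frame(
  miRNA = c("hsa-miR-1179", "hsa-miR-7-2", "hsa-miR-3677",
            "hsa-miR-940", "hsa-miR-4717"),
  logFC = c(2, 2, 1, 134, 566),
  stringsAsFactors = FALSE
)

# Rows of f1 without a partner in f2; inner_join would drop these silently
unmatched <- anti_join(f1, f2, by = "miRNA")
print(unmatched)

# left_join keeps unmatched rows with logFC = NA; na.rm = TRUE leaves them
# out of the mean, and n_missing reports how many were skipped per ID
avg <- f1 %>%
  left_join(f2, by = "miRNA") %>%
  group_by(ID) %>%
  summarise(AvgLogFC  = mean(logFC, na.rm = TRUE),
            n_missing = sum(is.na(logFC)))
print(avg)
```

Whether to drop unmatched miRNAs or report them is a judgment call. An inner join is simpler, while the left join makes missing measurements visible in the summary.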