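I have two files. For the miRNAs in File1 that share the same ID, I want to look up their logFC values in File2 and combine them. For example, all miRNAs with ID1 should be combined based on their matching strings in File2.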
File1:
ID miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24 hsa-miR-1179
ID24 hsa-miR-7-2
ID24 hsa-miR-3677
ID25 hsa-miR-940
ID25 hsa-miR-4717
File2:
miRNA logFC
hsa-miR-512-1 13
hsa-miR-512-2 123
hsa-miR-1323 53
hsa-miR-498 4.2
hsa-miR-520e 12
hsa-miR-515-1 1
hsa-miR-519e 56
hsa-miR-520f 113
hsa-miR-495 11
hsa-miR-376c 11
hsa-miR-376a-2 113
hsa-miR-654 13
hsa-miR-376b 123
hsa-miR-376a-1 567
hsa-miR-300 757
hsa-miR-1185-1 6
hsa-miR-1185-2 35
hsa-miR-1179 2
hsa-miR-7-2 2
hsa-miR-3677 1
hsa-miR-940 134
hsa-miR-4717 566
Output:
ID1 Average logFC for all ID1 miRNA
ID2 Average logFC for all ID2 miRNA
...
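Answer 0 (score: 1)
As @Heroka mentioned earlier, this is a merge job (that is, joining your tables on the right key column). I'm using a dplyr approach, but there are many other ways/commands to do this: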
File1 = read.table(text="ID miRNA
ID1 hsa-miR-512-1
ID1 hsa-miR-512-2
ID1 hsa-miR-1323
ID1 hsa-miR-498
ID1 hsa-miR-520e
ID1 hsa-miR-515-1
ID1 hsa-miR-519e
ID1 hsa-miR-520f
ID2 hsa-miR-495
ID2 hsa-miR-376c
ID2 hsa-miR-376a-2
ID2 hsa-miR-654
ID2 hsa-miR-376b
ID2 hsa-miR-376a-1
ID2 hsa-miR-300
ID2 hsa-miR-1185-1
ID2 hsa-miR-1185-2
ID24 hsa-miR-1179
ID24 hsa-miR-7-2
ID24 hsa-miR-3677
ID25 hsa-miR-940
ID25 hsa-miR-4717", header=TRUE, stringsAsFactors=FALSE)
File2 = read.table(text="miRNA logFC
hsa-miR-512-1 13
hsa-miR-512-2 123
hsa-miR-1323 53
hsa-miR-498 4.2
hsa-miR-520e 12
hsa-miR-515-1 1
hsa-miR-519e 56
hsa-miR-520f 113
hsa-miR-495 11
hsa-miR-376c 11
hsa-miR-376a-2 113
hsa-miR-654 13
hsa-miR-376b 123
hsa-miR-376a-1 567
hsa-miR-300 757
hsa-miR-1185-1 6
hsa-miR-1185-2 35
hsa-miR-1179 2
hsa-miR-7-2 2
hsa-miR-3677 1
hsa-miR-940 134
hsa-miR-4717 566", header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
File1 %>%
  inner_join(File2, by="miRNA") %>%      # join the two datasets on the miRNA column
  group_by(ID) %>%                       # group by ID
  summarise(AvgLogFC = mean(logFC))      # calculate the average logFC per ID
# ID AvgLogFC
# 1 ID1 46.900000
# 2 ID2 181.777778
# 3 ID24 1.666667
# 4 ID25 350.000000
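For comparison, the same join-then-average idea can be sketched in base R with merge and aggregate. This is only one of the "other ways" mentioned above, not necessarily the preferred one. To keep it short it uses just the ID24 and ID25 rows from the question, and the names merged and avg are illustrative. By default merge keeps only rows present in both tables, so it behaves like an inner join here.

```r
# Subset of the question's data (ID24 and ID25 only), to keep the sketch short
File1 <- read.table(text = "ID miRNA
ID24 hsa-miR-1179
ID24 hsa-miR-7-2
ID24 hsa-miR-3677
ID25 hsa-miR-940
ID25 hsa-miR-4717", header = TRUE, stringsAsFactors = FALSE)

File2 <- read.table(text = "miRNA logFC
hsa-miR-1179 2
hsa-miR-7-2 2
hsa-miR-3677 1
hsa-miR-940 134
hsa-miR-4717 566", header = TRUE, stringsAsFactors = FALSE)

# Join the two tables on the shared miRNA column (inner join by default)
merged <- merge(File1, File2, by = "miRNA")

# Average logFC within each ID
avg <- aggregate(logFC ~ ID, data = merged, FUN = mean)
names(avg)[2] <- "AvgLogFC"
avg
```

On this subset the averages are 5/3 for ID24 and 350 for ID25, which agrees with the dplyr result above.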
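Note that I used inner_join, assuming that all miRNA values in File1 are present in File2. Any miRNA without a match in File2 would be dropped from the result and would not contribute to the average.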
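If that assumption is in doubt, it can be checked before averaging. The sketch below is a hypothetical illustration: File2 here deliberately omits hsa-miR-3677 (which is not the case in the question's data), and the names unmatched, summary_tbl and n_missing are made up for the example. anti_join lists the File1 rows with no partner, and left_join keeps them so they can be counted rather than silently lost.

```r
library(dplyr)

# Subset of the question's File1 (ID24 and ID25 only)
File1 <- read.table(text = "ID miRNA
ID24 hsa-miR-1179
ID24 hsa-miR-7-2
ID24 hsa-miR-3677
ID25 hsa-miR-940
ID25 hsa-miR-4717", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical File2: hsa-miR-3677 is left out on purpose to show a missing match
File2 <- read.table(text = "miRNA logFC
hsa-miR-1179 2
hsa-miR-7-2 2
hsa-miR-940 134
hsa-miR-4717 566", header = TRUE, stringsAsFactors = FALSE)

# Rows of File1 that have no matching miRNA in File2
unmatched <- anti_join(File1, File2, by = "miRNA")
unmatched

# Keep every File1 row; unmatched miRNAs get logFC = NA,
# are ignored in the mean, and are counted per ID
summary_tbl <- File1 %>%
  left_join(File2, by = "miRNA") %>%
  group_by(ID) %>%
  summarise(AvgLogFC = mean(logFC, na.rm = TRUE),
            n_missing = sum(is.na(logFC)))
summary_tbl
```

Whether to ignore missing values with na.rm = TRUE or to treat them as an error depends on the analysis; the count column at least makes the gap visible.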