I have a large data.frame in R with thousands of rows and 4 columns. For example:
Chromosome Start End Count
1 NC_031985.1 16255093 16255094 1
2 NC_031972.1 11505205 11505206 1
3 NC_031971.1 24441227 24441228 1
4 NC_031977.1 29030540 29030541 1
5 NC_031969.1 595867 595868 1
6 NC_031986.1 40147812 40147813 1
I also have this data.frame with the chromosome names:
LG1 NC_031965.1
LG2 NC_031966.1
LG3a NC_031967.1
LG3b NC_031968.1
LG4 NC_031969.1
LG5 NC_031970.1
LG6 NC_031971.1
LG7 NC_031972.1
LG8 NC_031973.1
LG9 NC_031974.1
LG10 NC_031975.1
LG11 NC_031976.1
LG12 NC_031977.1
LG13 NC_031978.1
LG14 NC_031979.1
LG15 NC_031980.1
LG16 NC_031987.1
LG17 NC_031981.1
LG18 NC_031982.1
LG19 NC_031983.1
LG20 NC_031984.1
LG22 NC_031985.1
LG23 NC_031986.1
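I want to replace every value in the Chromosome column of the large data.frame with the matching chromosome name listed above, to get: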
Chromosome Start End Count
1 LG22 16255093 16255094 1
2 LG7 11505205 11505206 1
3 LG6 24441227 24441228 1
4 LG12 29030540 29030541 1
5 LG4 595867 595868 1
6 LG23 40147812 40147813 1
Does anyone know the least painful way to do this? It may be easy (or not), but my experience with R is limited.
Thanks a lot!
Answer 0 (score: 0)
As discussed in the comments, and since people were asking for it, here is a dplyr solution:
library(dplyr)
df %>%
inner_join(chromo_names, by = c("Chromosome" = "V2")) %>%
select(Chromosome = V1, Start, End, Count)
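This gives a warning that the two join columns have different factor levels. You can ignore it and work with character values, or convert the joined column back to a factor as follows: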
df %>%
inner_join(chromo_names, by = c("Chromosome" = "V2")) %>%
select(Chromosome = V1, Start, End, Count) %>%
mutate(Chromosome = as.factor(Chromosome))
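The factor-level warning only arises when the join columns are factors, so another option is to keep them as character from the start. The following is a minimal sketch under that assumption, using a two-row subset of the question's data; `renamed` is an illustrative name that is not in the original answer. Note that `stringsAsFactors = FALSE` is already the default from R 4.0.0 onward, so on recent versions the columns are character without any extra argument.

```r
# Read both tables with strings kept as character rather than factor.
# (From R 4.0.0 onward this is the default; it is spelled out here for older versions.)
df <- read.table(text = "Chromosome Start End Count
1 NC_031985.1 16255093 16255094 1
2 NC_031972.1 11505205 11505206 1", header = TRUE, stringsAsFactors = FALSE)

chromo_names <- read.table(text = "LG7 NC_031972.1
LG22 NC_031985.1", header = FALSE, stringsAsFactors = FALSE)

# Both join columns are plain character vectors, so there are no factor levels to reconcile.
stopifnot(is.character(df$Chromosome), is.character(chromo_names$V2))

# Run the same join as above, but only if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  renamed <- df %>%
    inner_join(chromo_names, by = c("Chromosome" = "V2")) %>%
    select(Chromosome = V1, Start, End, Count)
  print(renamed)
}
```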
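Here is a Base R solution:
# Join on the accession: "Chromosome" in df, "V2" in chromo_names
merged = merge(df, chromo_names,
by.x = "Chromosome",
by.y = "V2",
sort = FALSE)
# Column 5 is V1 (the LG name); put it first, followed by Start, End, Count
merged = merged[c(5,2:4)]
names(merged)[1] = "Chromosome"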
Result:
Chromosome Start End Count
1 LG22 16255093 16255094 1
2 LG7 11505205 11505206 1
3 LG6 24441227 24441228 1
4 LG12 29030540 29030541 1
5 LG4 595867 595868 1
6 LG23 40147812 40147813 1
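One caveat with `merge()`: its documentation states that with `sort = FALSE` the rows come back in an unspecified order, and rows without a match are dropped by default. If the original row order and row count matter, a lookup with `match()` is one possible alternative. The sketch below is not part of the original answer; it uses a three-row subset of the question's data, and `idx` and `renamed` are illustrative names.

```r
# Small stand-ins for the two data frames in the question (values copied from it).
df <- data.frame(
  Chromosome = c("NC_031985.1", "NC_031972.1", "NC_031969.1"),
  Start = c(16255093, 11505205, 595867),
  End   = c(16255094, 11505206, 595868),
  Count = c(1, 1, 1),
  stringsAsFactors = FALSE
)
chromo_names <- data.frame(
  V1 = c("LG4", "LG7", "LG22"),
  V2 = c("NC_031969.1", "NC_031972.1", "NC_031985.1"),
  stringsAsFactors = FALSE
)

# For each accession in df, match() gives its row position in chromo_names
# (NA when the accession is missing from the lookup table).
idx <- match(df$Chromosome, chromo_names$V2)

# Replace the accession with the linkage-group name, row by row.
# Row order and row count of df are left untouched.
renamed <- df
renamed$Chromosome <- chromo_names$V1[idx]
print(renamed)
```

Accessions that are missing from `chromo_names` end up as `NA` rather than being dropped, which makes them easy to spot with `is.na(renamed$Chromosome)`.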
Data:
df = read.table(text = " Chromosome Start End Count
1 NC_031985.1 16255093 16255094 1
2 NC_031972.1 11505205 11505206 1
3 NC_031971.1 24441227 24441228 1
4 NC_031977.1 29030540 29030541 1
5 NC_031969.1 595867 595868 1
6 NC_031986.1 40147812 40147813 1", header = TRUE)
chromo_names = read.table(text = "LG1 NC_031965.1
LG2 NC_031966.1
LG3a NC_031967.1
LG3b NC_031968.1
LG4 NC_031969.1
LG5 NC_031970.1
LG6 NC_031971.1
LG7 NC_031972.1
LG8 NC_031973.1
LG9 NC_031974.1
LG10 NC_031975.1
LG11 NC_031976.1
LG12 NC_031977.1
LG13 NC_031978.1
LG14 NC_031979.1
LG15 NC_031980.1
LG16 NC_031987.1
LG17 NC_031981.1
LG18 NC_031982.1
LG19 NC_031983.1
LG20 NC_031984.1
LG22 NC_031985.1
LG23 NC_031986.1", header = FALSE)