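I want to merge two data frames, y$genes and symbol_annotations, using the row names of y and the "hgnc_symbol" column of symbol_annotations, and create a column labeled "Symbol" (y$genes$Symbol) that lists all matches. Where there is no match between "hgnc_symbol" and the row names, I want the cell filled with NA rather than left empty. I keep getting errors because the two data frames do not have the same dimensions and contain NAs, and I don't know how to fix it.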
>read.counts <- read.table("gene_counts.txt", header=TRUE)
>row.names(read.counts) <- read.counts$Geneid
>treatment <- factor(treatment)
> head(treatment)
[1] T0 IL2 IL2.ZA IL2.OKT3 IL2.OKT3.ZA T0
Levels: T0 IL2 IL2.OKT3 IL2.OKT3.ZA IL2.ZA
>y <- DGEList(read.counts, group=treatment, genes=read.counts)
>head(y$genes)
SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19
ENSG00000223972 0 1 1 1 0 0 1 0 0 3 0 0 1 2 0 0 0 0 1
ENSG00000227232 33 31 13 15 20 43 36 32 43 43 61 42 92 73 80 64 33 25 28
ENSG00000278267 1 0 1 0 0 5 3 1 1 2 1 0 2 4 6 0 2 2 1
ENSG00000243485 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
ENSG00000237613 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSG00000268020 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SM20 SM21 SM22 SM23 SM24 SM25 SM26 SM27 SM28 SM29 SM30
ENSG00000223972 0 0 0 0 1 0 0 0 0 0 0
ENSG00000227232 15 60 13 29 22 28 87 42 61 67 74
ENSG00000278267 2 3 5 1 3 4 4 3 2 4 3
ENSG00000243485 0 0 0 0 0 1 0 0 0 0 1
ENSG00000237613 0 0 0 0 0 0 0 0 0 0 0
ENSG00000268020 0 0 0 0 0 0 0 0 0 0 0
>head(symbol_annotations, n=10)
ensembl_gene_id hgnc_symbol
1 ENSG00000210049 MT-TF
2 ENSG00000211459 MT-RNR1
3 ENSG00000210077 MT-TV
4 ENSG00000210082 MT-RNR2
5 ENSG00000209082 MT-TL1
6 ENSG00000198888 MT-ND1
7 ENSG00000210100 MT-TI
8 ENSG00000223795 <NA>
9 ENSG00000210107 MT-TQ
10 ENSG00000210112 MT-TM
>dim(symbol_annotations)
[1] 58069 2
>dim(y$genes)
[1] 58051 30
>y$genes$Symbol <- merge((rownames(y)), symbol_annotations[,c(2)])
Error in if (n > 0) c(NA_integer_, -n) else integer() :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
NAs introduced by coercion to integer range
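
A note on the error, followed by a minimal sketch of one way to get the column described above. The merge() call is given two plain vectors that share no column name, so it appears to attempt a full cross join of 58051 x 58069 rows, which exceeds R's integer range and would explain the overflow warnings. The sketch below uses match() instead. It runs on small toy stand-ins for y$genes and symbol_annotations: the toy values and the placeholder symbol "SYM_A" are invented for illustration, and it assumes the row names should be looked up against ensembl_gene_id, since both hold Ensembl IDs.

```r
# Toy stand-ins for the real objects (hypothetical data; only the column
# names ensembl_gene_id and hgnc_symbol come from the post).
genes <- data.frame(
  SM01 = c(0, 33, 0),
  row.names = c("ENSG00000223972", "ENSG00000227232", "ENSG00000243485")
)
symbol_annotations <- data.frame(
  ensembl_gene_id = c("ENSG00000227232", "ENSG00000223972", "ENSG00000223795"),
  hgnc_symbol     = c("SYM_A", "", NA),  # placeholder symbol, a blank, and an NA
  stringsAsFactors = FALSE
)

# For each row name, match() gives the position of the first identical
# ensembl_gene_id, or NA when the ID is absent from the annotation table.
idx <- match(rownames(genes), symbol_annotations$ensembl_gene_id)

# Indexing with NA yields NA, so unmatched genes get NA automatically.
# The result always has one entry per row of genes, whatever the size of
# symbol_annotations, so the differing dimensions are not a problem.
genes$Symbol <- symbol_annotations$hgnc_symbol[idx]

# Turn blank symbols into NA as well, so there are no empty cells.
genes$Symbol[genes$Symbol %in% ""] <- NA

print(genes)
```

With the real objects, the same lines would be applied with y$genes in place of genes. Unlike merge(), this keeps the rows in their original order and never adds or drops rows, which matters here because the rows of y$genes have to stay aligned with the counts in y.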