Counting occurrences of specific strings in a matrix

Posted: 2017-11-01 20:43:00

Tags: r

I have a matrix that looks like this:

 [1] "B/A" "A/A" "B/A" "B/A" "B/B" "B/B" "B/B" "B/B" "A/A" "B/A" "B/B" "A/A" "B/A" "B/A" "B/A"
[16] "B/B" "B/A" "B/B" "B/B" "B/A" "B/B" "B/B" "B/A" "B/A" "B/A" "B/B" "B/B" "B/B" "B/B" "A/A"
[31] "B/B" "B/B" "B/A" "B/B" "B/A" "B/A" "B/B" "A/A" "B/A" "B/A" "B/A" "B/B" "B/B" "B/B" "B/A"

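I want to turn each cell containing "B/A", "A/A" or "B/B" into a count and put the counts into a new matrix: if "B/A" is detected, the count for that cell is 1; if "A/A" is detected, the count is 2; and if "B/B" is detected, the count is 0.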

Basically, the new matrix would look like this:

 [1] 1 2 1 1 0 0 0 0 2 1 ... and so on
[16] 0 1 0 0 1 0 0 1 1 1 ...
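One way to express the mapping described above is a named lookup vector indexed by the genotype strings. This is a minimal sketch, not taken from either answer below; the names snp1, codes and snp_codes are hypothetical, and only the first 15 values of the printout are used.

```r
# First 15 genotypes from the printout above (hypothetical variable name)
snp1 <- c("B/A", "A/A", "B/A", "B/A", "B/B", "B/B", "B/B", "B/B",
          "A/A", "B/A", "B/B", "A/A", "B/A", "B/A", "B/A")

# Hypothetical lookup table: genotype string -> numeric code
codes <- c("B/B" = 0, "B/A" = 1, "A/A" = 2)

# Indexing by name returns one code per element; unname() drops the labels
snp_codes <- unname(codes[snp1])
print(snp_codes)
```

If snp1 is a matrix, codes[snp1] returns a plain vector, so the dimensions have to be copied back with dim(snp_codes) <- dim(snp1). Any string not among the three names comes back as NA.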

I planned to build the new matrix with a loop. My code looks like this:

count <- 0
for(i in dim(matrix1)[1])
{
  if(snp1 == "B/A")
    count = count + 1
}
print(count)

However, I get this warning:

Warning message:
In if (snp1 == "B/A") count = count + 1 :
  the condition has length > 1 and only the first element will be used
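The warning appears because if() expects a single TRUE or FALSE, while snp1 == "B/A" compares every element and returns a logical vector, so only its first element was used (R 4.2.0 and later raise an error here instead). Also, for(i in dim(matrix1)[1]) runs only once, with i equal to the number of rows. The sketch below, assuming the genotypes are in a character vector named snp1 (a hypothetical stand-in using the first 10 values), shows a vectorized count, a loop that tests one element at a time, and table() for all three genotypes at once.

```r
# Example genotypes (hypothetical name snp1, first 10 values from the question)
snp1 <- c("B/A", "A/A", "B/A", "B/A", "B/B", "B/B", "B/B", "B/B", "A/A", "B/A")

# Vectorized: == gives a logical vector and sum() counts the TRUE values
n_BA <- sum(snp1 == "B/A")

# Loop version: index one element per iteration so the if() condition has length 1
count <- 0
for (i in seq_along(snp1)) {
  if (snp1[i] == "B/A") {
    count <- count + 1
  }
}

# Counts of all three genotypes; fixing the levels keeps genotypes with zero counts
counts <- table(factor(snp1, levels = c("B/B", "B/A", "A/A")))
print(counts)
```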

2 answers:

Answer 0 (score: 5)

Maybe something like this would work:

# Generate data
alleles <- c("B/B", "B/A", "A/A")
genotype <- matrix(sample(alleles, 20, replace = TRUE), 5)

#      [,1]  [,2]  [,3]  [,4] 
# [1,] "B/B" "A/A" "B/A" "A/A"
# [2,] "B/B" "A/A" "A/A" "B/B"
# [3,] "A/A" "A/A" "B/B" "B/A"
# [4,] "B/A" "B/B" "B/A" "A/A"
# [5,] "B/B" "B/A" "B/A" "B/A"

genotypeQuant <- matrix(as.numeric(factor(genotype, levels = alleles)) - 1,
                        nrow = nrow(genotype))

#      [,1] [,2] [,3] [,4]
# [1,]    0    2    1    2
# [2,]    0    2    2    0
# [3,]    2    2    0    1
# [4,]    1    0    1    2
# [5,]    0    1    1    1

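This first converts the matrix of alleles/SNPs into a factor, with the levels in the order given by the vector alleles, then converts the factor to numbers and subtracts 1 so that "B/B", "B/A" and "A/A" become 0, 1 and 2.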
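An equivalent route, offered here as an alternative rather than as part of the answer, is match(), which returns each element's position in alleles; subtracting 1 again gives 0, 1 and 2, and assigning dim() restores the matrix shape. This is a minimal sketch; the fixed 2 x 3 genotype matrix is hypothetical example data replacing the random one above.

```r
alleles <- c("B/B", "B/A", "A/A")

# Fixed 2 x 3 genotype matrix (hypothetical example data, filled column by column)
genotype <- matrix(c("B/B", "A/A", "B/A", "B/A", "A/A", "B/B"), nrow = 2)

# Factor approach from the answer: level order gives codes 1, 2, 3, minus 1
via_factor <- matrix(as.numeric(factor(genotype, levels = alleles)) - 1,
                     nrow = nrow(genotype))

# match() approach: position in alleles, minus 1, then copy the dimensions back
via_match <- match(genotype, alleles) - 1
dim(via_match) <- dim(genotype)

print(via_match)
```

Both versions return NA for a genotype that is not listed in alleles.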

Answer 1 (score: -1)

Assuming your vector of "A/A", "B/A" and "B/B" values is called df, I would simply use:

df[df=="B/B"] <- 0
df[df=="B/A"] <- 1
df[df=="A/A"] <- 2
df <- as.numeric(df)

This solution does the trick as long as you really have only these three unique values.
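If df is a matrix rather than a vector, as.numeric() returns a plain vector, and any value other than the three expected ones cannot be converted and becomes NA. The sketch below applies the same three replacements to a hypothetical 2 x 2 matrix containing an unexpected "A/B", copies the dimensions back, and reports the unconvertible cells; the check at the end is an illustrative addition, not part of the answer.

```r
# Hypothetical 2 x 2 character matrix; "A/B" is an unexpected value
df <- matrix(c("B/B", "B/A", "A/A", "A/B"), nrow = 2)

# Same replacements as the answer (the matrix stays character until converted)
df[df == "B/B"] <- 0
df[df == "B/A"] <- 1
df[df == "A/A"] <- 2

# "A/B" cannot be converted, so it becomes NA; suppressWarnings() hides the coercion warning
out <- suppressWarnings(as.numeric(df))
dim(out) <- dim(df)

# Report cells that were not one of the three expected genotypes
if (anyNA(out)) message("Unexpected genotype values: ", sum(is.na(out)))
```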