Question

I have a matrix that looks like this:

 [1] "B/A" "A/A" "B/A" "B/A" "B/B" "B/B" "B/B" "B/B" "A/A" "B/A" "B/B" "A/A" "B/A" "B/A" "B/A"
[16] "B/B" "B/A" "B/B" "B/B" "B/A" "B/B" "B/B" "B/A" "B/A" "B/A" "B/B" "B/B" "B/B" "B/B" "A/A"
[31] "B/B" "B/B" "B/A" "B/B" "B/A" "B/A" "B/B" "A/A" "B/A" "B/A" "B/A" "B/B" "B/B" "B/B" "B/A"

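I want to score the "B/A", "A/A", and "B/B" values in each cell and put the results into a new matrix. If "B/A" is detected, the count in that cell of the new matrix should be 1; if "A/A" is detected, the count should be 2; and if "B/B" is detected, it should be 0.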

Basically, the new matrix would look like:

[1] 1 2 1 1 0 0 0 0 2 1 ... so on 
[16] 0 1 0 0 1 0 0 1 1 1 ...

I would build a new matrix to hold the result. To do this, my code looks like this:

count <- 0
for(i in dim(matrix1)[1])
{
  if(snp1 == "B/A")
    count = count + 1
}
print(count)

However, I get this warning as output:

Warning message:
In if (snp1 == "B/A") count = count + 1 :
  the condition has length > 1 and only the first element will be used

Answer 1

Maybe something like this would work:

# Generate example data (sample() is random, so your values will differ)
alleles <- c("B/B", "B/A", "A/A")
genotype <- matrix(sample(alleles, 20, replace = TRUE), 5)

#      [,1]  [,2]  [,3]  [,4] 
# [1,] "B/B" "A/A" "B/A" "A/A"
# [2,] "B/B" "A/A" "A/A" "B/B"
# [3,] "A/A" "A/A" "B/B" "B/A"
# [4,] "B/A" "B/B" "B/A" "A/A"
# [5,] "B/B" "B/A" "B/A" "B/A"

genotypeQuant <- matrix(as.numeric(factor(genotype, levels = alleles)) - 1,
                        nrow = nrow(genotype))

#      [,1] [,2] [,3] [,4]
# [1,]    0    2    1    2
# [2,]    0    2    2    0
# [3,]    2    2    0    1
# [4,]    1    0    1    2
# [5,]    0    1    1    1

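This first converts the matrix of alleles/SNPs into a factor, with levels in the order given by the vector alleles, and then converts that factor to numbers. Subtracting 1 shifts the 1-based factor codes to 0, 1, 2.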
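
To make the intermediate steps visible, here is a minimal sketch of the same factor conversion on a small fixed input. The 2 x 2 matrix is a hypothetical example chosen so the result is reproducible; it is not the asker's data.

```r
# Hypothetical fixed input so the result is reproducible
alleles  <- c("B/B", "B/A", "A/A")
genotype <- matrix(c("B/A", "A/A", "B/B", "B/A"), nrow = 2)

# factor() assigns integer codes 1, 2, 3 following the order of `levels`;
# any value not listed in `levels` would become NA
codes <- as.integer(factor(genotype, levels = alleles))

# Subtract 1 to get 0, 1, 2, then rebuild the matrix,
# because the conversion above returns a plain vector
genotypeQuant <- matrix(codes - 1L, nrow = nrow(genotype))
print(genotypeQuant)
```

Because R fills matrices column by column, passing nrow = nrow(genotype) puts every score back in the cell it came from.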

Answer 2

Assuming your vector of "A/A", "B/A", and "B/B" values is called df, I would just use:

df[df=="B/B"] <- 0
df[df=="B/A"] <- 1
df[df=="A/A"] <- 2
df <- as.numeric(df)

As long as you really only have these three unique values, this solution will do the job.
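
Note that as.numeric() returns a plain vector, so if df is a matrix its shape is lost. As a possible variation (not the answerer's code), the sketch below uses a named lookup vector and rebuilds the matrix afterwards. The names genotype, lookup, and quant are hypothetical, and the "A/B" entry is an unexpected value added only to show what happens to it.

```r
# Hypothetical input; "A/B" is an unexpected value added for illustration
genotype <- matrix(c("B/A", "A/A", "B/B", "A/B"), nrow = 2)

# Named vector mapping each genotype string to its score
lookup <- c("B/B" = 0, "B/A" = 1, "A/A" = 2)

# Index the lookup by name; strings that are not names of `lookup` yield NA
quant <- matrix(unname(lookup[as.vector(genotype)]), nrow = nrow(genotype))
print(quant)

# Number of cells holding each genotype string
print(table(genotype))
```

With this approach an unexpected string shows up as NA in the result instead of being silently assigned one of the three scores.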

Counting occurrences of specific strings in a matrix
