Question

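I am trying to calculate the frequency of a particular value in each column.

Basically, I am looking at how different bacterial isolates (one per row) respond to treatment with different antibiotics (one per column). "1" means the isolate is resistant to the antibiotic, while "0" means the isolate is susceptible to it.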

antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)

ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

ab
  antibiotic1 antibiotic2 antibiotic3
1           1           0           0
2           1           0           1
3           0          NA           1
4           1           0           0
5           0           1           0
6           1           1          NA
7          NA           0           1
8           0           0           0
9           1           0           0

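So, looking at the first row, isolate 1 is resistant to antibiotic 1, susceptible to antibiotic 2, and susceptible to antibiotic 3.

I want to calculate the percentage of isolates that are resistant to each antibiotic, i.e. sum the number of "1"s in each column and divide by the number of isolates in that column (excluding NAs from the denominator).

I know how to get the counts: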

apply(ab, 2, count)

$antibiotic1
   x freq
1  0    3
2  1    5
3 NA    1

$antibiotic2
   x freq
1  0    6
2  1    2
3 NA    1

$antibiotic3
   x freq
1  0    5
2  1    3
3 NA    1

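But my actual data set contains many different antibiotics and hundreds of isolates, so I would like to run one function over all columns at once and get a data frame back.

I have tried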

counts <- ldply(ab, function(x) sum(x=="1")/(sum(x=="1") +  sum(x=="0")))

but that produces NAs:

          .id V1
1 antibiotic1 NA
2 antibiotic2 NA
3 antibiotic3 NA

I have also tried:

library(dplyr)
ab %>%
 summarise_each(n = n())) %>%
 mutate(prop.resis = n/sum(n))

but I get an error message:

Error in n() : This function should not be called directly

Any suggestions would be greatly appreciated.

Answer 1

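I would just vectorize this using colMeans:

colMeans(ab, na.rm = TRUE)
# antibiotic1 antibiotic2 antibiotic3
#       0.625       0.250       0.375

As a side note, this generalizes easily to the frequency of any number. For example, if you were looking for the frequency of the number 2 in all columns, you would simply modify it to

colMeans(ab == 2, na.rm = TRUE)

Or similarly, column by column (this avoids the matrix conversion, at the cost of evaluating each column separately):

sapply(ab, function(x) mean(x == 2, na.rm = TRUE))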
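
As a rough sketch of why the mean works here: for 0/1 data, the mean with NAs dropped is the number of 1s divided by the number of non-missing values, which is the proportion the question asks for. The snippet below computes both pieces by hand and collects them in a data frame. The names `resistant`, `tested`, and `prop_resistant` are illustrative only and do not come from the original post.

```r
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# Number of 1s per column, ignoring NA
resistant <- colSums(ab == 1, na.rm = TRUE)

# Number of non-missing isolates per column (the denominator)
tested <- colSums(!is.na(ab))

# One row per antibiotic (column names here are hypothetical)
prop_resistant <- data.frame(
  antibiotic = names(ab),
  resistant  = resistant,
  tested     = tested,
  prop       = resistant / tested,
  row.names  = NULL
)

print(prop_resistant)
```

For this example the `prop` column should agree with `colMeans(ab, na.rm = TRUE)`.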

Answer 2

Another answer to the question. Is this what you want?

antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)

ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)


# Proportion of 1s per column, with NAs excluded from the denominator
result <- numeric(0)
for (i in seq_along(ab)) {
    prop <- sum(ab[[i]], na.rm = TRUE) / sum(!is.na(ab[[i]]))
    print(prop)
    result <- c(result, prop)
}

result

0.625 0.250 0.375
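
The same loop can also be written without growing `result` on each iteration. This is only a sketch, assuming every column holds nothing but 0, 1, and NA; `prop_ones` is a hypothetical helper name.

```r
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# Hypothetical helper: share of 1s among the non-missing values of x
prop_ones <- function(x) sum(x == 1, na.rm = TRUE) / sum(!is.na(x))

# vapply applies the helper to every column and checks that
# each result is a single number
result <- vapply(ab, prop_ones, numeric(1))

print(result)
```

The `na.rm = TRUE` matters: `sum()` over a comparison that contains NA returns NA, which is why the `ldply` attempt in the question produced NA for every column.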

Answer 3

Here is one way to do it:

antibiotic1 antibiotic2 antibiotic3
1           0           0
1           0           1
0          NA           1
1           0           0
0           1           0
1           1          NA
NA          0           1
0           0           0
1           0           0

# Copy the table above to the clipboard first (file = "clipboard" is platform-dependent)
dat <- read.table(file = "clipboard", header = TRUE)
sapply(dat, function(x) prop.table(table(x,useNA = "no"))[[2]])

antibiotic1 antibiotic2 antibiotic3 
      0.625       0.250       0.375
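
Reading from the clipboard depends on the platform and on whatever was copied last, so here is a reproducible sketch of the same idea that reads the table from a string instead. The variable names `txt` and `props` are illustrative. It also indexes the table by the name "1" rather than by position 2, so the lookup does not silently depend on 0 being present in the column.

```r
# The same table as above, supplied as text instead of via the clipboard
txt <- "antibiotic1 antibiotic2 antibiotic3
1 0 0
1 0 1
0 NA 1
1 0 0
0 1 0
1 1 NA
NA 0 1
0 0 0
1 0 0"

dat <- read.table(text = txt, header = TRUE)

# Per column: tabulate the non-missing values, convert counts to
# proportions, and pick the entry named "1"
props <- sapply(dat, function(x) prop.table(table(x, useNA = "no"))[["1"]])

print(props)
```

Note that this still assumes every column contains at least one 1; a column without any would have no entry named "1" in its table.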

Answer 4

More simply, using base R, you can do

apply(sapply(ab, table), 2, prop.table)

This gives you the proportions of 0 and 1 for each antibiotic, with NA excluded:

#   antibiotic1 antibiotic2 antibiotic3
# 0       0.375        0.75       0.625
# 1       0.625        0.25       0.375

If you are only interested in the proportion of 1s, select the second row by adding [2, ] at the end of the line.
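
Spelled out, the row selection might look like the sketch below (`props` is an illustrative name). One caveat worth stating: `sapply(ab, table)` only simplifies to a matrix when every column contains the same set of values, which holds for this example data but is an assumption about any other data set.

```r
antibiotic1 <- c(1, 1, 0, 1, 0, 1, NA, 0, 1)
antibiotic2 <- c(0, 0, NA, 0, 1, 1, 0, 0, 0)
antibiotic3 <- c(0, 1, 1, 0, 0, NA, 1, 0, 0)
ab <- data.frame(antibiotic1, antibiotic2, antibiotic3)

# Matrix of proportions: rows are the values 0 and 1, columns the antibiotics
props <- apply(sapply(ab, table), 2, prop.table)

# Second row, selected by position as suggested above
print(props[2, ])

# The same row selected by its name, which reads more explicitly
print(props["1", ])
```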

Calculating the frequency of occurrence in each column