How do I count the occurrences of specific elements in each row of a large data frame?

Time: 2018-04-03 23:58:16

Tags: r

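I have a large data.frame consisting of 546639 objects and 239 variables. The elements of the data.frame are A, B, and 0. Now I want to know the count of each element in each row. Here is part of the data.frame:
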
1 rs22233… B     B     B     B     B     B     B     B     B     B     B    
2 rs38622… B     B     B     B     B     B     B     B     A     B     A    
3 rs13933… B     B     A     B     B     B     B     B     B     B     B    
4 rs38637… B     B     A     A     A     B     B     B     A     B     A    
5 rs12554… B     B     B     B     A     B     A     B     B     B     B    
6 rs41105… A     A     A     A     B     A     B     A     A     A     B   

3 answers:

Answer 0: (score: 2)

We can use apply with table to count row by row:

apply(df[-c(1,2)],1,table)
# [[1]]
# 
# B 
# 11 
# 
# [[2]]
# 
# A B 
# 2 9 
# 
# [[3]]
# 
# A  B 
# 1 10 
# 
# [[4]]
# 
# A B 
# 5 6 
# 
# [[5]]
# 
# A B 
# 2 9 
# 
# [[6]]
# 
# A B 
# 8 3 
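
Because the rows do not all contain the same set of values, apply returns a list here rather than a matrix. Below is a minimal sketch of one way to get a rectangular result, assuming the three possible values are A, B, and 0. The small df is a hypothetical reconstruction of the six rows in the question: the column names V1 to V13 are assumed from the code in the answers, and the truncated rs identifiers are kept as displayed.

```r
# Hypothetical reconstruction of the six rows shown in the question.
# Column names V1..V13 are assumed; the rs identifiers are truncated in the source.
calls <- c("BBBBBBBBBBB", "BBBBBBBBABA", "BBABBBBBBBB",
           "BBAAABBBABA", "BBBBABABBBB", "AAAABABAAAB")
geno <- do.call(rbind, strsplit(calls, ""))   # 6 x 11 character matrix
colnames(geno) <- paste0("V", 3:13)
df <- data.frame(
  V1 = 1:6,
  V2 = c("rs22233…", "rs38622…", "rs13933…", "rs38637…", "rs12554…", "rs41105…"),
  geno,
  stringsAsFactors = FALSE
)

# Fixing the factor levels makes every row return a count vector of length 3,
# so apply() can simplify the result to a matrix (transposed to one row per SNP).
counts <- t(apply(df[-c(1, 2)], 1, function(r) {
  table(factor(r, levels = c("A", "B", "0")))
}))
rownames(counts) <- df$V2
print(counts)
```

The result has one row per SNP and the columns A, B, and 0, with zeros where a value does not occur.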

答案 1 :(得分:2)

Method 1:

Using table (thanks to @thelatemail):

table(factor(unlist(df[-1]), levels = c("A", "B", "0")), row(df[-1]))
#    1  2  3  4  5  6
# A  0  2  1  5  2  8
# B 11  9 10  6  9  3
# 0  0  0  0  0  0  0

Or (slow):

sapply(split(df, 1:nrow(df)), function(x) table(factor(unlist(x[, -1]), levels = c("A", "B", "0"))))
#   1 2  3 4 5 6
#A  0 2  1 5 2 8
#B 11 9 10 6 9 3
#0  0 0  0 0 0 0

Explanation: factor(..., levels = c("A", "B", "0")) ensures that table always reports counts for the same three factor levels, so the result can be stored in a matrix.
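
To see the effect of fixing the levels on a single row, here is a small sketch. The vector row1 is a hypothetical stand-in for the first row of the question's data, which contains only B.

```r
# Hypothetical stand-in for a row that contains only "B".
row1 <- rep("B", 11)

# Without fixed levels, table() reports only the values that occur.
plain <- table(row1)

# With fixed levels, absent values are reported with a count of zero,
# so every row yields a vector of the same length and order.
fixed <- table(factor(row1, levels = c("A", "B", "0")))

print(plain)
print(fixed)
```

Because every row then produces three counts in the same order, sapply can bind the per-row results into a matrix instead of returning a ragged list.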

Method 2:

Using rle:

lapply(split(df, 1:nrow(df)), function(x) as.data.frame(unclass(rle(as.character(sort(unlist(x[, -1])))))))
#$`1`
#  lengths values
#1      11      B
#
#$`2`
#  lengths values
#1       2      A
#2       9      B
#
#$`3`
#  lengths values
#1       1      A
#2      10      B
#
#$`4`
#  lengths values
#1       5      A
#2       6      B
#
#$`5`
#  lengths values
#1       2      A
#2       9      B
#
#$`6`
#  lengths values
#1       8      A
#2       3      B

Method 3:

Using tidyr::gather and dplyr::count:

library(tidyverse)
df %>%
    gather(key, val, -V2) %>%
    count(V2, val)
## A tibble: 11 x 3
#V2       val       n
#<fct>    <chr> <int>
#1 rs12554… A         2
#2 rs12554… B         9
#3 rs13933… A         1
#4 rs13933… B        10
#5 rs22233… B        11
#6 rs38622… A         2
#7 rs38622… B         9
#8 rs38637… A         5
#9 rs38637… B         6
#10 rs41105… A         8
#11 rs41105… B         3
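
Sample data: the six rows shown in the question can be rebuilt as follows. This is a sketch rather than the answer's own sample data. The column names V2 to V13 are assumed from the code above, and the truncated rs identifiers are kept exactly as displayed. The last lines rerun the table approach from Method 1 as a sanity check.

```r
# Hypothetical reconstruction of the question's six rows (not the answer's original data).
# Column names V2..V13 are assumed; the rs identifiers are truncated in the source.
calls <- c("BBBBBBBBBBB", "BBBBBBBBABA", "BBABBBBBBBB",
           "BBAAABBBABA", "BBBBABABBBB", "AAAABABAAAB")
geno <- do.call(rbind, strsplit(calls, ""))   # 6 x 11 character matrix
colnames(geno) <- paste0("V", 3:13)
df <- data.frame(
  V2 = c("rs22233…", "rs38622…", "rs13933…", "rs38637…", "rs12554…", "rs41105…"),
  geno,
  stringsAsFactors = FALSE
)

# Sanity check: cross-tabulate every value against its row index.
res <- table(factor(unlist(df[-1]), levels = c("A", "B", "0")), row(df[-1]))
print(res)
```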

Answer 2: (score: 1)

Using dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>% 
  gather(key, value, V3:V13) %>% 
  group_by(V2) %>% 
  count(value) %>% 
  spread(value, n)

# A tibble: 6 x 3
# Groups:   V2 [6]
#   V2           A     B
#   <fct>    <int> <int>
# 1 rs12554…     2     9
# 2 rs13933…     1    10
# 3 rs22233…    NA    11
# 4 rs38622…     2     9
# 5 rs38637…     5     6
# 6 rs41105…     8     3
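
In the output above, rs22233… shows NA for A because that row contains no A, and the rows come back sorted by V2. As an alternative, here is a base-R sketch that reports 0 instead of NA and keeps the original row order by summing logical comparisons with rowSums. The small df is a hypothetical reconstruction of the question's six rows, with the column names V1 to V13 assumed from the code in the answers.

```r
# Hypothetical reconstruction of the six rows shown in the question.
# Column names V1..V13 are assumed; the rs identifiers are truncated in the source.
calls <- c("BBBBBBBBBBB", "BBBBBBBBABA", "BBABBBBBBBB",
           "BBAAABBBABA", "BBBBABABBBB", "AAAABABAAAB")
geno <- do.call(rbind, strsplit(calls, ""))   # 6 x 11 character matrix
colnames(geno) <- paste0("V", 3:13)
df <- data.frame(
  V1 = 1:6,
  V2 = c("rs22233…", "rs38622…", "rs13933…", "rs38637…", "rs12554…", "rs41105…"),
  geno,
  stringsAsFactors = FALSE
)

# For each value of interest, compare the whole genotype block at once
# and count the TRUE entries per row.
levels_wanted <- c("A", "B", "0")
counts <- sapply(levels_wanted, function(lv) rowSums(df[-c(1, 2)] == lv))

# One row per SNP, in the original order; check.names = FALSE keeps the "0" column name.
wide <- data.frame(V2 = df$V2, counts, check.names = FALSE, stringsAsFactors = FALSE)
print(wide)
```

This makes one pass over the data per value instead of one table call per row, which may matter with several hundred thousand rows, though it has not been timed here.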