Different columns

Time: 2017-03-03 19:01:00

Tags: r

My data frame looks like this:

df<-data.frame(alphabets1=c("A","B","C","B","C"," ","NA"),alphabets2=c("B","A","D","D"," ","E","NA"),alphabets3=c("C","F","G"," "," "," ","NA"), number = c("1","2","3","1","4","1","2"))

  alphabets1 alphabets2 alphabets3 number
1          A          B          C      1
2          B          A          F      2
3          C          D          G      3
4          B          D                 1
5          C                            4
6                     E                 1
7         NA         NA         NA      2

NOTE 1: All values within a row are unique, so a row like the following is not possible:

  alphabets1 alphabets2 alphabets3 number
1          A          A          C      1

NOTE 2: The data frame may contain NAs or blanks.

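I am trying to get the following output: a data frame containing each letter and the sum of its corresponding numbers. For example, A appears in rows 1 and 2, so the sum of its corresponding numbers is 1 + 2 = 3; B appears in rows 1, 2 and 4, so its sum is 1 + 2 + 1 = 4.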

output <- data.frame(alphabets = c("A","B","C","D","E","F","G"), number = c(3, 4, 8, 4, 1, 2, 3))

output
   alphabets number
1          A      3
2          B      4
3          C      8
4          D      4
5          E      1
6          F      2
7          G      3

NOTE 3: The output may or may not include NA or blank entries (that's fine!).
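As a quick check of the arithmetic described above, here is a minimal base R sketch that finds the rows containing a given letter and sums their number values. It assumes the columns are stored as character strings (stringsAsFactors = FALSE); the helper rows_with() is hypothetical and only for illustration.

```r
# Question data, assuming character columns (stringsAsFactors = FALSE)
df <- data.frame(alphabets1 = c("A", "B", "C", "B", "C", " ", "NA"),
                 alphabets2 = c("B", "A", "D", "D", " ", "E", "NA"),
                 alphabets3 = c("C", "F", "G", " ", " ", " ", "NA"),
                 number = c("1", "2", "3", "1", "4", "1", "2"),
                 stringsAsFactors = FALSE)

num <- as.numeric(df$number)

# Hypothetical helper: TRUE for rows where `letter` appears in any letter column
rows_with <- function(letter) rowSums(df[-4] == letter) > 0

sum(num[rows_with("A")])  # rows 1 and 2: 1 + 2 = 3
sum(num[rows_with("B")])  # rows 1, 2 and 4: 1 + 2 + 1 = 4

# The same per-letter sum for every letter from A to G
sapply(LETTERS[1:7], function(l) sum(num[rows_with(l)]))
```

Because NOTE 1 guarantees a letter appears at most once per row, summing over matching rows gives the same result as summing over every occurrence of the letter.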

2 answers:

Answer 0 (score: 1)

We can reshape the data to 'long' format and then do a group-by operation:

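library(data.table)
# as.data.table() copies df, so df stays a plain data.frame for the base R code below
melt(as.data.table(df), id.vars = "number", na.rm = TRUE, value.name = "alphabets1")[
   # drop blanks and the literal string "NA", then sum 'number' per letter
   !grepl("^\\s*$", alphabets1) & alphabets1 != "NA",
   .(number = sum(as.integer(as.character(number)))), alphabets1]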
#    alphabets1 number
#1:          A      3
#2:          B      4
#3:          C      8
#4:          D      4
#5:          E      1
#6:          F      2
#7:          G      3
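For readers without data.table, the same reshape-to-long-then-group idea can be sketched in base R. This swaps in unlist() for melt() and aggregate() for the data.table grouping; it assumes character columns and drops blanks and the literal string "NA" in the same way.

```r
# Question data, assuming character columns (stringsAsFactors = FALSE)
df <- data.frame(alphabets1 = c("A", "B", "C", "B", "C", " ", "NA"),
                 alphabets2 = c("B", "A", "D", "D", " ", "E", "NA"),
                 alphabets3 = c("C", "F", "G", " ", " ", " ", "NA"),
                 number = c("1", "2", "3", "1", "4", "1", "2"),
                 stringsAsFactors = FALSE)

# Long format: unlist() stacks the letter columns one after another, and
# rep() repeats 'number' once per stacked column so the rows line up
long <- data.frame(alphabets = unlist(df[-4], use.names = FALSE),
                   number = rep(as.numeric(df$number), times = ncol(df) - 1),
                   stringsAsFactors = FALSE)

# Drop blanks and the literal string "NA"
long <- long[!grepl("^\\s*$", long$alphabets) & long$alphabets != "NA", ]

# Group by letter and sum 'number'
res <- aggregate(number ~ alphabets, data = long, FUN = sum)
res
```

Unlike the data.table version, aggregate() returns the groups sorted by letter rather than in order of first appearance; for this data the two orders happen to coincide.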

Or we can use xtabs from base R:
# stack the three letter columns (recycling 'number' to match) and sum per letter
xtabs(number ~ alphabets1,
      data.frame(alphabets1 = unlist(df[-4]),
                 number = as.numeric(as.character(df[, 4]))))

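NOTE: In the OP's dataset, the missing values are the string "NA" rather than real NA, and the 'number' column is a factor, so it is converted to integer (via as.character) before taking the sum.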
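Both points in this note are easy to trip over. A small sketch with made-up vectors: as.integer() on a factor returns the internal level codes rather than the printed values, and the string "NA" is an ordinary value until it is replaced with a real NA.

```r
# A factor stores integer codes plus a set of level labels
f <- factor(c("10", "20", "10"))
as.integer(f)                # level codes: 1 2 1
as.integer(as.character(f))  # the printed values: 10 20 10

# The string "NA" is not a missing value until it is replaced
d <- data.frame(a = c("A", "NA"), b = c("NA", "B"), stringsAsFactors = FALSE)
is.na(d)                     # all FALSE
d[d == "NA"] <- NA           # replace the literal string with a real NA
is.na(d)                     # TRUE where "NA" used to be
```

After such a replacement, na.rm = TRUE in melt() would drop those entries without needing an extra filter.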

Data


Answer 1 (score: 1)

Here is a base R method using sapply and table. I first converted df$number to numeric; see the Data section below.

data.frame(table(sapply(df[-length(df)], function(i) rep(i, df$number))))
  Var1 Freq
1        11
2    A    3
3    B    4
4    C    8
5    D    4
6    E    1
7    F    2
8    G    3
9   NA    6
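The idea behind this answer is that repeating each letter number times turns a weighted sum into a plain count, so table() returns the per-letter sums. A minimal sketch on the first letter column only (rows 1 to 5), compared against a direct tapply() with sum:

```r
alph <- c("A", "B", "C", "B", "C")   # alphabets1, rows 1 to 5
number <- c(1, 2, 3, 1, 4)           # matching 'number' values

# Repeat each letter as many times as its 'number'
expanded <- rep(alph, times = number)

counts <- table(expanded)            # occurrences of each letter: A 1, B 3, C 7
sums <- tapply(number, alph, sum)    # direct per-letter sums:     A 1, B 3, C 7
counts
sums
```

This relies on number holding non-negative whole numbers, since rep() cannot repeat an element a fractional or negative number of times.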

To make the output a little nicer, we can wrap it in a few more functions and do the subsetting inside sapply.

data.frame(table(droplevels(unlist(sapply(df[-length(df)],
                                     function(i) rep(i[i %in% LETTERS],
                                                     df$number[i %in% LETTERS])),
                            use.names=FALSE))))
  Var1 Freq
1    A    3
2    B    4
3    C    8
4    D    4
5    E    1
6    F    2
7    G    3

That said, it may be easier to do this subsetting afterwards.
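One way to do the subsetting afterwards, sketched under the assumption that number is already numeric and the letter columns are character: build the full frequency table as in the first call, then keep only the rows whose Var1 is a capital letter.

```r
# Question data with numeric 'number' (assumption: stringsAsFactors = FALSE)
df <- data.frame(alphabets1 = c("A", "B", "C", "B", "C", " ", "NA"),
                 alphabets2 = c("B", "A", "D", "D", " ", "E", "NA"),
                 alphabets3 = c("C", "F", "G", " ", " ", " ", "NA"),
                 number = c(1, 2, 3, 1, 4, 1, 2),
                 stringsAsFactors = FALSE)

# Full table, including the " " and "NA" entries
res <- data.frame(table(sapply(df[-length(df)], function(i) rep(i, df$number))))

# Keep capital letters only, then drop the now-unused factor levels
res <- droplevels(res[res$Var1 %in% LETTERS, ])
rownames(res) <- NULL
res
```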

Data

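I ran

df$number <- as.numeric(as.character(df$number))  # as.character first, so factor levels are not turned into their codes

on the OP's data, which resulted in this: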

df <-
structure(list(alphabets1 = structure(c(2L, 3L, 4L, 3L, 4L, 1L, 
5L), .Label = c(" ", "A", "B", "C", "NA"), class = "factor"), 
    alphabets2 = structure(c(3L, 2L, 4L, 4L, 1L, 5L, 6L), .Label = c(" ", 
    "A", "B", "D", "E", "NA"), class = "factor"), alphabets3 = structure(c(2L, 
    3L, 4L, 1L, 1L, 1L, 5L), .Label = c(" ", "C", "F", "G", "NA"
    ), class = "factor"), number = c(1, 2, 3, 1, 4, 1, 2)), .Names = c("alphabets1", 
"alphabets2", "alphabets3", "number"), row.names = c(NA, -7L), class = "data.frame")