My data frame looks like this:
df<-data.frame(alphabets1=c("A","B","C","B","C"," ","NA"),alphabets2=c("B","A","D","D"," ","E","NA"),alphabets3=c("C","F","G"," "," "," ","NA"), number = c("1","2","3","1","4","1","2"))
  alphabets1 alphabets2 alphabets3 number
1          A          B          C      1
2          B          A          F      2
3          C          D          G      3
4          B          D                 1
5          C                            4
6                     E                 1
7         NA         NA         NA      2
NOTE 1: Within a row, all values are unique; that is, a row like the one shown below cannot occur.
alphabets1 alphabets2 alphabets3 number
1 A A C 1
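NOTE 2: The data frame may contain NA or blank values.
I am trying to get the following output: a data frame containing each letter and the sum of its corresponding numbers. For example, A appears in rows 1 and 2, so its sum is 1 + 2 = 3; B appears in rows 1, 2 and 4, so its sum is 1 + 2 + 1 = 4.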
output <-data.frame(alphabets1=c("A","B","C","D","E","F","G"), number = c("3","4","8","4","1","2","3"))
output
  alphabets1 number
1          A      3
2          B      4
3          C      8
4          D      4
5          E      1
6          F      2
7          G      3
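NOTE 3: The output may or may not contain NA or blank entries (either is fine).
Answer 0 (score: 1)
We can reshape the data to 'long' format and then do a group-by operation.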
library(data.table)
melt(setDT(df), id.var="number", na.rm = TRUE, value.name = "alphabets1")[
!grepl("^\\s*$", alphabets1), .(number = sum(as.integer(as.character(number)))),
alphabets1]
# alphabets1 number
#1: A 3
#2: B 4
#3: C 8
#4: D 4
#5: E 1
#6: F 2
#7: G 3
Alternatively, we can use xtabs from base R.
xtabs(number~alphabets1, data.frame(alphabets1 = unlist(df[-4]),
number = as.numeric(as.character(df[,4]))))
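NOTE: In the OP's dataset, the missing values are the string "NA" rather than a real NA, and the 'number' column is a factor (it is converted to integer before taking the sum).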
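To make the note above concrete, here is a minimal base R sketch, not the answerer's code, that turns the blank strings and the literal "NA" into real missing values before summing. It assumes the data frame from the question; `stringsAsFactors = TRUE` is set explicitly only to mimic the factor columns described in the note, and the names `num`, `letters_long` and `res` are illustrative.

```r
# Rebuild the question's data; stringsAsFactors = TRUE is an assumption that
# reproduces the factor columns the note refers to
df <- data.frame(
  alphabets1 = c("A", "B", "C", "B", "C", " ", "NA"),
  alphabets2 = c("B", "A", "D", "D", " ", "E", "NA"),
  alphabets3 = c("C", "F", "G", " ", " ", " ", "NA"),
  number     = c("1", "2", "3", "1", "4", "1", "2"),
  stringsAsFactors = TRUE
)

# Go through character so we get the labels, not the factor's level codes
num <- as.integer(as.character(df$number))

# Stack the letter columns into one character vector, column by column
letters_long <- unlist(lapply(df[-4], as.character), use.names = FALSE)

# Treat blank strings and the literal "NA" as real missing values
letters_long[grepl("^\\s*$", letters_long) | letters_long == "NA"] <- NA

# Repeat the numbers once per letter column and sum by letter;
# entries whose letter is NA are dropped from the grouping
res <- tapply(rep(num, times = ncol(df) - 1), letters_long, sum)
res
```

With the missing values recoded this way, neither a blank group nor an "NA" group appears in the result.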
Answer 1 (score: 1)
Here is a base R method using sapply and table. I first converted df$number to numeric; see the Data section below.
data.frame(table(sapply(df[-length(df)], function(i) rep(i, df$number))))
Var1 Freq
1 11
2 A 3
3 B 4
4 C 8
5 D 4
6 E 1
7 F 2
8 G 3
9 NA 6
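The approach above works because rep() repeats each entry as many times as its row's number, so counting the repeated entries with table() gives the same result as summing the numbers. A small sketch with hypothetical toy vectors (`x` and `n` are not from the question) illustrates the equivalence; note that it only holds when the numbers are non-negative whole numbers.

```r
# Hypothetical toy data, not taken from the question
x <- c("A", "B", "A")
n <- c(1, 2, 4)

# Repeat each label n times: "A" once, "B" twice, "A" four times
expanded <- rep(x, n)

# Counting the repeated labels ...
counts <- table(expanded)

# ... matches summing n within each label
sums <- tapply(n, x, sum)

counts
sums
```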
To make the output a bit nicer, we can add a few more functions and do the subsetting inside sapply.
data.frame(table(droplevels(unlist(sapply(df[-length(df)],
function(i) rep(i[i %in% LETTERS],
df$number[i %in% LETTERS])),
use.names=FALSE))))
Var1 Freq
1 A 3
2 B 4
3 C 8
4 D 4
5 E 1
6 F 2
7 G 3
It might be easier, though, to do the subsetting afterwards.
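A minimal sketch of that alternative, assuming the question's data with a numeric number column: build the full table first, then drop the rows that are not capital letters. The names `reps`, `tab` and `out` are illustrative, and character columns are used here for simplicity rather than the factor columns in the Data section.

```r
# The question's data with number already numeric (an assumption for this sketch)
df <- data.frame(
  alphabets1 = c("A", "B", "C", "B", "C", " ", "NA"),
  alphabets2 = c("B", "A", "D", "D", " ", "E", "NA"),
  alphabets3 = c("C", "F", "G", " ", " ", " ", "NA"),
  number     = c(1, 2, 3, 1, 4, 1, 2),
  stringsAsFactors = FALSE
)

# Repeat every entry as many times as its row's number, for each letter column
reps <- unlist(lapply(df[-length(df)], function(i) rep(i, df$number)),
               use.names = FALSE)

# Tabulate everything, including the blank and "NA" strings
tab <- as.data.frame(table(Var1 = reps))

# Subset afterwards: keep only rows whose label is a capital letter
out <- tab[tab$Var1 %in% LETTERS, ]
out$Var1 <- droplevels(out$Var1)
rownames(out) <- NULL
out
```

Filtering once on the finished table avoids repeating the `%in% LETTERS` test for both the labels and the numbers inside the anonymous function.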
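Data
I ran
df$number <- as.numeric(df$number)
on the OP's data, which resulted in the following. (This works here only because the factor's level codes happen to equal its labels; as.numeric(as.character(df$number)) is the safer conversion.)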
df <-
structure(list(alphabets1 = structure(c(2L, 3L, 4L, 3L, 4L, 1L,
5L), .Label = c(" ", "A", "B", "C", "NA"), class = "factor"),
alphabets2 = structure(c(3L, 2L, 4L, 4L, 1L, 5L, 6L), .Label = c(" ",
"A", "B", "D", "E", "NA"), class = "factor"), alphabets3 = structure(c(2L,
3L, 4L, 1L, 1L, 1L, 5L), .Label = c(" ", "C", "F", "G", "NA"
), class = "factor"), number = c(1, 2, 3, 1, 4, 1, 2)), .Names = c("alphabets1",
"alphabets2", "alphabets3", "number"), row.names = c(NA, -7L), class = "data.frame")