The probabilities in my dataset look like this:
topic_1 topic_2 topic_3 topic_4 topic_5 topic_6 most_probable
1 0.0028043479 0.0035351980 0.979083973 0.0045751502 0.0046371627 0.0053641679 topic_3
2 0.9688616242 0.0035351980 0.013026697 0.0045751502 0.0046371627 0.0053641679 topic_1
3 0.9928927297 0.0008069017 0.002973317 0.0010442686 0.0010584229 0.0012243603 topic_1
4 0.9841620200 0.0017981155 0.006625797 0.0023270686 0.0023586102 0.0027283884 topic_1
5 0.0004441958 0.0005599591 0.002063369 0.0007246827 0.9953581342 0.0008496595 topic_5
I use this code to find the most probable topic for each row:
# For each row, take the name of the column holding the largest probability
documents.topics$most_probable <- unlist(
  lapply(
    1:nrow(documents.topics),
    function(x) {
      names(which.max(documents.topics[x, ]))
    }))
documents.topics$most_probable <- as.factor(documents.topics$most_probable)
I would like to know how to find the second-largest value in each row and put its column name into a new column, second_probable.
Answer 0 (score: 1)
We can use the apply, sort, and which functions:
dat$second_most_probable <- apply(dat[,-7], 1,
  # names() turns the named index returned by which() into the column name
  FUN = function(x) names(which(x == sort(x, decreasing = TRUE)[2])))
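For each row, we sort the values in decreasing order and select the second element of the sorted vector. We then find which column matches that second-largest value, and use the result of which to determine the column name.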
dat <- structure(list(topic_1 = c(0.0028043479, 0.9688616242, 0.9928927297,
0.98416202, 0.0004441958), topic_2 = c(0.003535198, 0.003535198,
0.0008069017, 0.0017981155, 0.0005599591), topic_3 = c(0.979083973,
0.013026697, 0.002973317, 0.006625797, 0.002063369), topic_4 = c(0.0045751502,
0.0045751502, 0.0010442686, 0.0023270686, 0.0007246827), topic_5 = c(0.0046371627,
0.0046371627, 0.0010584229, 0.0023586102, 0.9953581342), topic_6 = c(0.0053641679,
0.0053641679, 0.0012243603, 0.0027283884, 0.0008496595), most_probable = c("topic_3",
"topic_1", "topic_1", "topic_1", "topic_5")), .Names = c("topic_1",
"topic_2", "topic_3", "topic_4", "topic_5", "topic_6", "most_probable"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"
))
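As a possible variation on the approach above, order() returns the column positions sorted by value, so its second element can index the column names directly without a separate comparison step. The sketch below is self-contained. The helper name second_largest_name and the small prob data frame (rounded values from the first three rows of the question) are made up for illustration, not part of the original answer.

```r
# Illustrative data: the first three rows of the question's table, rounded.
prob <- data.frame(
  topic_1 = c(0.0028, 0.9689, 0.9929),
  topic_2 = c(0.0035, 0.0035, 0.0008),
  topic_3 = c(0.9791, 0.0130, 0.0030),
  topic_4 = c(0.0046, 0.0046, 0.0010),
  topic_5 = c(0.0046, 0.0046, 0.0011),
  topic_6 = c(0.0054, 0.0054, 0.0012)
)

# Hypothetical helper: name of the column holding the second-largest value.
# order() gives column positions from largest to smallest value.
second_largest_name <- function(x) {
  names(x)[order(x, decreasing = TRUE)[2]]
}

# apply() passes each row as a named numeric vector (names = column names).
prob$second_most_probable <- apply(as.matrix(prob), 1, second_largest_name)
```

Because this always picks exactly one position per row, the new column stays a plain character vector even if two columns happen to share the second-largest value.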
Answer 1 (score: 1)
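Here is another option with max.col. In a copy of the dataset, we set the value in each row's "most_probable" column to -Inf. We then use max.col to get the index of the column with the largest remaining value, and use that index to get the column name.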
dat1 <- dat
dat1[cbind(1:nrow(dat), match( dat$most_probable, names(dat)))] <- -Inf
dat$second_most_probable <- names(dat)[max.col(dat1[-7])]
dat$second_most_probable
#[1] "topic_6" "topic_3" "topic_3" "topic_3" "topic_3"
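The same masking idea can be sketched without relying on an existing most_probable column, by calling max.col twice on a numeric matrix. Note that max.col breaks ties at random by default, so the sketch passes ties.method = "first" to keep the result reproducible. The matrix m and its rounded values (rows 1, 2 and 5 of the question) are assumptions made for illustration.

```r
# Illustrative matrix: rows 1, 2 and 5 of the question's table, rounded.
m <- rbind(
  c(0.0028, 0.0035, 0.9791, 0.0046, 0.0046, 0.0054),
  c(0.9689, 0.0035, 0.0130, 0.0046, 0.0046, 0.0054),
  c(0.0004, 0.0006, 0.0021, 0.0007, 0.9954, 0.0008)
)
colnames(m) <- paste0("topic_", 1:6)

# Position of the largest value in each row; ties go to the first column.
first_idx <- max.col(m, ties.method = "first")

# Mask each row's maximum with -Inf, then take the row maximum again.
masked <- m
masked[cbind(seq_len(nrow(m)), first_idx)] <- -Inf
second_idx <- max.col(masked, ties.method = "first")

most_probable <- colnames(m)[first_idx]
second_most_probable <- colnames(m)[second_idx]
```

Working on a matrix rather than a data frame keeps the non-numeric label columns out of max.col entirely, so no column needs to be dropped by position.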