说我有这种格式的数据:
playerID sp rp c 1b 2b 3b ss of dh primary
1 adamja01 0 31 0 4 0 0 0 0 0 RP
2 adamsau02 0 2 0 0 0 1 3 0 0 RP
3 adamsch01 1 2 0 6 0 0 0 0 0 RP
4 alberma01 0 34 0 0 0 0 0 0 0 RP
5 alcansa01 6 0 0 0 0 0 0 0 0 SP
6 alcanvi01 0 27 0 0 0 0 0 0 0 RP
我需要创建一个新的字符串列。该字符串列出了给定行的值超过某个阈值的所有列名称。
假设阈值存储在向量posThresh
中,我几乎可以使用以下内容:
positions$altPos <- apply(positions[, 2:10], 1, function(x)
toString(names(positions)[2:10][x >= posThresh]))
此函数添加此列:
playerID sp rp c 1b 2b 3b ss of dh primary altPos
1 adamja01 0 31 0 0 0 0 0 0 0 RP RP
2 adamsau02 0 2 0 0 0 0 0 0 0 RP RP
3 adamsch01 1 2 0 0 0 0 0 0 0 RP SP,RP
4 alberma01 0 34 0 0 0 0 0 0 0 RP RP
5 alcansa01 6 0 0 0 0 0 0 0 0 SP SP
6 alcanvi01 0 27 0 0 0 0 0 0 0 RP RP
在第3行的情况下,primary
下的值现在在altPos
下重复。但是,RP
下的altPos
值不是来自primary
,而是列名rp
。有什么办法可以生成相同的信息,但要从字符串中排除等于primary
的值的任何值?
基本上,超过阈值,而不是任何列也等于primary
...我只是不能得到格式化向下:>= threshold && <> primary
答案 0 :(得分:1)
您可以像这样调整您的功能。
posThresh <- 1
positions$altPos <-
apply(positions, 1,
function(x) {
raw <- names(positions)[2:10][x[2:10] >= posThresh]
excl <- tolower(as.character(x[grep("primary", names(positions))]))
cln <- toString(raw[raw != excl])
if (cln == "") return(NA)
else return(cln)
})
产量
> positions
playerID sp rp c X1b X2b X3b ss of dh primary altPos
1 adamja01 0 31 0 4 0 0 0 0 0 RP X1b
2 adamsau02 0 2 0 0 0 1 3 0 0 RP X3b, ss
3 adamsch01 1 2 0 6 0 0 0 0 0 RP sp, X1b
4 alberma01 0 34 0 0 0 0 0 0 0 RP <NA>
5 alcansa01 6 0 0 0 0 0 0 0 0 SP <NA>
6 alcanvi01 0 27 0 0 0 0 0 0 0 RP <NA>
上面的函数已经可以使用posThresh
向量。以下是也可用于矩阵的函数。最好包含一些我已经完成的异常处理。
validThresh <- function(positions, posThresh) {
stopifnot(all(!is.na(posThresh)))
if(!length(posThresh) == 1 & !is.matrix(posThresh) &
!length(posThresh) == dim(positions[2:10])[2])
stop("length of posThresh do not equal number of test columns!")
if(!all(is.matrix(posThresh) & dim(posThresh) == dim(positions[2:10])))
stop("posThresh and test matrix do not have the same dimensions!")
mx <- positions[2:10] >= posThresh
raw <- apply(mx, 1, function(mx) names(mx[mx == TRUE]))
excl <- tolower(unlist(positions[grep("primary", names(positions))]))
cln <- sapply(1:length(raw), function(i)
toString(raw[[i]][raw[[i]] != excl[i]]))
return(ifelse(cln == "", NA, cln))
}
用法:
validThresh(positions, posThresh)
一些测试:
validThresh(positions, posThresh=1)
validThresh(positions, posThresh=NA) # error
validThresh(positions, posThresh=c(6, 27, 1, 5, 1, 1, 3, 0, 1))
validThresh(positions, posThresh=c(1, 2, 2)) # error
validThresh(positions, posThresh=matrix(1, 6, 9))
validThresh(positions, posThresh=matrix(1, 7, 9)) # error
最后像这样添加您的列:
positions$altPos <- validThresh(positions, posThresh)
positions <- structure(list(playerID = structure(1:6, .Label = c("adamja01",
"adamsau02", "adamsch01", "alberma01", "alcansa01", "alcanvi01"
), class = "factor"), sp = c(0L, 0L, 1L, 0L, 6L, 0L), rp = c(31L,
2L, 2L, 34L, 0L, 27L), c = c(0L, 0L, 0L, 0L, 0L, 0L), X1b = c(4L,
0L, 6L, 0L, 0L, 0L), X2b = c(0L, 0L, 0L, 0L, 0L, 0L), X3b = c(0L,
1L, 0L, 0L, 0L, 0L), ss = c(0L, 3L, 0L, 0L, 0L, 0L), of = c(0L,
0L, 0L, 0L, 0L, 0L), dh = c(0L, 0L, 0L, 0L, 0L, 0L), primary = structure(c(1L,
1L, 1L, 1L, 2L, 1L), .Label = c("RP", "SP"), class = "factor")), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
答案 1 :(得分:0)
如果您可以使用dplyr
和tidyr
,那么
library(dplyr)
library(tidyr)
df %>%
gather(k, v, -playerID, -primary) %>%
filter(v > 0, tolower(primary) != k) %>%
group_by(playerID) %>%
summarize(k = paste(k, collapse = ","))
# # A tibble: 6 x 2
# playerID k
# <fct> <chr>
# 1 adamja01 X1b
# 2 adamsau02 X3b,ss
# 3 adamsch01 sp,X1b
# 4 alberma01 ""
# 5 alcansa01 ""
# 6 alcanvi01 ""
从这里,您可以merge
或left_join
将其与原始框架一起退回。