Suppose I have a data.table:
library(data.table)
dt <- data.table(term = c('dog', 'cat', 'fish', 'dog', 'cat', 'fish',
                          'dog', 'cat', 'fish', 'dog', 'cat', 'fish',
                          'dog', 'cat', 'fish', 'dog', 'cat', 'fish'),
                 eats = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3, 3, 3, 3, 3, 3),
                 weights = c(6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6, 2, 2, 2, 2, 2, 2))
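Then I create a function that computes the correlation between what the pets eat and what they weigh, and returns the result for a given pet: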
foo <- function(pet, dtSrc){
  newDt <- dtSrc[term == pet, c('eats', 'weights')]
  corTotal <- Hmisc::rcorr(as.matrix(newDt), type = 'pearson')
  corValues <- corTotal$r[1, 2]
  return(corValues)
}
I can run each pet through foo to get the correlation between its food and its weight. Using sapply, I can do something like this:
pets <- unique(dt$term)
dtResult <- sapply(pets, foo, dtSrc = dt)
dtResult <- as.data.table(dtResult, keep.rownames = TRUE)
colnames(dtResult) <- c('pet', 'cor')
The result is perfect. I get one row per pet:
    pet        cor
1:  dog -0.8696263
2:  cat -0.8215838
3: fish -0.7364854
However, what if I also want to add the p-value to each row, so that I get a result like this:
    pet        cor         pv
1:  dog -0.8696263 0.02438794
2:  cat -0.8215838 0.04490880
3: fish -0.7364854 0.09501072
I thought I could return the p-value alongside the correlation, perhaps like this:
fooMore <- function(pet, dtSrc){
  newDt <- dtSrc[term == pet, c('eats', 'weights')]
  corTotal <- Hmisc::rcorr(as.matrix(newDt), type = 'pearson')
  corValues <- corTotal$r[1, 2]
  pValues <- corTotal$P[1, 2]
  result <- c(corValues, pValues)
  return(result)
}
pets <- unique(dt$term)
dtResult <- sapply(pets, fooMore, dtSrc = dt)
dtResult <- as.data.table(dtResult, keep.rownames = TRUE)
colnames(dtResult) <- c('pet', 'cor', 'pv')
Unfortunately, the result no longer looks like it did before. In particular, I don't get the row names I need:
             pet        cor          pv
[1,] -0.86962634 -0.8215838 -0.73648536
[2,]  0.02438794  0.0449088  0.09501072
What is the most R-ish way to modify the code above so that it produces the result I'm looking for? TIA
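For context on why the shape changes: when every call returns a vector of the same length greater than one, sapply simplifies the results into a matrix with one column per input element, so the pet names become column names instead of row names. The sketch below illustrates this and one possible fix (transposing before converting). It is only an illustration: the helper name fooBoth is hypothetical, and it assumes stats::cor.test as a stand-in for Hmisc::rcorr so that it runs with data.table alone.

```r
library(data.table)

dt <- data.table(
  term    = rep(c("dog", "cat", "fish"), times = 6),
  eats    = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3, 3, 3, 3, 3, 3),
  weights = c(6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6, 2, 2, 2, 2, 2, 2)
)

# Hypothetical helper (not from the original post): returns a NAMED
# numeric vector of length 2 for one pet.
# Assumption: stats::cor.test replaces Hmisc::rcorr here.
fooBoth <- function(pet, dtSrc) {
  sub <- dtSrc[term == pet]
  ct <- cor.test(sub$eats, sub$weights, method = "pearson")
  c(cor = unname(ct$estimate), pv = ct$p.value)
}

pets <- unique(dt$term)

# sapply simplifies to a 2 x 3 matrix: rows are cor/pv, columns are the pets.
m <- sapply(pets, fooBoth, dtSrc = dt)

# Transpose so each pet becomes a row, then keep the row names as a column.
dtResult <- as.data.table(t(m), keep.rownames = TRUE)
setnames(dtResult, "rn", "pet")
```

With the transpose in place, dtResult should have one row per pet with the columns pet, cor and pv, which is the layout asked for above.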
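Answer 0 (score: 1)
Since all I got was a downvote (dang, that's harsh), I'll post my workaround, though I'd welcome a better solution. As you can see, I just paste the two values together and split them apart later. Ugly, but at least I don't have to run the same operation twice.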
fooMore <- function(pet, dtSrc){
  newDt <- dtSrc[term == pet, c('eats', 'weights')]
  corTotal <- Hmisc::rcorr(as.matrix(newDt), type = 'pearson')
  corValues <- corTotal$r[1, 2]
  pValues <- corTotal$P[1, 2]
  resultBoth <- paste0(corValues, ':', pValues) # combine results
  return(resultBoth)
}
pets <- unique(dt$term)
dtResult <- sapply(pets, fooMore, dtSrc = dt)
dtResult <- as.data.table(dtResult, keep.rownames = TRUE)
dtResult[, c('corValue', 'pValue') := tstrsplit(dtResult, ":", fixed=TRUE)] # split them back out
dtResult$corValue <- as.numeric(dtResult$corValue)
dtResult$pValue <- as.numeric(dtResult$pValue)
dtResult$dtResult <- NULL
# just to be consistent with earlier
colnames(dtResult) <- c('pet', 'cor', 'pv')
    pet        cor         pv
1:  dog -0.8696263 0.02438794
2:  cat -0.8215838 0.04490880
3: fish -0.7364854 0.09501072
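One possible alternative that avoids the string round trip is to let data.table do the grouping and return both values as a list from the j expression, so each group yields one row with two numeric columns. This is a minimal sketch, not the approach from the post: it assumes stats::cor.test in place of Hmisc::rcorr (both report a Pearson coefficient and a t-based p-value), and it renames term to pet inside by.

```r
library(data.table)

dt <- data.table(
  term    = rep(c("dog", "cat", "fish"), times = 6),
  eats    = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3, 3, 3, 3, 3, 3),
  weights = c(6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6, 2, 2, 2, 2, 2, 2)
)

# For each pet, run the test once and return both numbers as list columns.
# Assumption: stats::cor.test is used here instead of Hmisc::rcorr.
dtResult <- dt[, {
  ct <- cor.test(eats, weights, method = "pearson")
  list(cor = unname(ct$estimate), pv = ct$p.value)
}, by = .(pet = term)]

print(dtResult)
```

Because j returns a list, the values stay numeric and no as.numeric conversion or column clean-up is needed afterwards. The test also runs only once per pet, which was the point of the workaround above.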