Question

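I am trying to use a large expression dataset (variables along the columns, all of them categorical) to find a good set of categorical variables for predicting a binary outcome. Each subject is measured at several, but not all, time points (T1-T7 in the study), and each subject has a specific ID. For this I decided to use MXM::MMPC.timeclass(). However, it produces negative p-values. As far as I understand, p-values are by definition probabilities, so they cannot be negative.

I have tried MMPC.timeclass() and done an extensive literature search for another method that might be suitable, but so far nothing has turned up.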

library(MXM)       ## provides MMPC.timeclass()
library(magrittr)  ## provides the %>% pipe used below

set.seed(5)
## assume these are longitudinal data, each column is a variable (or feature)
dataset <- matrix( rnorm(400 * 100), ncol = 100 ) 
id <- rep(1:80, each = 5)  ## 80 subjects
reps <- rep( seq(4, 12, by = 2), 80)

## 5 time points for each subject
## the values in dataset are regressed, per subject, on reps
## (which is assumed to be time in this example)
target <- rep(0:1, each = 200)
a <- MMPC.timeclass(target, reps, id, dataset)
a@pvalues %>% summary()

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-4.01762 -1.39835 -0.68720 -0.98512 -0.37326 -0.01365

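The expected result should include p-values (in the 0-1 range) or, even better, some kind of ranking of each variable from the screening process. I have used VariableScreening::ScreenLD() before, but it is meant for a different type of outcome and therefore does not apply to these data.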

Answer 1

The answer is that these are log p-values. The documentation will be updated accordingly. See https://github.com/mensxmachina/MXM-R-Package/issues/2 for the package author's reply.
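
If the values are natural-log p-values (an assumption here; the answer only says "log"), they can be mapped back to the 0-1 scale with exp() and ranked directly. The following is a minimal sketch in base R. The vector log_p is purely illustrative: it reuses the five quantiles from the summary in the question in place of the full a@pvalues vector, and the names log_p, alpha, and selected are made up for this example.

```r
## Illustrative log p-values: the five quantiles printed in the question's
## summary, standing in for the full a@pvalues vector (hypothetical input).
log_p <- c(-4.01762, -1.39835, -0.68720, -0.37326, -0.01365)

## Back-transform to ordinary p-values, assuming natural logarithms.
p <- exp(log_p)

## Rank the variables: the most negative log p-value is the smallest
## p-value and therefore gets rank 1.
ranking <- rank(log_p, ties.method = "min")

## Compare against a significance level on the log scale, which avoids
## underflow when p-values are extremely small.
alpha <- 0.05  ## assumed threshold, chosen only for illustration
selected <- which(log_p < log(alpha))

## Collect everything in one table for inspection.
result <- data.frame(log_p = log_p, p = p, rank = ranking)
print(result)
```

With the real output, replacing log_p by a@pvalues should give a p-value and a rank for every column of dataset. Working on the log scale for the threshold comparison is a design choice: very small p-values can underflow to 0 after exp(), while their logarithms remain distinguishable.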

Variable screening: continuous outcome, categorical predictors, negative p-values
