Question

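I have a data frame in R that contains some NA values. How can I use pmmlTransformations to set up missing-value handling for these fields? I have seen that you can specify missingValue handling while transforming the data (normalization, field mapping, etc.), but I would like to know how to set the missing-value replacement without having to transform the data.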

    library(pmml)
    library(pmmlTransformations)

    df <- data.frame(id=1:5, y=1:5, x=c(2,4,3,NA,8))
    dataBox <- WrapData(df)

    # update the wrapped data to set x=1 when it is NA

    fit <- glm(formula=y~x, data = dataBox$data)

    pmml(fit, transforms=dataBox)

Many thanks in advance,

Andrew

Answer 1

If you only want to add the attribute missingValueReplacement="1" to every MiningField element in the PMML document, pass unknownValue = 1 to pmml::pmml.glm:

    library(pmml)
    library(XML)  # provides saveXML()

    df <- data.frame(id = 1:5, y = 1:5, x = c(2, 4, 3, NA, 8))
    # Set missing values of x to 1 before training the GLM
    df$x[is.na(df$x)] <- 1
    fit <- glm(formula = y ~ x, data = df)
    # Record the replacement value as missingValueReplacement="1" on the MiningField elements
    pmml_doc <- pmml.glm(fit, unknownValue = 1)
    saveXML(pmml_doc, "glm.pmml")

Admittedly, the unknownValue argument appears to be deprecated, but it does exactly what you want without requiring a chain of transformations.

Answer 2

You can use the unknownValue argument:

    pmml.glm(fit, transforms = dataBox, unknownValue = 0)

but this applies to all variables, including the target variable.

I wrote a fix that lets you specify a replacement value for each variable: https://github.com/guleatoma/pmml

With this version of the package, you can pass a named list of per-variable replacement values:

    pmml.glm(fit, transforms = dataBox, unknownValue = list("x1" = 0, "x2" = 100))
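If you also want the training data to contain the same values that the scoring engine will substitute (as Answer 1 does for x), the per-variable replacements can be applied in base R before fitting. This is a minimal sketch, assuming a named list of the same shape as the unknownValue list above; the helper name replace_na_by_column is hypothetical and is not part of pmml or pmmlTransformations:

```r
# Hypothetical helper (not part of pmml/pmmlTransformations): replace NA values
# column by column using a named list such as list(x = 1) or
# list(x1 = 0, x2 = 100), mirroring a per-variable unknownValue list.
replace_na_by_column <- function(df, replacements) {
  for (col in names(replacements)) {
    if (!col %in% names(df)) {
      stop("column not found in data frame: ", col)
    }
    # Overwrite only the NA entries of this column
    df[[col]][is.na(df[[col]])] <- replacements[[col]]
  }
  df
}

# Usage with the data from the question: impute x = 1 before fitting, so the
# model is trained on the same value that the PMML file tells the scorer to use.
df <- data.frame(id = 1:5, y = 1:5, x = c(2, 4, 3, NA, 8))
df_filled <- replace_na_by_column(df, list(x = 1))
fit <- glm(y ~ x, data = df_filled)
print(df_filled$x)  # [1] 2 4 3 1 8
```

Keeping the replacement values in a single named list means the same object can be used for imputing the training data and for the per-variable unknownValue argument of the patched package, so the two cannot drift apart.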

How to set missing values using PmmlTransformation in R
