sqldf changes a numeric column to character when ordering by it

Asked: 2015-09-24 12:54:15

Tags: r type-conversion sqldf

Today I ran into a problem that I cannot explain. Is this well-known behavior?

Data set:

structure(list(Original.Unit = c("some unit", "some unit", "some unit", 
"some unit", "some unit", "some unit"), Result = c(24, 28, NA, 
4.1, 4.5, 2.6), Conversion.Factor = c(1, 1.54, 1, 2.2, 1, 1)), .Names = c("Original.Unit", 
"Result", "Conversion.Factor"), row.names = c(NA, 6L), class = "data.frame")

Code:

> require(sqldf)

> (data <- dget("file"))   # "file" contains the above structure
  Original.Unit Result Conversion.Factor
1     some unit   24.0              1.00
2     some unit   28.0              1.54
3     some unit     NA              1.00
4     some unit    4.1              2.20
5     some unit    4.5              1.00
6     some unit    2.6              1.00

> sapply(data, function(d) { class(d)})
    Original.Unit            Result Conversion.Factor 
      "character"         "numeric"         "numeric" 

Let's query it like this:

> (result <- sqldf("SELECT `Original.Unit`, Result, `Conversion.Factor`, Result * `Conversion.Factor` AS ConvResult FROM data"))
  Original.Unit Result Conversion.Factor ConvResult
1     some unit   24.0              1.00      24.00
2     some unit   28.0              1.54      43.12
3     some unit     NA              1.00         NA
4     some unit    4.1              2.20       9.02
5     some unit    4.5              1.00       4.50
6     some unit    2.6              1.00       2.60

> sapply(result, function(r) { class(r)})
    Original.Unit            Result Conversion.Factor        ConvResult 
      "character"         "numeric"         "numeric"         "numeric" 

So far, so good. Now let's sort the result by the last column:

> (result <- sqldf("SELECT `Original.Unit`, Result, `Conversion.Factor`, Result * `Conversion.Factor` AS ConvResult FROM data ORDER BY ConvResult"))
  Original.Unit Result Conversion.Factor ConvResult
1     some unit     NA              1.00       <NA>
2     some unit    2.6              1.00        2.6
3     some unit    4.5              1.00        4.5
4     some unit    4.1              2.20       9.02
5     some unit   24.0              1.00       24.0
6     some unit   28.0              1.54      43.12

Check the column types:

> sapply(result, function(r) { class(r)})
    Original.Unit            Result Conversion.Factor        ConvResult 
      "character"         "numeric"         "numeric"       "character" 

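Why is the ConvResult column now of type character? Is it because of the NA?

That does seem to be the cause: when I replace the NA with 1000, ConvResult comes back as numeric. But why does this happen?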

1 Answer:

Answer 0: (score: 3)

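Yes, it looks like you are right. When ORDER BY is used, sqldf tends to guess the class of the output columns, and in this case it guesses wrong. So I suppose you can set the column types yourself, to be sure: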

  result <- sqldf("SELECT `Original.Unit`, Result, `Conversion.Factor`, Result * `Conversion.Factor` AS ConvResult FROM data ORDER BY ConvResult", method = c("character", "numeric", "numeric", "numeric"))
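
If you would rather not depend on how sqldf guesses classes at all, two plain base R fallbacks might also work: coerce the affected column after the query, or leave ORDER BY out of the SQL and sort in R. The sketch below is only an illustration under assumptions. It does not call sqldf; `result_chr` and `unsorted` are hand-built stand-ins (hypothetical names) that mirror the data frames shown in the question.

```r
# Fallback 1: coerce the column after the query.
# `result_chr` is a hand-built stand-in for what the ORDER BY query returned
# in the question (ConvResult came back as character, NA in the first row).
result_chr <- data.frame(
  Original.Unit     = rep("some unit", 6),
  Result            = c(NA, 2.6, 4.5, 4.1, 24, 28),
  Conversion.Factor = c(1, 1, 1, 2.2, 1, 1.54),
  ConvResult        = c(NA, "2.6", "4.5", "9.02", "24.0", "43.12"),
  stringsAsFactors  = FALSE
)
result_chr$ConvResult <- as.numeric(result_chr$ConvResult)  # NA stays NA
sapply(result_chr, class)

# Fallback 2: drop ORDER BY from the SQL and sort in R instead.
# `unsorted` is a stand-in for the result of the query without ORDER BY,
# which the question shows comes back with a numeric ConvResult.
unsorted <- data.frame(
  Original.Unit     = rep("some unit", 6),
  Result            = c(24, 28, NA, 4.1, 4.5, 2.6),
  Conversion.Factor = c(1, 1.54, 1, 2.2, 1, 1),
  stringsAsFactors  = FALSE
)
unsorted$ConvResult <- unsorted$Result * unsorted$Conversion.Factor

# na.last = FALSE puts the NA row first, matching the row order that the
# ORDER BY query produced in the question.
sorted <- unsorted[order(unsorted$ConvResult, na.last = FALSE), ]
sapply(sorted, class)
```

Either way the column ends up numeric regardless of which row comes first, although the `method` argument above keeps everything in a single call.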