How do I get rid of the NAs in my vector?

Time: 2015-06-24 20:24:54

Tags: r

I have a vector as follows:

> dput(v)
structure(c("1", "2", "2", "2", "2", "1", "2", "2", "1", "2", "2", "1", "1", "2", "2", "2", "1", "2", "2", "2", "2", "1", "2", "1", "1", "2", "1", "1", "1", "1", "1", "2", "2", "1", "2", "2", "2", "2", "2", "2", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "2", "1", "2", "2", "1", "2", "2", "1", "1", "1", "1", "2", "2", "1", "1", "1", "1", "1", "2", "2", "1", "2", "2", "1", "1", "2", "2", "2", "1", "1", "2", "2", "2", "1", "1", "2", "1", "2", "1", "2", "1", "2", "1", "1", "1", "2", "1", "2", "1", "2", "2", "2", "1", "2", "2", "1", "1", "2", "2", "1", "1", "2", "1", "2", "1", "2", "2", "1", "2", "1", "1", "2", "2", "1", "2", "2", "2", "2", "2", "2", "2", "1", "2", "2", "1"), .Label = logical(0))

I want to get rid of the NAs, so I tried na.omit, but that does not work. I think the NAs are not of type NA but literally the string "NA", so I tried to convert them to the NA type using the following:

v[] <- lapply(v, function(x) {
    is.na(levels(x)) <- levels(x) == "NA"
    x
})

which did not work the way I wanted.

*Edit

> dput(data)
structure(list(w = c(2, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2), x = c("1", "2", "2", "2", "2", "2", "2", "1", "1", "1", "2", "1", "1", "1", "2", "1", "1", "2", "2", "2", "2", "1", "1", "1", "1", "2", "2", "1", "1", "2", "1", "2", "2", "1", "2", "1", "2", "2", "1", "1", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "1", "1", "2", "2", "1", "1", "2", "1", "2", "1", "1", "2", "2", "1", "1", "1", "1", "1", "2", "2", "1", "2", "2", "2", "2", "2", "1", "2", "1", "1", "2", "2", "2", "1", "1", "1", "1", "2", "1", "1", "1", "2", "2", "2", "1", "2", "1", "2", "1", "2", "2", "2", "1", "2", "2", "1", "1", "2", "2", "1", "2", "2", "1", "1", "2", "2", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "1", "2", "1"), y = c("1", "2", "2", "2", "2", "1", "2", "2", "1", "2", "2", "1", "1", "2", "2", "2", "1", "2", "2", "2", "2", "1", "2", "1", "1", "2", "1", "1", "1", "1", "1", "2", "2", "1", "2", "2", "2", "2", "2", "2", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "2", "1", "2", "2", "1", "2", "2", "1", "1", "1", "1", "2", "2", "1", "1", "1", "1", "1", "2", "2", "1", "2", "2", "1", "1", "2", "2", "2", "1", "1", "2", "2", "2", "1", "1", "2", "1", "2", "1", "2", "1", "2", "1", "1", "1", "2", "1", "2", "1", "2", "2", "2", "1", "2", "2", "1", "1", "2", "2", "1", "1", "2", "1", "2", "1", "2", "2", "1", "2", "1", "1", "2", "2", "1", "2", "2", "2", "2", "2", "2", "2", "1", "2", "2", "1"), z = structure(c(2L, 1L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 3L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1", "2"), class = "factor")), .Names = c("w", "x", "y", "z"), row.names = c(11L, 12L, 14L, 16L, 19L, 20L, 24L, 29L, 30L, 34L, 36L, 38L, 41L, 42L, 44L, 63L, 66L, 69L, 74L, 76L, 78L, 80L, 81L, 91L, 93L, 96L, 97L, 98L, 103L, 104L, 106L, 109L, 117L, 118L, 120L, 124L, 125L, 126L, 129L, 133L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 13L, 15L, 17L, 18L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 31L, 32L, 33L, 35L, 37L, 39L, 40L, 43L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 64L, 65L, 67L, 68L, 70L, 71L, 72L, 73L, 75L, 77L, 79L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 92L, 94L, 95L, 99L, 100L, 101L, 102L, 105L, 107L, 108L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 119L, 121L, 122L, 123L, 127L, 128L, 130L, 131L, 132L, 134L), class = "data.frame")

Given a data.frame like this, I would like to remove any rows that contain NA. My attempt on ldForImp did not work, because "NA" is not of type NA.

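2 answers:

Answer 0: (score: 2)

See the note at the end of this post.

Original object/vector

Your object is a bit odd; it is a character vector, but it has an attribute "levels" that is a zero-length logical vector.

Regardless, you want to look for the string "NA" here, because these are literal "NA" strings rather than NAs.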

xx[xx != "NA"]

> xx[xx != "NA"]
  [1] "1" "2" "2" "2" "2" "1" "2" "2" "1" "2" "2" "1" "1" "2" "2" "2" "1" "2"
 [19] "2" "2" "2" "1" "2" "1" "1" "2" "1" "1" "1" "1" "1" "2" "2" "1" "2" "2"
 [37] "2" "2" "2" "2" "2" "1" "2" "2" "1" "2" "2" "1" "1" "1" "1" "2" "2" "1"
 [55] "1" "1" "1" "1" "2" "2" "1" "2" "2" "1" "1" "2" "2" "2" "1" "1" "2" "2"
 [73] "2" "1" "1" "2" "1" "2" "1" "2" "1" "2" "1" "1" "1" "2" "1" "2" "1" "2"
 [91] "2" "2" "1" "2" "2" "1" "1" "2" "2" "1" "1" "2" "1" "2" "1" "2" "2" "1"
[109] "2" "1" "1" "2" "2" "1" "2" "2" "2" "2" "2" "2" "2" "1" "2" "2" "1"

(where xx is the object you posted.)
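The difference between the string "NA" and a real missing value can be checked directly. The following is a minimal sketch on a small made-up vector, `v_demo` (a hypothetical name, not the posted data), showing why `na.omit()` leaves such a vector untouched while a string comparison works:

```r
# Hypothetical toy vector: two literal "NA" strings, no real missing values
v_demo <- c("1", "2", "NA", "1", "NA")

is.na(v_demo)            # "NA" is an ordinary string here, not a missing value
length(na.omit(v_demo))  # na.omit() therefore removes nothing

# Filtering on the string itself does remove them
v_clean <- v_demo[v_demo != "NA"]
v_clean
```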

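Data frame example

Assuming your data frame is now in xxx, first compare every element against the string "NA":

xxx != "NA"

Then compute the row sums, noting that TRUE == 1 and FALSE == 0 when you do so, and look for rows that have fewer than ncol(xxx) (i.e. 4) TRUE values.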

ind <- rowSums(xxx != "NA") < ncol(xxx)

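(@DavidArenburg suggested the alternative rowSums(xxx == "NA") > 0, which is neater than the version above, and certainly neater than my original.)

This flags the rows that contain at least one "NA" string.

Then use ind to deselect those rows of xxx: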
XXX <- xxx[!ind, ]

> XXX <- xxx[!ind, ]
> nrow(xxx)
[1] 134
> nrow(XXX)
[1] 125
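The same steps can be traced on a data frame small enough to check by eye. This is only a sketch: `df_demo`, `not_na_string`, `ind2` and `df_kept` are made-up names, and the toy data stand in for the posted data frame.

```r
# Hypothetical toy data frame with "NA" strings in rows 2 and 4
df_demo <- data.frame(
  w = c(2, 1, 1, 2),
  x = c("1", "NA", "2", "NA"),
  y = c("1", "NA", "2", "1"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell is NOT the string "NA"
not_na_string <- df_demo != "NA"

# TRUE counts as 1 and FALSE as 0, so a row with any "NA" string
# sums to less than the number of columns
ind  <- rowSums(not_na_string) < ncol(df_demo)
ind2 <- rowSums(df_demo == "NA") > 0   # the alternative formulation

identical(ind, ind2)   # both formulations flag the same rows

df_kept <- df_demo[!ind, ]
nrow(df_kept)
```

Comparing against the string works even for the numeric column `w`, because R coerces the numbers to character for the comparison.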

Note:

I would add that xxx (your data frame), like xx, is also a bit odd:

> str(xxx)
'data.frame':   134 obs. of  4 variables:
 $ w: num  2 1 1 1 1 2 1 2 2 1 ...
 $ x: chr  "1" "2" "2" "2" ...
 $ y: chr  "1" "2" "2" "2" ...
 $ z: Factor w/ 3 levels "0","1","2": 2 1 3 2 2 1 1 1 1 1 ...

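It seems you have combined three different types of object that all apparently hold the values 0, 1, 2, but that are in fact subtly different objects. You also seem to have "NA" strings where you probably wanted real NA values. I would look into why and how you ended up with a data frame like this.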
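If real NA values are what is wanted, one option is to overwrite the "NA" strings in the character columns, after which `na.omit()` behaves as expected. The sketch below uses a made-up data frame (`df_demo`, `df_fixed` and `df_complete` are hypothetical names). If the strings came from reading a file, the `na.strings` argument of `read.table()` is usually the place to handle this, but whether that applies here is an assumption, since the post does not say how the data were created.

```r
# Hypothetical toy data frame with literal "NA" strings
df_demo <- data.frame(
  w = c(2, 1, 1, 2),
  x = c("1", "NA", "2", "NA"),
  y = c("1", "NA", "2", "1"),
  stringsAsFactors = FALSE
)

# Replace literal "NA" strings with real missing values, column by column
df_fixed <- df_demo
df_fixed[] <- lapply(df_fixed, function(col) {
  if (is.character(col)) col[col == "NA"] <- NA
  col
})

colSums(is.na(df_fixed))          # count of real NAs per column
df_complete <- na.omit(df_fixed)  # now drops the rows with missing values
nrow(df_complete)
```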

Answer 1: (score: 2)

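You moved the goalposts slightly with your edit, but:

anyCharNA <- apply(dd, 1, function(x) any(x == "NA"))
dim(dd)
## [1] 134   4
dim(dd[!anyCharNA, ])
## [1] 125   4

Notes:

  • In some (rare) cases it is dangerous/confusing to call your data data, which is also the name of a built-in function. R can usually tell the difference, but not always...
  • You may want to go back and change your workflow so that your data are not so weird and you can just use na.omit() ...

If you want to clean up your data, assuming you really do want everything to be integers, note that some extra complexity is needed to make sure the factor is converted back to integers correctly.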
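As an illustration of that kind of cleanup, here is a sketch of one way to do it, not necessarily the code this answer used. The names `dd_demo`, `to_int` and `dd_int` are hypothetical, and the toy data only mimic the mix of column types in the posted data frame.

```r
# Hypothetical toy data frame mixing numeric, character and factor columns
dd_demo <- data.frame(
  w = c(2, 1, 1),
  x = c("1", "NA", "2"),
  z = factor(c("0", "2", "1")),
  stringsAsFactors = FALSE
)

# Going through as.character() matters for factors: as.integer() applied
# directly to a factor returns the internal level codes, not the labels.
to_int <- function(col) {
  col <- as.character(col)
  col[col == "NA"] <- NA   # turn "NA" strings into real missing values
  as.integer(col)
}

dd_int <- dd_demo
dd_int[] <- lapply(dd_int, to_int)  # convert every column, keep the data frame
dd_int <- na.omit(dd_int)           # real NAs can now be dropped the usual way
str(dd_int)
```

The factor column is the reason for the detour: its labels are "0", "1", "2" while its internal codes start at 1, so a direct conversion would shift every value.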