How do I remove all "NA"s from the strings in a data frame column in R?

Time: 2014-02-28 18:10:56

Tags: r csv

I have a CSV file like this:
LocationList,Identity,Category
"New York,New York,United States","42","S"
"NA,California,United States","89","lyt"
"Hartford,Connecticut,United States","879","polo"
"San Diego,California,United States","45454","utyr"
"Seattle,Washington,United States","uytr","69"
"NA,NA,United States","87","tree"

I want to remove all the 'NA' entries from the 'LocationList' column.

Desired result:

LocationList,Identity,Category
"New York,New York,United States","42","S"
"California,United States","89","lyt"
"Hartford,Connecticut,United States","879","polo"
"San Diego,California,United States","45454","utyr"
"Seattle,Washington,United States","uytr","69"
"United States","87","tree"

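The number of columns is not fixed; it may increase or decrease. I also want to write the result to a CSV file without quotes, and without escaping, for the 'LocationList' column.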

How can I do this in R? I'm new to R, so any help is appreciated.

2 Answers:

Answer 0 (score: 2)

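In this case you only want to strip the leading "NA," text. Note that this is not the standard way of removing NA values, because here "NA" is literal text inside a string rather than R's missing value NA.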

Assuming dat is your data frame, use:

dat$LocationList <- gsub("^(NA,)+", "", dat$LocationList)
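A minimal self-contained check of this pattern, assuming the LocationList values from the question are held in a plain character vector (the name loc is hypothetical). The ^ anchor means only a leading run of "NA," is removed, so one or more placeholders at the start of a string are stripped while the rest of the string is left alone.

```r
# Sample LocationList values copied from the question
loc <- c("New York,New York,United States",
         "NA,California,United States",
         "Hartford,Connecticut,United States",
         "San Diego,California,United States",
         "Seattle,Washington,United States",
         "NA,NA,United States")

# "^(NA,)+" matches one or more "NA," only at the start of each string
cleaned <- gsub("^(NA,)+", "", loc)
print(cleaned)
```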

Answer 1 (score: 1)

Try:

# Read the sample data; stringsAsFactors=FALSE keeps every column as character
my.data <- read.table(text='LocationList,Identity,Category
"New York,New York,United States","42","S"
"NA,California,United States","89","lyt"
"Hartford,Connecticut,United States","879","polo"
"San Diego,California,United States","45454","utyr"
"Seattle,Washington,United States","uytr","69"
"NA,NA,United States","87","tree"', header=TRUE, sep=",", stringsAsFactors=FALSE)
my.data$LocationList <- gsub("NA,", "", my.data$LocationList)
my.data
#                         LocationList Identity Category
# 1    New York,New York,United States       42        S
# 2           California,United States       89      lyt
# 3 Hartford,Connecticut,United States      879     polo
# 4 San Diego,California,United States    45454     utyr
# 5   Seattle,Washington,United States     uytr       69
# 6                      United States       87     tree
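If the "NA" placeholders could also appear in the middle of a value, one option (a different technique from both answers, sketched here under that assumption) is to split each value on commas, drop the parts that are exactly "NA", and paste the rest back together. The helper name drop_na_parts and the sample values are hypothetical.

```r
# Hypothetical helper: drop comma-separated parts that are exactly "NA"
drop_na_parts <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) paste(p[p != "NA"], collapse = ","), character(1))
}

# Hypothetical sample values, including an "NA" in the middle of a string
loc <- c("NA,California,United States",
         "NA,NA,United States",
         "Springfield,NA,United States",
         "Helena,MONTANA,United States")

result <- drop_na_parts(loc)
print(result)

# For comparison, the unanchored pattern "NA," also matches inside MONTANA
naive <- gsub("NA,", "", loc[4])
print(naive)  # "Helena,MONTAUnited States"
```

Comparing whole comma-separated parts avoids the partial match shown by naive, at the cost of assuming the commas inside LocationList are always field separators.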

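If you remove the quotes when writing a regular comma-separated file, you will not be able to read the data back correctly later. Every value in the LocationList variable contains commas, so without quotes a comma could mean either a comma inside a field or a break between fields. You can use write.csv2() instead, which separates fields with a semicolon (;):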

write.csv2(my.data, file="myFile.csv", quote=FALSE, row.names=FALSE)

This produces the following file:

LocationList;Identity;Category
New York,New York,United States;42;S
California,United States;89;lyt
Hartford,Connecticut,United States;879;polo
San Diego,California,United States;45454;utyr
Seattle,Washington,United States;uytr;69
United States;87;tree
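To confirm that the semicolon-separated file can be read back, here is a minimal round trip. It assumes a small data frame shaped like the cleaned one (the name df is hypothetical) and writes to a temporary file instead of myFile.csv.

```r
# Small stand-in for the cleaned data, with character columns
df <- data.frame(
  LocationList = c("California,United States", "United States"),
  Identity = c("89", "87"),
  Category = c("lyt", "tree"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.csv2(df, file = path, quote = FALSE, row.names = FALSE)
print(readLines(path))

# read.csv2 uses sep = ";", so the commas inside LocationList stay in one field
back <- read.csv2(path, colClasses = "character")
print(back)
```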

I now notice that the Identity and Category values in row 5 are probably swapped. You may want to switch them before writing the file:

# Swap Identity (column 2) and Category (column 3) in row 5
x             <- my.data[5, 2]
my.data[5, 2] <- my.data[5, 3]
my.data[5, 3] <- x
rm(x)
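The same swap can also be written without a temporary variable by indexing both columns by name. This is a sketch assuming the columns are character (as with stringsAsFactors=FALSE). With factor columns, a value that is not an existing level of the target column would become NA. The stand-in data frame below just reproduces the question's rows.

```r
# Stand-in for my.data, with row 5's values in the wrong columns
my.data <- data.frame(
  LocationList = c("New York,New York,United States", "California,United States",
                   "Hartford,Connecticut,United States", "San Diego,California,United States",
                   "Seattle,Washington,United States", "United States"),
  Identity = c("42", "89", "879", "45454", "uytr", "87"),
  Category = c("S", "lyt", "polo", "utyr", "69", "tree"),
  stringsAsFactors = FALSE
)

# The right-hand side is evaluated first and assigned to the target columns
# by position, so no temporary variable is needed
my.data[5, c("Identity", "Category")] <- my.data[5, c("Category", "Identity")]
print(my.data[5, ])
```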