Unable to put NA into a dataset

Asked: 2017-03-16 17:25:15

Tags: r kaggle

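I am working on the Titanic dataset. The Cabin attribute is empty in most rows, so I want to replace the empty values in the Cabin column with NA.

To do this, I wrote: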

train[train$Cabin==" "] <- "NA"

write.csv(train,file="editedtrain.csv")

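However, in the file editedtrain.csv, the rows where the Cabin column is empty still do not contain NA.

Here is the result of head(train) after running the code above.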

            Ticket    Fare Cabin Embarked
1        A/5 21171  7.2500              S
2         PC 17599 71.2833   C85        C
3 STON/O2. 3101282  7.9250              S
4           113803 53.1000  C123        S
5           373450  8.0500              S
6           330877  8.4583              Q

Output of dput(head(train)):

structure(
  list(
    PassengerId = 1:6,
    Survived = c(0L, 1L, 1L, 1L,0L, 0L),
    Pclass = c(3L, 1L, 3L, 1L, 3L, 3L),
    Name = c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)", "Allen, Mr. William Henry", "Moran, Mr. James"),
    Sex = c("male", "female", "female", "female", "male", "male"),
    Age = c(22, 38, 26, 35, 35, NA),
    SibSp = c(1L, 1L, 0L, 1L, 0L, 0L),
    Parch = c(0L, 0L, 0L, 0L, 0L, 0L),
    Ticket = c("A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450", "330877"),
    Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583),
    Cabin = c("", "C85", "", "C123", "", ""),
    Embarked = c("S", "C", "S", "S", "S", "Q")),
  .Names = c("PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"),
  row.names = c(NA, 6L), class = "data.frame")

How can I achieve my goal?

1 Answer:

Answer 0: (score: 1)

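As you can see in the dput output, the missing values in train$Cabin are "" (an empty string), not " " (a single space).

So, to change them to NA, you must not put a space between the quotes in the comparison.

You just need to do this: train$Cabin[train$Cabin==""] <- NA

You need to say explicitly that you are changing the Cabin column, and R only recognizes NA as a missing value when it is written without quotes; "NA" in quotes is an ordinary string.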
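As a minimal sketch of the point above, here is the replacement applied to a toy data frame. The data frame is a cut-down stand-in for the dput output in the question (only PassengerId and Cabin are kept), not the real train.csv.

```r
# Toy stand-in for the question's data: Cabin holds empty strings, not spaces
train <- data.frame(
  PassengerId = 1:6,
  Cabin = c("", "C85", "", "C123", "", ""),
  stringsAsFactors = FALSE
)

# Comparing against " " (one space) matches no rows
sum(train$Cabin == " ")   # 0

# Index the Cabin column itself and assign the NA constant (no quotes)
train$Cabin[train$Cabin == ""] <- NA

is.na(train$Cabin)        # TRUE FALSE TRUE FALSE TRUE TRUE
```

After this, write.csv(train, ...) should write NA in those rows, since NA is the default string write.csv uses for missing values.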

As Frank pointed out, if you simply read the .csv file with na.strings = "", this is done for you automatically. It would look like this:

train <- read.csv("YOUR_PATH\\train.csv", stringsAsFactors = FALSE, na.strings = "")
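To try this without the Kaggle file, the sketch below writes a tiny CSV to a temporary file and reads it back. The file contents are made up for illustration and only imitate the shape of train.csv.

```r
# Write a tiny CSV to a temporary file (a hypothetical stand-in for train.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("PassengerId,Cabin,Embarked",
             "1,,S",
             "2,C85,C",
             "3,,S"), tmp)

# Empty fields are converted to NA while the file is being read
train <- read.csv(tmp, stringsAsFactors = FALSE, na.strings = "")

train$Cabin   # NA "C85" NA
```

Note that passing na.strings = "" replaces the default of "NA"; if the file may also contain the literal text NA for missing values, na.strings = c("", "NA") covers both.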

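Some tips:

  • When you call read.csv(), set stringsAsFactors = FALSE if you want character columns to stay character rather than being converted to factors.

  • When you call write.csv(), set row.names = FALSE if you do not want an extra column of row IDs to be written.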
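A short sketch of the write.csv() tip, again on a made-up three-row data frame and a temporary file in place of editedtrain.csv:

```r
# Toy data frame with missing Cabin values already set to NA
train <- data.frame(
  PassengerId = 1:3,
  Cabin = c(NA, "C85", NA),
  stringsAsFactors = FALSE
)

# row.names = FALSE drops the leading column of row IDs
out <- tempfile(fileext = ".csv")
write.csv(train, file = out, row.names = FALSE)

# Inspect the raw file: missing values appear as NA
lines <- readLines(out)
lines
```

If you would rather have empty cells than NA in the output file, write.csv() also accepts an na argument, for example na = "".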