I'm working with the Titanic dataset. The `Cabin` attribute is empty in most rows, so I want to replace the empty values in the `Cabin` column with `NA`.
To do this, I wrote:
train[train$Cabin==" "] <- "NA"
write.csv(train,file="editedtrain.csv")
But in the file `editedtrain.csv`, the rows where the `Cabin` value is empty still don't contain `NA`.
Here is the output of `head(train)` after running the code above (last four columns):
            Ticket    Fare Cabin Embarked
1        A/5 21171  7.2500              S
2         PC 17599 71.2833   C85        C
3 STON/O2. 3101282  7.9250              S
4           113803 53.1000  C123        S
5           373450  8.0500              S
6           330877  8.4583              Q
`dput` output:
structure(
list(
PassengerId = 1:6,
Survived = c(0L, 1L, 1L, 1L,0L, 0L),
Pclass = c(3L, 1L, 3L, 1L, 3L, 3L),
Name = c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)", "Allen, Mr. William Henry", "Moran, Mr. James"),
Sex = c("male", "female", "female", "female", "male", "male"),
Age = c(22, 38, 26, 35, 35, NA),
SibSp = c(1L, 1L, 0L, 1L, 0L, 0L),
Parch = c(0L, 0L, 0L, 0L, 0L, 0L),
Ticket = c("A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450", "330877"),
Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583),
Cabin = c("", "C85", "", "C123", "", ""),
Embarked = c("S", "C", "S", "S", "S", "Q")),
.Names = c("PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"),
row.names = c(NA, 6L), class = "data.frame")
How can I do this?
Answer:
As you can see in the `dput` output, the missing values in `train$Cabin` are `""`, an empty string with no space inside.
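One quick way to check what the "empty" cells really contain, sketched on a hypothetical vector `cabin` copied from the `dput` output above: `nchar()` reports 0 for an empty string and would report 1 for a single space, so comparing against `" "` finds nothing.

```r
# Hypothetical copy of the Cabin values from the dput output above
cabin <- c("", "C85", "", "C123", "", "")

# nchar() is 0 for an empty string; a single space would give 1
print(nchar(cabin))

# Comparing against " " (one space) matches nothing,
# while comparing against "" matches every blank cell
print(sum(cabin == " "))
print(sum(cabin == ""))
```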
So, to change them to `NA`, you must not put a space between the quotes.
All you need is: `train$Cabin[train$Cabin == ""] <- NA`
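A minimal, self-contained sketch of that one-liner, assuming a hypothetical two-column `train` data frame that mimics the `PassengerId` and `Cabin` values from the `dput` above:

```r
# Hypothetical stand-in for train, keeping only two of its columns
train <- data.frame(
  PassengerId = 1:6,
  Cabin = c("", "C85", "", "C123", "", ""),
  stringsAsFactors = FALSE
)

# Subset the Cabin column itself, then assign the missing value NA (no quotes)
train$Cabin[train$Cabin == ""] <- NA

print(train)
print(sum(is.na(train$Cabin)))
```

The key point is that the logical condition is used to index `train$Cabin`, not the whole data frame.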
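You need to specify that it is the `Cabin` column you want to change, and R only recognizes `NA` as a missing value when it is written without quotes; `"NA"` is just an ordinary string.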
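A tiny sketch of that difference, using two hypothetical vectors `x` and `y`: `is.na()` is `TRUE` only for the unquoted `NA`.

```r
# NA without quotes is R's missing-value marker
x <- c("C85", NA)

# "NA" in quotes is an ordinary two-character string, not a missing value
y <- c("C85", "NA")

print(is.na(x))
print(is.na(y))
```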
As Frank mentioned, if you read the .csv file with `na.strings = ""`, this is done automatically. It would look like this:
train <- read.csv("YOUR_PATH\\train.csv", stringsAsFactors = FALSE, na.strings = "")
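To see the effect without the real file, here is a sketch that passes a small hypothetical CSV string through `read.csv()`'s `text` argument instead of a path; the three columns and rows are made up to resemble `train.csv`.

```r
# Hypothetical CSV text standing in for train.csv (three columns only)
csv_text <- "PassengerId,Cabin,Embarked
1,,S
2,C85,C
3,,S"

# na.strings = "" tells read.csv to read empty fields as NA
train <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

print(train)
print(is.na(train$Cabin))
```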
Some tips:
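- When you use `read.csv()`, set `stringsAsFactors = FALSE` if you want character columns to stay as characters instead of being converted to factors. (Since R 4.0.0 this is already the default.)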
- When you use `write.csv()`, set `row.names = FALSE` if you don't want it to add a column containing the row IDs.
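Both tips can be checked with a short sketch. The CSV string below is hypothetical, and the output files go to temporary paths from `tempfile()` rather than `editedtrain.csv`. It also shows that, once the blanks really are `NA`, `write.csv()` writes them as `NA` (its default `na = "NA"`).

```r
# Hypothetical CSV content standing in for a few columns of train.csv
csv_text <- "PassengerId,Sex,Cabin
1,male,
2,female,C85
3,female,"

# Tip 1: stringsAsFactors controls whether text columns become factors
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE, na.strings = "")
train <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")
print(class(as_factor$Sex))
print(class(train$Sex))

# Tip 2: row.names = FALSE stops write.csv from adding a row-ID column
out_default <- tempfile(fileext = ".csv")
out_clean <- tempfile(fileext = ".csv")
write.csv(train, file = out_default)
write.csv(train, file = out_clean, row.names = FALSE)

# Missing Cabin values are written as NA in both files
lines_default <- readLines(out_default)
lines_clean <- readLines(out_clean)
print(lines_default)
print(lines_clean)
```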