The structure of the dataset is:
> str(trainData)
'data.frame': 891 obs. of 13 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 2 2 ...
$ Pclass : Factor w/ 3 levels "1st","2nd","3rd": 3 1 3 1 3 3 1 3 3 2 ...
$ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
$ Sex : Factor w/ 2 levels "Male","Female": 1 2 2 2 1 1 1 1 2 2 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : int NA NA NA 113803 373450 330877 17463 349909 347742 237736 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : chr "" "C85" "" "C123" ...
$ Embarked : chr "S" "C" "S" "S" ...
$ Area : Factor w/ 9 levels "","A","B","C",..: 1 4 1 4 1 1 6 1 1 1 ...
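I want to create a new column in the data frame that stores the form of address contained in the Name variable. To do this, I need to extract the strings "Mr", "Mrs", and so on, and store them in a new vector. I tried to solve the problem as follows.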
vec <- vector()
for (i in 1 : nrow(trainData)) {
if (grep("Mr\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Mr"}
else if (grep("Miss\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Miss"}
else if (grep("Mrs\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Mrs"}
else if (grep("Don\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Don"}
else if (grep("Master\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Master"}
else {vec[i] <- "Boh"}
}
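...and then use the cbind function to bind the existing data frame to the new column, FormOfAddress. I have not tested the next two lines of code, because the previous block already gives me an error message.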
trainData <- as.data.frame(cbind(trainData, vec))
names(trainData)[length(trainData)] <- "FormOfAddress"
Basically, this is where I am stuck:
> vec <- vector()
> for (i in 1 : nrow(trainData)) {
+ if (grep("Mr\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Mr"}
+ else if (grep("Miss\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Miss"}
+ else if (grep("Mrs\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Mrs"}
+ else if (grep("Don\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Don"}
+ else if (grep("Master\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Master"}
+ else {vec[i] <- "Boh"; next}
+ }
Error in if (grep("Mr\\.", trainData[i, c("Name")]) == 1) { :
argument is of length zero
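The first part of the if statement looks correct to me: when the string Mr. is contained in the name, it returns TRUE. The second part also looks fine (at least on the first iteration) and writes the string Mr to the vector vec.
I think the problem is in the second iteration of the loop, but I cannot find a way to make it work.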
Answer 0 (score: 0)
trainData$Name
## [1] "Braund, Mr. Owen Harris"
## [2] "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
## [3] "Heikkinen, Miss. Laina"
## [4] "Futrelle, Mrs. Jacques Heath (Lily May Peel)"
## [5] "tt"
## [6] "Mr. Jones"
for (x in trainData$Name) {
print(grep("Mr\\.", x))
print(grepl("Mr\\.", x))
}
## [1] 1
## [1] TRUE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## [1] 1
## [1] TRUE
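The output above suggests why the original loop stops on the second row: grep() returns integer(0) when nothing matches, so the comparison with 1 has length zero and if() cannot evaluate it, whereas grepl() always returns a single TRUE or FALSE. A minimal sketch of the same loop rewritten with grepl() follows; names_vec is a hypothetical stand-in for trainData$Name, not the real data.

```r
# Hypothetical sample standing in for trainData$Name
names_vec <- c("Braund, Mr. Owen Harris",
               "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
               "Heikkinen, Miss. Laina",
               "tt")

# grep() on a non-matching string gives integer(0); comparing it to 1
# yields logical(0), which is what triggers "argument is of length zero"
no_match <- grep("Mr\\.", names_vec[2]) == 1

# Preallocate the result, then test each name with grepl(),
# which returns exactly one logical value per input string
titles <- character(length(names_vec))
for (i in seq_along(names_vec)) {
  nm <- names_vec[i]
  if (grepl("Mr\\.", nm)) {
    titles[i] <- "Mr"
  } else if (grepl("Miss\\.", nm)) {
    titles[i] <- "Miss"
  } else if (grepl("Mrs\\.", nm)) {
    titles[i] <- "Mrs"
  } else if (grepl("Don\\.", nm)) {
    titles[i] <- "Don"
  } else if (grepl("Master\\.", nm)) {
    titles[i] <- "Master"
  } else {
    titles[i] <- "Boh"  # fallback label used in the question
  }
}
```

The pattern "Mr\\." does not match "Mrs." because the literal dot must follow the r directly, so the order of the branches is not a problem here.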
## Doing it without a loop.
## You might have to come up with a different
## regex here depending on the rest of your data
vec <- gsub("^([^,]+, )?([^.]+).*", "\\2", trainData$Name)
## [1] "Mr" "Mrs" "Miss" "Mrs" "tt" "Mr"
vec <- ifelse(vec == trainData$Name, "Boh", vec)
## [1] "Mr" "Mrs" "Miss" "Mrs" "Boh" "Mr"
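To attach the result to the data frame, assigning the vector to a new column is usually simpler than cbind() followed by renaming. The sketch below assumes a small hypothetical data frame df in place of trainData and reuses the regex from above; whether that regex fits the full dataset is untested.

```r
# Hypothetical data frame standing in for trainData
df <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "tt",
           "Mr. Jones"),
  stringsAsFactors = FALSE
)

# Keep the text between the optional "Surname, " prefix and the first dot
title <- sub("^([^,]+, )?([^.]+).*", "\\2", df$Name)

# Names the regex left unchanged contain no title; label them "Boh"
title[title == df$Name] <- "Boh"

# Add the new column directly by name and store it as a factor
df$FormOfAddress <- factor(title)
```

One caveat of this fallback check: a name that contains a comma but no dot would be shortened by the regex and therefore not be labelled "Boh", so it is worth inspecting table(df$FormOfAddress) on the real data.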