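RStudio 3.4.0 32-bit (64-bit operating system), Windows 10
I am working through and running a Kaggle kernel for the Titanic data set. It runs without errors, but it does not produce the expected result.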
str(full)
'data.frame': 1309 obs. of 13 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
$ Sex : chr "male" "female" "female" "female" ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : chr "" "C85" "" "C123" ...
$ Embarked : chr "S" "C" "S" "S" ...
$ Title : chr " Mr" " Mrs" " Miss" " Mrs" ...
Extracting the title from the passenger name:
full$Title <- gsub('(.*,)|(\\..*)','',full$Name)
# Show title counts by sex
table(full$Sex, full$Title)
# Titles with very low cell counts to be combined to "rare" level
rare_title <- c ('Dona', 'Lady', 'the Countess','Capt', 'Col', 'Don',
'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')
# Also reassign mlle, ms, and mme accordingly
full$Title[full$Title == 'Mlle'] <- 'Miss'
full$Title[full$Title == 'Ms'] <- 'Miss'
full$Title[full$Title == 'Mme'] <- 'Mrs'
full$Title[full$Title %in% rare_title] <- 'Rare Title'
# Show title counts by sex again
table(full$Sex, full$Title)
         Capt  Col  Don  Dona  Dr  Jonkheer  Lady  Major  Master  Miss  Mlle
female      0    0    0     1   1         0     1      0       0   260     2
male        1    4    1     0   7         1     0      2      61     0     0
          Mme   Mr  Mrs   Ms  Rev  Sir  the Countess
female      1    0  197    2    0    0             1
male        0  757    0    0    8    1             0
I can't understand why the values are not being grouped into the rare level, even though I get no errors. Why is this happening?
Answer 0 (score: 1)
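The problem is that your titles have leading whitespace. As you can see in the output of str(full), the titles look like " Mr" rather than "Mr", so none of them match the entries in rare_title exactly.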
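A quick way to confirm this kind of mismatch is to compare the raw and trimmed values directly. The following is only a minimal sketch: the vector `titles` is a hypothetical three-element sample that mimics the Title column shown in str(full), not the real data.

```r
# Hypothetical sample mimicking the Title values shown in str(full)
titles <- c(" Mr", " Mrs", " Major")

# Exact comparison fails for every element because of the leading space
titles == "Major"

# nchar() counts the hidden space, so each value is one character
# longer than the title it appears to contain
nchar(titles)

# After stripping the whitespace, the comparison matches the third element
trimws(titles) == "Major"
```

The same check works with `%in%`, which also compares strings exactly and therefore treats " Major" and "Major" as different values.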
You can use trimws:
full <- data.frame(Title = c(" Mr", " Mrs", " Miss", " Major", " Don"),
                   age = 1:5, stringsAsFactors = FALSE)
rare_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')
# Match on the trimmed titles; values that are not rare keep their leading space
full$Title[trimws(full$Title) %in% rare_title] <- 'Rare Title'
full$Title
[1] " Mr" " Mrs" " Miss" "Rare Title" "Rare Title"
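Note that trimming only inside the `%in%` test leaves the remaining titles as " Mr", " Mrs" and so on, so the `== 'Mlle'`, `== 'Ms'` and `== 'Mme'` reassignments in the question would still not match. It may be simpler to remove the whitespace once, when the title is extracted. The sketch below shows two ways this could be done. The vector `name` is a hypothetical sample in the same "Surname, Title. Given names" format (the third entry is invented for illustration), and the pattern with a space after the comma is a suggested variation, not something taken from the question.

```r
# Hypothetical sample names; the third one is made up for illustration
name <- c("Braund, Mr. Owen Harris",
          "Heikkinen, Miss. Laina",
          "Doe, Major. John")

# Option 1: include the space after the comma in the pattern,
# so it is removed together with the surname
title_a <- gsub('(.*, )|(\\..*)', '', name)

# Option 2: keep the pattern from the question and trim the result once
title_b <- trimws(gsub('(.*,)|(\\..*)', '', name))

# Both options give titles without leading whitespace
identical(title_a, title_b)

# Exact matching against the rare titles now works as intended
rare_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')
title_a[title_a %in% rare_title] <- 'Rare Title'
title_a
```

With either option applied to full$Name, the rest of the code in the question can stay unchanged, since every later comparison then sees "Mr", "Mlle", "Major" and so on without the leading space.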