I have a data frame, Test, that I want to subset, but when I try, I lose all observations. Why is this happening?
> str(Test)
'data.frame': 157025 obs. of 13 variables:
$ Cancellations : int 1 1 1 1 1 1 1 1 1 1 ...
$ Benefit : chr "Single Parent Support " "Single Parent Support " "Job Seeker " "Job Seeker " ...
$ Region : chr " Northland " " Northland " " Northland " " Northland " ...
$ Month : chr "Jun 14" "Jun 14" "Jun 14" "Jun 14" ...
$ CanReason : chr "Change in Marital Status " "Change in Marital Status " "Change in Marital Status " "Change in Marital Status " ...
$ Age : chr " 20-24 " " 20-24 " " 20-24 " " 20-24 " ...
$ Ethnicity : chr "NZ European/Pakeha" "Maori " "Other " "NZ European/Pakeha" ...
$ SMS : chr "General Case Management " "Work Focused Case Management " "Work Focused Case Management " "Work Search Support " ...
$ Duration : chr "2-4 yrs " "2-4 yrs " "6-9 mth " "0-3 mth " ...
$ SMSDuration : int 361 348 59 69 150 37 63 294 107 107 ...
$ AgeYoungest : chr "0-4 yrs " "0-4 yrs " "No Children" "No Children" ...
$ AgeYoungestNonSub: chr "0-4 yrs" "0-4 yrs" "No Children" "No Children" ...
$ Liability : chr " 166,000 " " 166,000 " " 102,000 " " 102,000 " ...
> subDie <- Test[CanReason == "Died",]
> str(subDie)
'data.frame': 0 obs. of 13 variables:
$ Cancellations : int
$ Benefit : chr
$ Region : chr
$ Month : chr
$ CanReason : chr
$ Age : chr
$ Ethnicity : chr
$ SMS : chr
$ Duration : chr
$ SMSDuration : int
$ AgeYoungest : chr
$ AgeYoungestNonSub: chr
$ Liability : chr
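I tried converting the factor variables to character. When I put the comma in front of the "CanReason" index (subDie <- Test[, CanReason == "Died"]), R tells me I have 157025 observations of 0 variables... I'm stumped.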
Answer 0 (score: 1)
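Use grepl() with a regular expression to search for the string "Died" in the character vector CanReason; it returns a logical vector indicating which elements match. Use that vector to subset Test.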
For example:
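set.seed(12)
# Simulated reasons with inconsistent leading/trailing whitespace
CanReason <- sample(c("Change in Marital status",
                      "Change in Marital status ",
                      " Died ",
                      "Died ",
                      "Died"), 10000, replace = TRUE)
# TRUE wherever "Died" appears anywhere in the string
ind <- grepl("Died", CanReason)
sum(ind)
length(CanReason[ind])
head(CanReason[ind])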
which gives:
> sum(ind)
[1] 6037
> length(CanReason[ind])
[1] 6037
> head(CanReason[ind])
[1] "Died" "Died" "Died "
[4] "Died" " Died " " Died "
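
Applied to a data frame rather than a bare vector, the same idea might look like the sketch below. The Test data frame here is a tiny made-up stand-in, not the asker's data, and the padded "Died " values are an assumption: the str() output in the question shows trailing spaces on other CanReason values, which would explain why == "Died" matches nothing, but the "Died" rows themselves are not visible. The sketch also shows trimws() as an alternative when an exact match is wanted instead of a pattern match.

```r
# Hypothetical stand-in for the asker's data frame; the values are invented.
Test <- data.frame(
  Cancellations = c(1L, 1L, 1L, 1L),
  CanReason = c("Change in Marital Status ", "Died ", " Died ", "Died"),
  stringsAsFactors = FALSE
)

# Exact comparison only matches the one value with no padding.
sum(Test$CanReason == "Died")   # 1

# Option 1: pattern match; surrounding whitespace does not matter.
subDie <- Test[grepl("Died", Test$CanReason, fixed = TRUE), ]

# Option 2: strip leading/trailing whitespace, then compare exactly.
subDieExact <- Test[trimws(Test$CanReason) == "Died", ]

nrow(subDie)        # 3
nrow(subDieExact)   # 3
```

Note that the column is referenced as Test$CanReason and the logical vector goes before the comma, so it selects rows. Placing it after the comma, as in Test[, CanReason == "Died"], asks for columns instead.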