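Subsetting data loses all my observations

Time: 2015-04-03 02:26:34

Tags: r subset

I have a data frame "Test" that I want to subset, but when I try, I lose all the observations. Why does this happen?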

> str(Test)
'data.frame':   157025 obs. of  13 variables:
$ Cancellations    : int  1 1 1 1 1 1 1 1 1 1 ...
$ Benefit          : chr  "Single Parent Support                          "               "Single Parent Support                          " "Job Seeker                                         " "Job Seeker                                     " ...
$ Region           : chr  "        Northland    " "        Northland    " "            Northland    " "        Northland    " ...
$ Month            : chr  "Jun 14" "Jun 14" "Jun 14" "Jun 14" ...
$ CanReason        : chr  "Change in Marital Status           " "Change in     Marital Status           " "Change in Marital Status           " "Change in     Marital Status           " ...
$ Age              : chr  " 20-24 " " 20-24 " " 20-24 " " 20-24 " ...
$ Ethnicity        : chr  "NZ European/Pakeha" "Maori             " "Other                      " "NZ European/Pakeha" ...
$ SMS              : chr  "General Case Management               " "Work     Focused Case Management          " "Work Focused Case Management          " "Work     Search Support                   " ...
$ Duration         : chr  "2-4 yrs " "2-4 yrs " "6-9 mth " "0-3 mth " ...
$ SMSDuration      : int  361 348 59 69 150 37 63 294 107 107 ...
$ AgeYoungest      : chr  "0-4 yrs    " "0-4 yrs    " "No Children" "No    Children" ...
$ AgeYoungestNonSub: chr  "0-4 yrs" "0-4 yrs" "No Children" "No Children" ...
$ Liability        : chr  " 166,000 " " 166,000 " " 102,000 " " 102,000 " ...


> subDie <- Test[CanReason == "Died",]

> str(subDie)
'data.frame':   0 obs. of  13 variables:
$ Cancellations    : int 
$ Benefit          : chr 
$ Region           : chr 
$ Month            : chr 
$ CanReason        : chr 
$ Age              : chr 
$ Ethnicity        : chr 
$ SMS              : chr 
$ Duration         : chr 
$ SMSDuration      : int 
$ AgeYoungest      : chr 
$ AgeYoungestNonSub: chr 
$ Liability        : chr 

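I tried converting the factor variables to character. When I put the comma before the "CanReason" index (subDie <- Test[, CanReason == "Died"]), R tells me I have 157025 observations of 0 variables... I'm stumped.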

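1 answer:

Answer 0: (score: 1)

The values in CanReason are padded with spaces, so an exact comparison with "Died" never matches. Search the character vector CanReason for the string "Died" with a regular expression instead: grepl() returns a logical vector indicating which elements match. Use that vector to subset Test.

For example: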

set.seed(12)
CanReason <- sample(c("Change in      Marital status",
                      "Change in   Marital status ",
                      " Died    ",
                      "Died                ",
                      "Died"), 10000, replace = TRUE)
# Logical vector: TRUE wherever the string contains "Died", padded or not
ind <- grepl("Died", CanReason)

sum(ind)
length(CanReason[ind])
head(CanReason[ind])

which gives:

> sum(ind)
[1] 6037
> length(CanReason[ind])
[1] 6037
> head(CanReason[ind])
[1] "Died"                 "Died"                 "Died                "
[4] "Died"                 " Died    "            " Died    "
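
Applied to a data frame, the same idea might look like the sketch below. The small Test data frame here is a made-up stand-in for the asker's data (only two of the 13 columns, with invented rows), and the names exact, subDie and subDieTrim are only illustrative. It also shows trimws(), available in R 3.2.0 and later, as an alternative that strips the padding before an exact comparison.

```r
# Toy stand-in for the asker's data frame (values invented for illustration).
Test <- data.frame(
  Cancellations = c(1L, 1L, 1L, 1L),
  CanReason = c("Change in Marital Status   ",
                " Died    ",
                "Died          ",
                "Died"),
  stringsAsFactors = FALSE
)

# Exact comparison: only the unpadded "Died" matches.
exact <- Test[Test$CanReason == "Died", ]

# Pattern match: keeps every row whose CanReason contains "Died".
subDie <- Test[grepl("Died", Test$CanReason), ]

# Alternative: strip leading/trailing whitespace, then compare exactly.
subDieTrim <- Test[trimws(Test$CanReason) == "Died", ]

nrow(exact)       # 1
nrow(subDie)      # 3
nrow(subDieTrim)  # 3
```

Note that grepl("Died", ...) also keeps any value that merely contains "Died" as part of a longer string, so the trimws() comparison may be the safer choice when only the exact reason is wanted. In both cases the column is referenced as Test$CanReason, so the lookup does not depend on a separate CanReason object existing in the workspace.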