I want to use the subset function in R to extract smaller groups of my time-series data to study.
My data are a data frame with six columns: district (8 districts), gender, age interval (4 groups), year, month, and a count column.
Example:
District Gender Year Month AgeGroupNew TotalDeaths
1 Eastern Female 2003 1 0 4
2 Eastern Female 2003 1 01-4 1
3 Eastern Female 2003 1 05-14 1
4 Eastern Female 2003 1 15+ 91
5 Eastern Female 2003 2 0 4
6 Eastern Female 2003 2 01-4 1
I want to extract a smaller subset for each combination of district, gender and age interval, giving a result like this:
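If the goal is one subset per District/Gender/AgeGroupNew combination, base R's split() can produce all of them in one call instead of one subset() call per combination. This is a minimal sketch on a small invented data frame shaped like datNew (all values are made up). Like subset(), it relies on the labels matching exactly, so stray whitespace in a label would produce a separate group.

```r
# Hypothetical toy data with the same columns as datNew (values invented)
datNew <- data.frame(
  District    = c("Eastern", "Eastern", "Northern", "Northern"),
  Gender      = c("Female", "Female", "Male", "Male"),
  Year        = c(2003, 2003, 2003, 2003),
  Month       = c(1, 2, 1, 2),
  AgeGroupNew = c("01-4", "01-4", "01-4", "01-4"),
  TotalDeaths = c(1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# One data frame per District x Gender x AgeGroupNew combination;
# drop = TRUE discards combinations that have no rows
groups <- split(datNew,
                list(datNew$District, datNew$Gender, datNew$AgeGroupNew),
                drop = TRUE)

names(groups)

# Each element is an ordinary data frame, e.g. the Northern / Male / 01-4 series
groups[["Northern.Male.01-4"]]
```

The list names are the three labels pasted together with ".", so each series can be pulled out by name or processed in turn with lapply().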
District Gender Year Month AgeGroupNew TotalDeaths
Northern Male 2003 1 01-4 0
Northern Male 2003 2 01-4 1
Northern Male 2003 3 01-4 0
Northern Male 2003 4 01-4 3
Northern Male 2003 5 01-4 4
Northern Male 2003 6 01-4 6
Northern Male 2003 7 01-4 5
Northern Male 2003 8 01-4 0
Northern Male 2003 9 01-4 1
Northern Male 2003 10 01-4 2
Northern Male 2003 11 01-4 0
Northern Male 2003 12 01-4 1
Northern Male 2004 1 01-4 1
Northern Male 2004 2 01-4 0
...
Northern Male 2006 11 01-4 0
Northern Male 2006 12 01-4 0
So far I have been trying the following, thanks to DWin pointing out subset in a previous question:
subset(datNew, subset=(District=="Eastern" & Gender=="Female" & AgeGroupNew=="01-4"))
[1] District Gender Year Month AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)
But R keeps giving me the output above, which it shouldn't.
I have tried other combinations that do work, but using District in subset seems to produce this <0 rows> (or 0-length row.names) result.
This works:
> head(subset(datNew, Year=="2004" & Month=="8" & AgeGroupNew =="0"))
District Gender Year Month AgeGroupNew TotalDeaths
77 Eastern Female 2004 8 0 10
269 Eastern Male 2004 8 0 6
461 Khayelitsha Female 2004 8 0 13
653 Khayelitsha Male 2004 8 0 15
845 Klipfontein Female 2004 8 0 7
1037 Klipfontein Male 2004 8 0 6
but this does not:
> head(subset(datNew, District=="Eastern" & Gender=="Female" & AgeGroupNew =="0"))
[1] District Gender Year Month AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)
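What is it about District that causes this? It is definitely wrong for this combination to return 0 rows; as far as I know, there is plenty of matching data.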
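I have also tried the following, which, judging from other posts, is the kind of thing I want to achieve, but it still doesn't do what I need: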
> head(subset(datNew,datNew[[1]] %in% District[1] & Gender=="Female" & AgeGroupNew=="0"))
District Gender Year Month AgeGroupNew TotalDeaths
1 Eastern Female 2003 1 0 4
5 Eastern Female 2003 2 0 4
9 Eastern Female 2003 3 0 5
13 Eastern Female 2003 4 0 12
17 Eastern Female 2003 5 0 7
21 Eastern Female 2003 6 0 13
With this I cannot select any other district, such as "Southern", "Khayelitsha", etc., no matter how I change datNew[[1 or 2 or 3]] and District[[1 or 2 or 3]].
I also don't really understand what %in% is doing above.
I'm confused. Any help would be appreciated.
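A short sketch of what %in% does, on invented toy data in which (as the answer suspects) each district name is stored with a trailing space. x %in% table returns, for each element of x, TRUE if it exactly matches some element of table. Inside subset(), District[1] is evaluated against datNew's own columns, so it is simply the first stored value. Judging by the printed row numbers, the first rows of datNew all belong to Eastern, so changing that index never reaches another district.

```r
# Toy data, invented for illustration: district labels carry a trailing space
datNew <- data.frame(
  District = c("Eastern ", "Eastern ", "Khayelitsha ", "Khayelitsha "),
  Gender   = c("Female", "Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# x %in% table: for each element of x, does it exactly match any element of table?
c("a", "b", "c") %in% c("b", "c")   # FALSE TRUE TRUE

# District[1] is just the first stored value, "Eastern " (space included)
first <- datNew$District[1]
datNew$District %in% first          # TRUE TRUE FALSE FALSE

# This is why the %in% version "works": it compares against the stored
# string itself, trailing space and all
nrow(subset(datNew, District %in% District[1]))   # 2
```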
Answer (score: 2):
Prediction: show us the result of str(datNew$District[1]) and all will be revealed. I predict there is a non-printing character, probably a trailing space (or two).
So, going by what str(...) shows, the correct code would be:
subset(datNew, District=="Eastern " & Gender=="Female" & AgeGroupNew =="0")
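A minimal sketch reproducing this diagnosis on invented toy data. The trailing space in the district labels is an assumption standing in for whatever str() actually reveals in datNew. It shows str() exposing the space, the plain == comparison returning 0 rows, and base R's trimws() (available since R 3.2.0) cleaning the column so that District == "Eastern" works without hard-coding the space.

```r
# Toy data (invented): the district labels carry a trailing space
datNew <- data.frame(
  District    = c("Eastern ", "Eastern ", "Northern "),
  Gender      = c("Female", "Female", "Male"),
  AgeGroupNew = c("0", "01-4", "0"),
  TotalDeaths = c(4, 1, 2),
  stringsAsFactors = FALSE
)

str(datNew$District[1])                  # the space is visible inside the quotes
any(grepl("\\s$", datNew$District))      # TRUE: some labels end in whitespace
nrow(subset(datNew, District == "Eastern"))   # 0, the exact match fails

# Strip leading/trailing whitespace once; the natural comparison then works
datNew$District <- trimws(datNew$District)
nrow(subset(datNew, District == "Eastern" & Gender == "Female" & AgeGroupNew == "0"))  # 1
```

Cleaning the column once is usually more robust than writing "Eastern " with its space in every call. If the data were read with read.csv(), its strip.white = TRUE argument removes surrounding whitespace from unquoted character fields at import time.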