Question

I want to use the subset function in R to extract smaller groups from my panel time-series data.

My data is a data frame with six columns: district (8 districts), gender, age interval (4 groups), year, month, and a count column.

Example:

  District Gender Year Month AgeGroupNew TotalDeaths
1 Eastern  Female 2003     1           0           4
2 Eastern  Female 2003     1        01-4           1
3 Eastern  Female 2003     1       05-14           1
4 Eastern  Female 2003     1         15+          91
5 Eastern  Female 2003     2           0           4
6 Eastern  Female 2003     2        01-4           1

I want to extract a smaller subset for each combination of district, gender, and age interval, to get something like this:

     District  Gender Year Month AgeGroupNew TotalDeaths
     Northern    Male 2003     1        01-4           0
     Northern    Male 2003     2        01-4           1
     Northern    Male 2003     3        01-4           0
     Northern    Male 2003     4        01-4           3
     Northern    Male 2003     5        01-4           4
     Northern    Male 2003     6        01-4           6
     Northern    Male 2003     7        01-4           5
     Northern    Male 2003     8        01-4           0
     Northern    Male 2003     9        01-4           1
     Northern    Male 2003    10        01-4           2
     Northern    Male 2003    11        01-4           0
     Northern    Male 2003    12        01-4           1
     Northern    Male 2004     1        01-4           1
     Northern    Male 2004     2        01-4           0

and so on, through to

     Northern    Male 2006    11        01-4           0
     Northern    Male 2006    12        01-4           0

So far I have been trying the following, thanks to DWin for pointing it out in a previous question.

subset(datNew, subset=(District=="Eastern" &  Gender=="Female" &  AgeGroupNew=="01-4"))
[1] District    Gender      Year        Month       AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)

But R keeps giving me the output above, which it should not.

I have tried other combinations that succeed, but it seems that using District in subset is what causes this <0 rows> (or 0-length row.names).

This works:

> head(subset(datNew, Year=="2004" & Month=="8" & AgeGroupNew =="0"))
         District Gender Year Month AgeGroupNew TotalDeaths
77       Eastern  Female 2004     8           0          10
269      Eastern    Male 2004     8           0           6
461  Khayelitsha  Female 2004     8           0          13
653  Khayelitsha    Male 2004     8           0          15
845  Klipfontein  Female 2004     8           0           7
1037 Klipfontein    Male 2004     8           0           6

But this does not:

> head(subset(datNew, District=="Eastern" & Gender=="Female" & AgeGroupNew =="0"))
[1] District    Gender      Year        Month       AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)

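What is it about District that causes this? It is definitely wrong for this combination to have 0 rows; as far as I can tell, there is plenty of data.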

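I have also tried the following, which, going by other posts, is along the lines of what I want to achieve, but I still cannot get it to work in general: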

> head(subset(datNew,datNew[[1]] %in% District[1] & Gender=="Female" & AgeGroupNew=="0"))
   District Gender Year Month AgeGroupNew TotalDeaths
1  Eastern  Female 2003     1           0           4
5  Eastern  Female 2003     2           0           4
9  Eastern  Female 2003     3           0           5
13 Eastern  Female 2003     4           0          12
17 Eastern  Female 2003     5           0           7
21 Eastern  Female 2003     6           0          13

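With this I cannot select any of the other districts, e.g. "Southern", "Khayelitsha", etc., no matter how I change datNew[[1 or 2 or 3]] and District[[1 or 2 or 3]]. I really don't understand what %in% is doing above.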

I am very confused. Any help would be appreciated.

Answer 1

Prediction: show us the result of str(datNew$District[1]) and all will be revealed. I predict that a non-printing character will turn up, probably a trailing space (or two).
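A minimal sketch of that check, assuming the cause really is trailing whitespace. The data frame `toy` is a made-up stand-in for datNew (its values are not the asker's data), used only to show how the stray space becomes visible:

```r
# Hypothetical toy data standing in for datNew; note the trailing space in "Eastern ".
toy <- data.frame(
  District    = c("Eastern ", "Eastern ", "Northern "),
  Gender      = c("Female", "Female", "Male"),
  TotalDeaths = c(4, 1, 0),
  stringsAsFactors = FALSE
)

# str() prints the value in quotes, so a trailing space is easy to spot.
str(toy$District[1])

# The stored value is one character longer than the label we expect.
n_stored   <- nchar(as.character(toy$District[1]))
n_expected <- nchar("Eastern")

# == compares strings exactly, so the clean label matches nothing.
n_match <- nrow(subset(toy, District == "Eastern"))

# Printing the distinct values (quoted) shows every affected label at once.
print(unique(as.character(toy$District)))
```

If District is a factor rather than a character column, levels(datNew$District) shows the same quoted labels.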

So, going by the result of str(...), the correct code would be:

subset(datNew, District=="Eastern " & Gender=="Female" & AgeGroupNew =="0")
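As an alternative to typing the trailing space into every comparison, the column can be cleaned once. The sketch below assumes the problem is leading or trailing whitespace only, and again uses a made-up data frame `toy` in place of datNew; trimws() is in base R from version 3.2.0 onward. It also shows one way to get a separate data frame for every district, gender, and age group combination with split():

```r
# Hypothetical toy data standing in for datNew, with trailing spaces in District.
toy <- data.frame(
  District    = c("Eastern ", "Eastern ", "Northern ", "Northern "),
  Gender      = c("Female", "Male", "Male", "Male"),
  AgeGroupNew = c("0", "0", "01-4", "01-4"),
  TotalDeaths = c(4, 6, 0, 1),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace once so the plain labels match.
toy$District <- trimws(as.character(toy$District))

# The comparison from the question now finds rows.
east <- subset(toy, District == "Eastern" & Gender == "Female" & AgeGroupNew == "0")
print(east)

# One data frame per District/Gender/AgeGroupNew combination present in the data;
# drop = TRUE discards combinations that never occur.
groups <- split(toy, list(toy$District, toy$Gender, toy$AgeGroupNew), drop = TRUE)
print(names(groups))
```

On the %in% attempt in the question: inside subset(), District[1] is looked up in the data frame itself, so it is the stored value from row 1, trailing space included. That is why the test matched the Eastern rows, and why changing the index to 2 or 3 changed nothing, since those rows are Eastern as well.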
