Question

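I am trying to subset a dataset according to the following requirements:

1. ethnicity is xyz
2. education is a bachelor's degree or higher, i.e. Bachelor's Degree or Graduate Degree
3. I then want to see the income brackets of the people who meet the requirements above. The brackets look like $30,000 - $39,999 or $100,000 - $124,999.
4. Finally, as my end output, I want to see the subset obtained from the third item (above), with a column showing whether these individuals are religious. In the dataset, this corresponds to religious and not religious.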

So it would look something like this:

   income               religious
$30,000 - $39,999      not religious
$50,000 - $59,999         religious
  ....                    ....
  ....                    ....

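Keep in mind that the rows listed satisfy requirements 1 and 2.

Please bear in mind that I am new to programming. I have been trying to work this out for a long time and have dug through many posts, but I can't seem to get anything to work. How do I solve this? Could someone please help?

So as not to take away from the clarity of the post, I'll put what I have tried below (but feel free to ignore it, as it's probably rubbish).

To get to step 3, I tried many variations of the following, but failed miserably and am about to bang my head on the keyboard: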

df$income[which(df$ethnicity == "xyz" & df$education %in% c("Bachelor's Degree", "Graduate Degree"), ]

I also tried:

race <- df$ethnicity == "xyz"
ba_ma_phd <- df$education %in% c("Graduate Degree", "Bachelor's Degree")
income_sub <- df$income[ba_ma_phd & race]

I believe income_sub gets me to step 3, but I have no idea how to take it to step 4.

Answer 1

library(dplyr)

df %>%
  filter(ethnicity == "xyz" & 
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  group_by(religious) %>%
  summarize(lower_bound = min(income),
            upper_bound = max(income) )
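The pipeline above summarizes income per religious group. If income is stored as text labels such as "$30,000 - $39,999", min() and max() compare them as strings, so the bounds may not come out in numeric order, and they raise an error if income is an unordered factor. To list the matching rows the way the question describes, a minimal sketch might look like the following. The toy data frame is invented for illustration; only the column names come from the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names are taken from the question
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999",
                "$100,000 - $124,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# Keep rows meeting requirements 1 and 2, then keep only the two columns of interest
result <- df %>%
  filter(ethnicity == "xyz",
         education %in% c("Bachelor's Degree", "Graduate Degree")) %>%
  select(income, religious)

print(result)
```

Separating conditions with a comma inside filter() combines them with a logical AND, the same as joining them with &.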

Answer 2

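Converting my comment into an answer, as it got too long.

First, your code: you were almost there. Because income is a vector rather than a data frame, you don't need the trailing comma. That is, you can use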

df$income[which(df$ethnicity == "xyz" &
         df$education %in% c("Bachelor's Degree", "Graduate Degree"))]
 # note: no comma after the closing parenthesis of which()

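If you want to create a subsetted data frame, don't include df$income at the start; just use df and keep the comma. This subsets the rows of your data but keeps all the columns.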

sub_df <- df[which(df$ethnicity == "xyz" &
       df$education %in% c("Bachelor's Degree", "Graduate Degree")), ]
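The difference between the two forms comes down to dimensions: a vector takes one index, while a data frame takes a row index and a column index separated by a comma. A small sketch of that behaviour, using a made-up three-row data frame:

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  income    = c("$30,000 - $39,999", "$50,000 - $59,999", "$100,000 - $124,999"),
  religious = c("religious", "not religious", "religious"),
  stringsAsFactors = FALSE
)

idx <- which(toy$religious == "religious")

# A vector has one dimension: a single index, no comma
income_vec <- toy$income[idx]

# A data frame has rows and columns: the empty slot after the comma means "all columns"
toy_rows <- toy[idx, ]

# Using a comma on a vector is an error, which is what the original attempt ran into
comma_failed <- tryCatch({
  toy$income[idx, ]
  FALSE
}, error = function(e) TRUE)

print(income_vec)
print(toy_rows)
print(comma_failed)
```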

To see the income levels in the subsetted data, you can use table

table(sub_df$income)

You can use table again to check the count of observations for each religious status by income.

table(sub_df$income, sub_df$religious)

If you only want to select the income and religious columns, you can also do that with [

sub_df[c("religious", "income")]
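Putting the steps of this answer together, a self-contained sketch could look like this. The sample rows are invented so the code can be run as is; only the column names and category labels come from the question.

```r
# Hypothetical toy data; only the column names are taken from the question
df <- data.frame(
  ethnicity = c("xyz", "xyz", "abc", "xyz"),
  education = c("Bachelor's Degree", "High School",
                "Graduate Degree", "Graduate Degree"),
  income    = c("$30,000 - $39,999", "$50,000 - $59,999",
                "$100,000 - $124,999", "$50,000 - $59,999"),
  religious = c("not religious", "religious", "religious", "religious"),
  stringsAsFactors = FALSE
)

# Requirements 1 and 2 as a logical vector, one value per row
keep <- df$ethnicity == "xyz" &
  df$education %in% c("Bachelor's Degree", "Graduate Degree")

# Subset the rows and pick the two columns in a single step
sub_df <- df[which(keep), c("income", "religious")]

# Cross-tabulate income bracket against religious status
counts <- table(sub_df$income, sub_df$religious)

print(sub_df)
print(counts)
```

Listing income first in the column vector gives the column order shown in the question's desired output.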

Difficulty subsetting in R
