I have some census data on different households, like this (obviously the real data set is much larger and has many other variables):
df <- data.frame("HouseholdID" = c(1, 1, 1, 2, 2, 3, 3, 3),
"Age" = c(45, 38, 6, 78, 64, 56, 58, 12))
I'm interested in knowing whether each adult has a child under 18, so I thought the easiest approach might be to add a column to the data frame:
df$kid_under_18 <- "No"
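and then change the value to "Yes" for the rows that meet my criterion. The trouble is that I'm having problems writing the R code for:
"for each HouseholdID, if any Age < 18" <- "Yes"
I figure I should be able to use a combination of "by" (i.e. to look by HouseholdID) and an "if any" statement, but I can't work out how to change my "kid_under_18" column based on that. I think I'm close, but the syntax isn't right yet: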
by(df$Age, df$HouseholdID, function(x) if(any(x < 18)) {df$kid_under_18 <- "Yes"})
This evaluates the statement but doesn't add anything to the data frame.
df$kid_under_18 <- by(df$Age, df$HouseholdID, function(x) if(any(x < 18)) print ("Yes"))
gives me an error:
Error in `$<-.data.frame`(`*tmp*`, "kid_under_18", value = list(`1` = "Yes", :
  replacement has 3 rows, data has 8
Answer 0 (score: 1)
Using the dplyr library, you could do the following:
library(dplyr)
df %>%
  group_by(HouseholdID) %>%
  mutate(under_18 = any(Age < 18))
The output is as follows:
Source: local data frame [8 x 3]
Groups: HouseholdID [3]
  HouseholdID   Age under_18
        <dbl> <dbl>    <lgl>
1           1    45     TRUE
2           1    38     TRUE
3           1     6     TRUE
4           2    78    FALSE
5           2    64    FALSE
6           3    56     TRUE
7           3    58     TRUE
8           3    12     TRUE
If you want one row per HouseholdID, you can use summarise instead of mutate. You can also use ifelse in the mutate assignment to convert the logical values to something else.
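As a rough sketch of the two variations just mentioned (the column name kid_under_18 and the variable names labelled and per_household are chosen here for illustration, and the "Yes"/"No" labels are taken from the question):

```r
library(dplyr)

df <- data.frame(HouseholdID = c(1, 1, 1, 2, 2, 3, 3, 3),
                 Age = c(45, 38, 6, 78, 64, 56, 58, 12))

# One row per person: ifelse() turns the per-household logical into a label,
# and mutate() recycles that single value across the rows of each group
labelled <- df %>%
  group_by(HouseholdID) %>%
  mutate(kid_under_18 = ifelse(any(Age < 18), "Yes", "No")) %>%
  ungroup()

# One row per household: summarise() collapses each group to a single value
per_household <- df %>%
  group_by(HouseholdID) %>%
  summarise(kid_under_18 = any(Age < 18))

print(labelled)
print(per_household)
```

The ungroup() call is optional; it only drops the grouping so that later operations on labelled are not applied per household by accident.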
Answer 1 (score: 1)
Using data.table:
library(data.table)
setDT(df)[, kid_under_18 := any(Age < 18) , HouseholdID]
Or, if we need 'Yes' or 'No' labels:
# FALSE + 1 = 1 selects "No"; TRUE + 1 = 2 selects "Yes"
setDT(df)[, kid_under_18 := c("No", "Yes")[any(Age < 18) + 1] , HouseholdID]
Answer 2 (score: 0)
Perhaps you are looking for a summary?
library(plyr);
ddply(df, "HouseholdID", summarise, hasChildUnder18 = any(Age < 18))
HouseholdID hasChildUnder18
1 1 TRUE
2 2 FALSE
3 3 TRUE
We can recode TRUE and FALSE as yes and no:
library(plyr); library(car);
ddply(df, "HouseholdID", summarise, hasChildUnder18 = recode(any(Age < 18),
"TRUE='yes'; FALSE='no'"))
HouseholdID hasChildUnder18
1 1 yes
2 2 no
3 3 yes
Answer 3 (score: 0)
You can simply use ifelse(), then use the cbind.data.frame() and table() functions to view the frequencies per household:
> df$kid_under_18 <- ifelse(df$Age < 18,"Yes","No")
> df
# HouseholdID Age kid_under_18
# 1 1 45 No
# 2 1 38 No
# 3 1 6 Yes
# 4 2 78 No
# 5 2 64 No
# 6 3 56 No
# 7 3 58 No
# 8 3 12 Yes
> table(cbind.data.frame(df$HouseholdID,df$kid_under_18))
# df$kid_under_18
# df$HouseholdID No Yes
# 1 2 1
# 2 2 0
# 3 2 1
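Note that the ifelse() call above labels each row by that person's own age, so adults in a household with a child still get "No". If the flag should instead be shared by everyone in a household that has at least one member under 18, one base R option is ave(). This is a minimal sketch under that assumption, and the variable name has_kid is illustrative:

```r
df <- data.frame(HouseholdID = c(1, 1, 1, 2, 2, 3, 3, 3),
                 Age = c(45, 38, 6, 78, 64, 56, 58, 12))

# ave() applies FUN within each HouseholdID group and returns a vector
# with one value per row, so it can be assigned straight to a column
has_kid <- ave(df$Age < 18, df$HouseholdID, FUN = any)

df$kid_under_18 <- ifelse(has_kid, "Yes", "No")
print(df)
```

Because ave() returns one value per row rather than one per group, this avoids the "replacement has 3 rows, data has 8" error that the by() attempt in the question runs into.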