I'm new to R and haven't found a solution to my specific problem. I really hope you can help me.
I have the following data frame:
hid <- c('1','2','2','2','2','4','4','4','4','4','4')
syear <- c(2000,2001,2003,2003,2003,2000,2000,2001,2001,2002,2002)
employlvl <- c('Full-time','Part-time','Part-time','Unemployed','Unemployed','Full-time','Full-time','Full-time','Unemployed','Part-time', 'Full-time')
relHead <- c('Head','Head','Head','Partner','Child','Head','Partner','Head','Partner','Head','Partner')
df <- data.frame(hid,syear,employlvl,relHead)
| hid | syear | Employment | Relation to Head of HH|
|-----|-------|-------------|-----------------------|
| 1 | 2000 | Full-time | Head |
| 2 | 2001 | Part-time | Head |
| 2 | 2003 | Part-time | Head |
| 2 | 2003 | Unemployed | Partner |
| 2 | 2003 | Unemployed | Child |
| 4 | 2000 | Full-time | Head |
| 4 | 2000 | Full-time | Partner |
| 4 | 2001 | Full-time | Head |
| 4 | 2001 | Unemployed | Partner |
| 4 | 2002 | Part-time | Head |
| 4 | 2002 | Full-time | Partner |
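I want to create a new column holding the partner's employment level, matched on rows where hid (household ID) and syear (survey year) are equal.
I would like to get the following output: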
| hid | syear | Employment | Relation to Head of HH| Employment Partner|
|-----|-------|-------------|-----------------------|-------------------|
| 1 | 2000 | Full-time | Head | NA |
| 2 | 2001 | Part-time | Head | NA |
| 2 | 2003 | Part-time | Head | Unemployed |
| 2 | 2003 | Unemployed | Partner | NA |
| 2 | 2003 | Unemployed | Child | NA |
| 4 | 2000 | Full-time | Head | Full-time |
| 4 | 2000 | Full-time | Partner | NA |
| 4 | 2001 | Full-time | Head | Unemployed |
| 4 | 2001 | Unemployed | Partner | NA |
| 4 | 2002 | Part-time | Head | Full-time |
| 4 | 2002 | Full-time | Partner | NA |
Thank you very much in advance!
Answer 0 (score: 1)
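We can use dplyr and tidyr to achieve this. There are two steps.
Step 1: Find the hid and syear combinations that have more than one record and keep only those. Drop the Child records, then use spread to turn the Head and Partner relations into columns, which creates a new data frame. Add a new column, Relation, set to Head, to use as a merge key. dt2 is the output of this step.
Step 2: Use left_join to merge dt2 with the original data frame dt. dt3 is the final output.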
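library(dplyr)
library(tidyr)
# Step 1: build a lookup table with the partner's employment per household-year
dt2 <- dt %>%
  group_by(hid, syear) %>%
  filter(n() > 1) %>%                                 # keep groups with more than one record
  filter(`Relation to Head of HH` != "Child") %>%     # drop children
  spread(`Relation to Head of HH`, Employment) %>%    # one column per relation: Head, Partner
  mutate(Relation = "Head") %>%                       # merge key: attach the value to the Head row
  rename(`Employment Partner` = Partner) %>%
  select(-Head) %>%
  ungroup()
# Step 2: join the lookup table back onto the original data
dt3 <- dt %>%
  left_join(dt2, by = c("hid", "syear", "Relation to Head of HH" = "Relation"))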
Data:
library(dplyr)
# tibble() replaces the deprecated data_frame()
dt <- tibble(hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
             syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
             Employment = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                            "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                            "Full-time"),
             `Relation to Head of HH` = c("Head", "Head", "Head", "Partner", "Child", "Head",
                                          "Partner", "Head", "Partner", "Head", "Partner"))
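For the data frame exactly as posted in the question (columns employlvl and relHead), the same two-step idea can probably be sketched without the reshape: pull out the Partner rows as a lookup table, then join them onto the Head rows. This is a variant of the approach above, not the same code. The names partner, result and employPartner are made up for illustration, and the sketch assumes there is at most one Partner row per hid and syear combination (otherwise the join would duplicate Head rows).

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  hid = c('1','2','2','2','2','4','4','4','4','4','4'),
  syear = c(2000,2001,2003,2003,2003,2000,2000,2001,2001,2002,2002),
  employlvl = c('Full-time','Part-time','Part-time','Unemployed','Unemployed',
                'Full-time','Full-time','Full-time','Unemployed','Part-time','Full-time'),
  relHead = c('Head','Head','Head','Partner','Child','Head',
              'Partner','Head','Partner','Head','Partner'),
  stringsAsFactors = FALSE
)

# Step 1: lookup table with one row per household-year that has a partner.
# relHead is set to "Head" so the value lands on the Head row after the join.
partner <- df %>%
  filter(relHead == "Partner") %>%
  select(hid, syear, employPartner = employlvl) %>%
  mutate(relHead = "Head")

# Step 2: left join keeps every original row; rows without a match get NA
result <- df %>%
  left_join(partner, by = c("hid", "syear", "relHead"))

print(result)
```

Because it is a left join, the row count and row order of df are preserved, and only Head rows in a household-year that also contains a Partner get a non-NA employPartner.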