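Create a new column using information from another column if two elements of the rows are equal

Time: 2017-08-18 12:05:06

Tags: r multiple-columns

I'm new to R and haven't found a solution to my specific problem. I really hope you can help me.

I have the following data frame: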

hid <- c('1','2','2','2','2','4','4','4','4','4','4')
syear <- c(2000,2001,2003,2003,2003,2000,2000,2001,2001,2002,2002)
employlvl <- c('Full-time','Part-time','Part-time','Unemployed','Unemployed','Full-time','Full-time','Full-time','Unemployed','Part-time', 'Full-time')
relHead <- c('Head','Head','Head','Partner','Child','Head','Partner','Head','Partner','Head','Partner')

df <- data.frame(hid,syear,employlvl,relHead)



| hid | syear |  Employment | Relation to Head of HH|
|-----|-------|-------------|-----------------------|
|  1  | 2000  |  Full-time  |         Head          |
|  2  | 2001  |  Part-time  |         Head          |
|  2  | 2003  |  Part-time  |         Head          |
|  2  | 2003  |  Unemployed |        Partner        |
|  2  | 2003  |  Unemployed |         Child         |
|  4  | 2000  |  Full-time  |         Head          |
|  4  | 2000  |  Full-time  |        Partner        |
|  4  | 2001  |  Full-time  |         Head          |
|  4  | 2001  |  Unemployed |        Partner        |
|  4  | 2002  |  Part-time  |         Head          |
|  4  | 2002  |  Full-time  |        Partner        |

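For rows that share the same hid (household ID number) and syear (survey year), I want to create a new column that holds the partner's employment level on the Head's row.

I would like to get the following output: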

| hid | syear |  Employment | Relation to Head of HH| Employment Partner|
|-----|-------|-------------|-----------------------|-------------------|
|  1  | 2000  |  Full-time  |         Head          |        NA         |
|  2  | 2001  |  Part-time  |         Head          |        NA         |
|  2  | 2003  |  Part-time  |         Head          |    Unemployed     |
|  2  | 2003  |  Unemployed |       Partner         |        NA         |
|  2  | 2003  |  Unemployed |         Child         |        NA         |
|  4  | 2000  |  Full-time  |         Head          |     Full-time     |
|  4  | 2000  |  Full-time  |        Partner        |        NA         |
|  4  | 2001  |  Full-time  |         Head          |    Unemployed     |
|  4  | 2001  |  Unemployed |        Partner        |        NA         |
|  4  | 2002  |  Part-time  |         Head          |     Full-time     |
|  4  | 2002  |  Full-time  |        Partner        |        NA         |

Thanks a lot in advance!

1 Answer:

Answer 0: (score: 1)

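We can use dplyr and tidyr to achieve this. There are two steps.

Step 1: Find the hid and syear combinations that have more than one record. Keep those and filter out the Child records. Use spread to put the Head and Partner employment levels into separate columns, which creates a new data frame. Add a new column, Relation, set to "Head", to use as the merge key. dt2 is the output of this step.

Step 2: Use left_join to merge dt2 with the original data frame dt. dt3 is the final output.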

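library(dplyr)
library(tidyr)

# Step 1: build a lookup table with one row per household-year
dt2 <- dt %>%
  group_by(hid, syear) %>%
  filter(n() > 1) %>%                                # keep hid/syear combinations with more than one record
  filter(`Relation to Head of HH` != "Child") %>%    # drop Child records
  ungroup() %>%                                      # grouping is no longer needed before reshaping
  spread(`Relation to Head of HH`, Employment) %>%   # one column each for Head and Partner
  mutate(Relation = "Head") %>%                      # merge key: attach the result to Head rows only
  rename(`Employment Partner` = Partner) %>%
  select(-Head)

# Step 2: merge the lookup table back onto the original data
dt3 <- dt %>%
  left_join(dt2, by = c("hid", "syear", "Relation to Head of HH" = "Relation"))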
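
As a possible alternative, the same result can be sketched without reshaping or joining, written against the column names from the question (employlvl, relHead) rather than the renamed ones used above. This is only a minimal sketch: the output column name employPartner is made up for illustration, and it assumes each hid/syear combination has at most one Partner row (if there were several, only the first would be used).

```r
library(dplyr)

# Data frame from the question; stringsAsFactors = FALSE keeps text columns as character
df <- data.frame(
  hid = c("1", "2", "2", "2", "2", "4", "4", "4", "4", "4", "4"),
  syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
  employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                "Full-time"),
  relHead = c("Head", "Head", "Head", "Partner", "Child", "Head",
              "Partner", "Head", "Partner", "Head", "Partner"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(hid, syear) %>%
  mutate(
    # Within each household-year, look up the first Partner row (NA if none)
    # and write its employment level onto the Head row only.
    employPartner = if_else(
      relHead == "Head",
      employlvl[match("Partner", relHead)],
      NA_character_
    )
  ) %>%
  ungroup()

print(result)
```

Because match returns NA when a household-year has no Partner, Head rows without a partner get NA without any extra filtering on group size.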

Data:

library(dplyr)
# tibble() replaces the deprecated data_frame(); backticks allow a column name with spaces
dt <- tibble(hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
             syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
             Employment = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                            "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                            "Full-time"),
             `Relation to Head of HH` = c("Head", "Head", "Head", "Partner", "Child", "Head",
                                          "Partner", "Head", "Partner", "Head", "Partner"))