Working with multiple rows that share the same Id (key) column value in R

Posted: 2015-09-01 21:47:33

Tags: r dataframe data.table dplyr

I'm working with census data. The dataset looks like this:

Household-Id    Member-Type    Education    Birth
1               Father         12           1955
1               Mother         16           1963
1               Child          16           1986
1               Child          12           1995
2               Father         12           1950
2               Mother         9            1955
2               Child          18           1982
2               Child          14           1985
2               Child          16           1975
3               Father         16           1962
3               Mother         14           1965
3               Child          16           1990
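To make the question reproducible, the table above can be pasted straight into R. This is a minimal sketch assuming the data is whitespace-separated exactly as shown. With its default `check.names = TRUE`, `read.table()` passes the headers through `make.names()`, which turns the hyphenated `Household-Id` and `Member-Type` into `Household.Id` and `Member.Type`. That is why both answers below use the dotted names.

```r
# Sketch: rebuild the sample census table from the question.
# The hyphens in the header are replaced by dots because read.table()
# uses check.names = TRUE by default.
txt <- "Household-Id    Member-Type    Education    Birth
1               Father         12           1955
1               Mother         16           1963
1               Child          16           1986
1               Child          12           1995
2               Father         12           1950
2               Mother         9            1955
2               Child          18           1982
2               Child          14           1985
2               Child          16           1975
3               Father         16           1962
3               Mother         14           1965
3               Child          16           1990"

dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Column names after make.names(): Household.Id, Member.Type, Education, Birth
names(dat)
```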

This is what I want it to look like, with the mother's education and a birth order filled in for each child:

Household-Id    Member-Type    Education    Birth    Mother-Education    Birth-Order 
1               Father         12           1955     
1               Mother         16           1963
1               Child          16           1986     16                  1
1               Child          12           1995     16                  2
2               Father         12           1950
2               Mother         9            1955
2               Child          18           1982     9                   1
2               Child          14           1985     9                   2
2               Child          16           1975     9                   3
3               Father         16           1962
3               Mother         14           1965
3               Child          16           1990     14                  1

As far as I know, R doesn't support loops the way languages like Java or C do, and I have no idea how to do this!

2 Answers:

Answer 0 (score: 4)

Where did you hear that R doesn't support loops? It certainly does. That said, this particular case is best suited to data.table, which handles the per-group looping internally:

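install.packages("data.table")          # only needed once
library(data.table)
dat = as.data.table(YourDataFrame)      # YourDataFrame: your census data.frame

# Birth order among the children of each household (earliest birth year = 1)
dat[Member.Type == "Child", Birth_Order := rank(Birth), by = Household.Id]
# Copy the mother's education to every row of her household...
dat[, MotherEducation := Education[Member.Type == "Mother"], by = Household.Id]
# ...then blank it out again for the non-child rows
dat[Member.Type != "Child", MotherEducation := NA]
dat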
#   Household.Id Member.Type Education Birth MotherEducation Birth_Order
#  1:            1      Father        12  1955              NA          NA
#  2:            1      Mother        16  1963              NA          NA
#  3:            1       Child        16  1986              16           1
#  4:            1       Child        12  1995              16           2
#  5:            2      Father        12  1950              NA          NA
#  6:            2      Mother         9  1955              NA          NA
#  7:            2       Child        18  1982               9           2
#  8:            2       Child        14  1985               9           3
#  9:            2       Child        16  1975               9           1
# 10:            3      Father        16  1962              NA          NA
# 11:            3      Mother        14  1965              NA          NA
# 12:            3       Child        16  1990              14           1
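To back up the point that R does have ordinary loops, here is the same computation written as an explicit `for` loop over households. This is a sketch, not part of the answer above: it assumes the dotted column names (`Household.Id`, `Member.Type`) and at most one mother per household, and it rebuilds the sample data with `data.frame()` so it runs on its own. The output columns reuse the answer's names `MotherEducation` and `Birth_Order`.

```r
# Sample data from the question (column names as produced by read.table()).
dat <- data.frame(
  Household.Id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  Member.Type  = c("Father", "Mother", "Child", "Child",
                   "Father", "Mother", "Child", "Child", "Child",
                   "Father", "Mother", "Child"),
  Education    = c(12, 16, 16, 12, 12, 9, 18, 14, 16, 16, 14, 16),
  Birth        = c(1955, 1963, 1986, 1995, 1950, 1955, 1982, 1985, 1975,
                   1962, 1965, 1990),
  stringsAsFactors = FALSE
)

# New columns start as NA; only child rows get filled in.
dat$MotherEducation <- NA_real_
dat$Birth_Order     <- NA_integer_

for (h in unique(dat$Household.Id)) {
  rows     <- which(dat$Household.Id == h)            # row indices of this household
  mother   <- rows[dat$Member.Type[rows] == "Mother"]
  children <- rows[dat$Member.Type[rows] == "Child"]
  if (length(children) == 0) next

  # Earliest birth year gets 1; order() leaves ties in their original row order.
  by_age <- children[order(dat$Birth[children])]
  dat$Birth_Order[by_age] <- seq_along(by_age)

  # Assumes a single mother row; households without one keep NA.
  if (length(mother) == 1) {
    dat$MotherEducation[children] <- dat$Education[mother]
  }
}

dat
```

The grouped data.table call above is shorter, but the loop makes each step explicit.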

Answer 1 (score: 1)

Here's a dplyr approach:

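library(dplyr)

dat = dat %>%
  # arrange() ignores groups by default, so sort explicitly:
  # by household, then member type, then birth year (oldest first)
  arrange(Household.Id, Member.Type, Birth) %>%
  group_by(Household.Id, Member.Type) %>%
  # number rows within each (household, member type) group; keep it for children only
  mutate(Birth_Order = ifelse(Member.Type == "Child", row_number(), NA_integer_)) %>%
  group_by(Household.Id) %>%
  # copy the mother's education onto each child of the same household
  mutate(Mother_Education = ifelse(Member.Type == "Child",
                                   Education[Member.Type == "Mother"], NA)) %>%
  ungroup()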

   Household.Id Member.Type Education Birth Birth_Order Mother_Education
1             1       Child        16  1986           1               16
2             1       Child        12  1995           2               16
3             1      Father        12  1955          NA               NA
4             1      Mother        16  1963          NA               NA
5             2       Child        16  1975           1                9
6             2       Child        18  1982           2                9
7             2       Child        14  1985           3                9
8             2      Father        12  1950          NA               NA
9             2      Mother         9  1955          NA               NA
10            3       Child        16  1990           1               14
11            3      Father        16  1962          NA               NA
12            3      Mother        14  1965          NA               NA
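For comparison, both columns can also be computed in base R without extra packages. This sketch is not from either answer: `ave()` ranks the children's birth years within each household, and `match()` looks up each household's mother row. It uses `rank(..., ties.method = "first")` so that two children born in the same year still get distinct orders; the default `rank()` used in the data.table answer averages ties (two children tied for first would both get 1.5). It assumes the dotted column names and at most one mother per household.

```r
# Sample data from the question (column names as produced by read.table()).
dat <- data.frame(
  Household.Id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  Member.Type  = c("Father", "Mother", "Child", "Child",
                   "Father", "Mother", "Child", "Child", "Child",
                   "Father", "Mother", "Child"),
  Education    = c(12, 16, 16, 12, 12, 9, 18, 14, 16, 16, 14, 16),
  Birth        = c(1955, 1963, 1986, 1995, 1950, 1955, 1982, 1985, 1975,
                   1962, 1965, 1990),
  stringsAsFactors = FALSE
)

is_child <- dat$Member.Type == "Child"

# Birth order: rank the children's birth years within each household.
dat$Birth_Order <- NA_integer_
dat$Birth_Order[is_child] <- as.integer(
  ave(dat$Birth[is_child], dat$Household.Id[is_child],
      FUN = function(b) rank(b, ties.method = "first"))
)

# Mother's education: look up the mother row of each household,
# then keep the value only on child rows (NA if no mother is found).
mothers <- dat[dat$Member.Type == "Mother", c("Household.Id", "Education")]
dat$Mother_Education <- ifelse(
  is_child,
  mothers$Education[match(dat$Household.Id, mothers$Household.Id)],
  NA
)

dat
```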