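Get the first location where a person lived

Time: 2018-02-27 14:54:40

Tags: r dplyr data.table tidyr

I want to create a column that marks the first place a person lived, based on the year column. My data has the following format: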

year<- c(2008, 2009, 2010, 2009, 2010, 2011)
person<- c('John', 'John', 'John', 'Brian', 'Brian','Vickey')
location<- c('London','Paris', 'Newyork','Paris','Paris','Miami')
df<- data.frame(year, person, location)

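I want to create a column named first place that takes the values 0 and 1. It should be 1 if the row is the person's first city, and 0 otherwise.

Any suggestions?

4 Answers:

Answer 0: (score: 3)

With data.table: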

library("data.table")
year<- c(2008, 2009, 2010, 2009, 2010, 2011)
person<- c('John', 'John', 'John', 'Brian', 'Brian','Vickey')
location<- c('London','Paris', 'Newyork','Paris','Paris','Miami')
df<- data.frame(year, person, location)
setDT(df)[, firstPlace:=as.integer(min(year)==year), person]
# > setDT(df)[, firstPlace:=as.integer(min(year)==year), person]
# > df
#    year person location firstPlace
# 1: 2008   John   London          1
# 2: 2009   John    Paris          0
# 3: 2010   John  Newyork          0
# 4: 2009  Brian    Paris          1
# 5: 2010  Brian    Paris          0
# 6: 2011 Vickey    Miami          1

Or (as @Frank noted), if your data is already sorted by person and year:

setDT(df)[, firstPlace:=+!duplicated(person)]

Or (this variant):

setDT(df)[, firstPlace:=+(rowidv(person)==1)]
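Both shortcuts rely on the rows already being ordered by person and year. As a minimal sketch of what to do when they are not, the example below sorts first and then applies the `duplicated()` idea. The shuffled copy of the data and the name `dt` are assumptions made for illustration, not part of the original answer.

```r
library(data.table)

# Same toy data as in the question, deliberately shuffled so the rows
# are NOT already ordered by person and year.
dt <- data.table(
  year     = c(2010, 2008, 2009, 2010, 2009, 2011),
  person   = c("John", "John", "John", "Brian", "Brian", "Vickey"),
  location = c("Newyork", "London", "Paris", "Paris", "Paris", "Miami")
)

# Sort by reference so each person's earliest year comes first.
setorder(dt, person, year)

# After sorting, the first occurrence of each person is their earliest row.
dt[, firstPlace := as.integer(!duplicated(person))]
print(dt)
```

Sorting by reference changes the row order of `dt`. If the original order matters, the grouped `min(year) == year` version at the top of this answer avoids the sort.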

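Answer 1: (score: 2)

library(dplyr)

# Keep only each person's earliest row (one row per person, not a 0/1 column)
first_city <- df %>%
  group_by(person) %>%
  arrange(year) %>%
  slice(1)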

Answer 2: (score: 0)

A dplyr solution:

library(dplyr)
df %>%
  group_by(person) %>%
  mutate(FirstPlace = +(location[which.min(year)] == location))
# A tibble: 6 x 4
# Groups:   person [3]
#    year person location FirstPlace
#   <dbl> <fctr> <fctr>        <int>
# 1  2008 John   London            1
# 2  2009 John   Paris             0
# 3  2010 John   Newyork           0
# 4  2009 Brian  Paris             1
# 5  2010 Brian  Paris             1
# 6  2011 Vickey Miami             1

This finds each person's first location, compares it with the location column, and converts the logical result to an integer.

If you only look at the first year:

location[which.min(year)]
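Note that comparing on location also flags later years in which a person still lives in their first city (Brian's 2010 row above). As a minimal sketch of one way to flag only the earliest year, the example below compares year with the per-person minimum instead. This is an assumption about what "only the first year" means, not necessarily the code this answer originally showed; `result` is a name chosen for illustration.

```r
library(dplyr)

# Toy data from the question.
df <- data.frame(
  year     = c(2008, 2009, 2010, 2009, 2010, 2011),
  person   = c("John", "John", "John", "Brian", "Brian", "Vickey"),
  location = c("London", "Paris", "Newyork", "Paris", "Paris", "Miami")
)

# Flag a row only when its year is the earliest year for that person.
result <- df %>%
  group_by(person) %>%
  mutate(FirstPlace = as.integer(year == min(year))) %>%
  ungroup()

print(result)
```

With this version Brian's 2010 row gets 0 even though he is still in Paris, because only the row from his earliest year is flagged.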

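Answer 3: (score: 0)

library(dplyr)
# top_n(1, year) would keep the latest year; a negative n keeps the earliest
first_city <- df %>%
  group_by(person) %>%
  top_n(-1, year)

Or as an extra column in the data: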

df %>% 
  group_by(person) %>% 
  arrange(year) %>% 
  mutate(first_city = head(location, 1))

To mark the first city with 1 and all other rows with 0 (first year only):

df %>% 
  group_by(person) %>% 
  arrange(year) %>% 
  mutate(first_city = as.integer(head(location, 1) == location & year == min(year))) 

# A tibble: 6 x 4
# Groups:   person [3]
#    year person location first_city
#   <dbl> <fct>  <fct>         <int>
# 1  2008 John   London            1
# 2  2009 John   Paris             0
# 3  2009 Brian  Paris             1
# 4  2010 John   Newyork           0
# 5  2010 Brian  Paris             0
# 6  2011 Vickey Miami             1