For example, I have data where I need to create a variable based on earlier history:
created <- c(2009, 2010, 2010, 2011, 2012, 2011)
person <- c('A', 'A', 'A', 'A', 'B', 'B')
location <- c('London', 'Geneva', 'London', 'New York', 'London', 'London')
df <- data.frame(created, person, location)
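I want to create a variable named "existing" that takes earlier years into account and checks whether the person has lived in that location before. If the location is new for that person, the value should be 1; if it is an old one (they have already lived there), the value should be 0. How can I do this?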
library(dplyr)
# Attempt so far: this only fills the column with zeros
df %>% group_by(person) %>% mutate(existing = 0)
# Desired result
existing <- c(1, 1, 0, 1, 0, 1)
Answer 0 (score: 1)
Another dplyr option could be:
df %>%
  group_by(person, location) %>%
  mutate(existing = +(1:n() == 1))
  created person location existing
    <dbl> <fct>  <fct>       <int>
1    2009 A      London          1
2    2010 A      Geneva          1
3    2010 A      London          0
4    2011 A      New York        1
5    2012 B      London          1
6    2011 B      London          0
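If the rows need to be sorted by year within each group first (as the desired result in the question suggests):
df %>%
  group_by(person, location) %>%
  arrange(created, .by_group = TRUE) %>%
  mutate(existing = +(1:n() == 1))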
Answer 1 (score: 0)
Based on the OP's updated information, we need to first arrange the data by person and year (created), and then use duplicated.
library(dplyr)
df %>%
  arrange(person, created) %>%
  group_by(person) %>%
  mutate(existing = +(!duplicated(location)))
#   created person location existing
#     <dbl> <fct>  <fct>       <int>
# 1    2009 A      London          1
# 2    2010 A      Geneva          1
# 3    2010 A      London          0
# 4    2011 A      New York        1
# 5    2011 B      London          1
# 6    2012 B      London          0
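Answer 2 (score: 0)
You can try
with(df, ave(location, person, FUN = function(i) as.integer(!duplicated(i))))
# The values are written back into location, a character vector, so the result is character
#[1] "1" "1" "0" "1" "1" "0"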
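The ave() call above works in row order and returns character values. As a minimal sketch of a base R variant, assuming the sample data from the question (with person quoted as strings) and assuming "first occurrence" should mean "earliest year", the rows can be sorted first and an integer vector passed to ave() so the result stays integer:

```r
# Sample data from the question; person is quoted here so the code runs
df <- data.frame(
  created  = c(2009, 2010, 2010, 2011, 2012, 2011),
  person   = c("A", "A", "A", "A", "B", "B"),
  location = c("London", "Geneva", "London", "New York", "London", "London")
)

# Sort by person and year so the first row of each group is the earliest year
df <- df[order(df$person, df$created), ]

# ave() writes results back into its first argument, so passing row indices
# (integers) keeps the result integer; within each person/location group,
# only the first row is flagged with 1
df$existing <- ave(
  seq_len(nrow(df)), df$person, df$location,
  FUN = function(i) as.integer(seq_along(i) == 1L)
)
```

Grouping by both person and location here plays the same role as !duplicated(location) within each person; either form could be used.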
Answer 3 (score: 0)
Another option, using data.table:
library(data.table)
setDT(df)[order(person, created), existing := c(1L, rep(0L, .N-1L)), .(person, location)]
Output:
   created person location existing
1:    2009      A   London        1
2:    2010      A   Geneva        1
3:    2010      A   London        0
4:    2011      A New York        1
5:    2012      B   London        0
6:    2011      B   London        1