Grouping a variable by location, year, and person name

Time: 2016-08-28 20:02:16

Tags: r dplyr plyr

I want to create an aggregate of a column.

A<- c("xyz", "xyz", "xy", "xx","xx", "y")
year<- c(2009,2010,2009,2009,2010,2009)
location<- c('london', 'london', 'paris', 'newyork','mumbai','sydney')
df<- data.frame(A, year, location)

I want to create a variable called 'yearsofexperience' that gives the total number of years a person has spent at a given location.

   A     year         location  yearsofexperience
   xyz  2009          london     2
   xyz  2010          london     2
   xy   2009          paris      1
   xx   2009          newyork    1
   xx   2010          mumbai     1
   y    2009          sydney     1

Can anyone help?

4 answers:

Answer 0 (score: 4)

In case anyone is interested, here is an (arguably tidier) solution using data.table, which should be faster on large data sets.

require(data.table)
setDT(df)[, yearsofexperience := .N, by = .(A, location)]
df
     A year location yearsofexperience
1: xyz 2009   london                 2
2: xyz 2010   london                 2
3:  xy 2009    paris                 1
4:  xx 2009  newyork                 1
5:  xx 2010   mumbai                 1
6:   y 2009   sydney                 1

Answer 1 (score: 3)

With dplyr you can use group_by and mutate to get the output listed in your question.

library(dplyr)
df %>% 
  group_by(A, location) %>% 
  mutate(yearsofexperience = n()) %>% 
  ungroup()

If you want to collapse the entries for a given A & location, you can use summarise instead of the mutate statement. This drops the year variable.

df %>% 
  group_by(A, location) %>% 
  summarise(yearsofexperience = n()) %>% 
  ungroup()
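
To make the difference between the two calls concrete, here is a minimal sketch that rebuilds the question's data and compares the results. It assumes dplyr is installed; the names with_mutate and with_summarise are only illustrative and do not appear in the original answer.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  A        = c("xyz", "xyz", "xy", "xx", "xx", "y"),
  year     = c(2009, 2010, 2009, 2009, 2010, 2009),
  location = c("london", "london", "paris", "newyork", "mumbai", "sydney")
)

# mutate() keeps every original row and adds the group size to each
with_mutate <- df %>%
  group_by(A, location) %>%
  mutate(yearsofexperience = n()) %>%
  ungroup()

# summarise() returns one row per (A, location) group and drops year
with_summarise <- df %>%
  group_by(A, location) %>%
  summarise(yearsofexperience = n()) %>%
  ungroup()

nrow(with_mutate)      # 6 rows, same as df
nrow(with_summarise)   # 5 rows, one per person/location pair
names(with_summarise)  # "A" "location" "yearsofexperience"
```

Use the mutate version when the per-year rows are still needed, and the summarise version when one row per person and location is enough.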

Answer 2 (score: 2)

You can use n_distinct() to count the unique years for each person and location combination. This should work for you:

library(dplyr)
df %>% group_by(A, location) %>% mutate(yoe = n_distinct(year))

# Source: local data frame [6 x 4]
# Groups: A, location [5]

#       A  year location   yoe
#  <fctr> <dbl>   <fctr> <int>
#1    xyz  2009   london     2
#2    xyz  2010   london     2
#3     xy  2009    paris     1
#4     xx  2009  newyork     1
#5     xx  2010   mumbai     1
#6      y  2009   sydney     1

You can also use data.table syntax; the corresponding function is uniqueN():

library(data.table)
setDT(df)[, yoe := uniqueN(year), .(A, location)]

Answer 3 (score: 1)

We can use ave from base R:

df$yearsofexperience <- with(df, ave(year, location, A, FUN = length))
df
#     A year location yearsofexperience
#1 xyz 2009   london                 2
#2 xyz 2010   london                 2
#3  xy 2009    paris                 1
#4  xx 2009  newyork                 1
#5  xx 2010   mumbai                 1
#6   y 2009   sydney                 1

If the count should instead be based on the length of the unique elements:

df$yearsofexperience <- with(df, ave(year, location, A, FUN = function(x) length(unique(x))))
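
The two ave calls only give different results when a person has more than one row for the same location and year. A small sketch with a hypothetical duplicated row (not part of the original data; df2, n_rows and n_years are illustrative names) shows the difference:

```r
# Subset of the question's data plus one hypothetical duplicate row
# (xyz, 2009, london) that is NOT in the original post
df2 <- data.frame(
  A        = c("xyz", "xyz", "xyz", "xy"),
  year     = c(2009, 2009, 2010, 2009),
  location = c("london", "london", "london", "paris")
)

# Row count per (location, A) group: the duplicate year is counted twice
df2$n_rows <- with(df2, ave(year, location, A, FUN = length))

# Distinct years per group: the duplicate year is counted once
df2$n_years <- with(df2, ave(year, location, A,
                             FUN = function(x) length(unique(x))))

df2$n_rows   # 3 3 3 1
df2$n_years  # 2 2 2 1
```

On the question's data, where each person has at most one row per location and year, both versions return the same values.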