I want to create an aggregate of columns.
A<- c("xyz", "xyz", "xy", "xx","xx", "y")
year<- c(2009,2010,2009,2009,2010,2009)
location<- c('london', 'london', 'paris', 'newyork','mumbai','sydney')
df<- data.frame(A, year, location)
I want to create a variable named 'yearsofexperience' that summarises the total number of years a person has spent at a given location.
A year location yearsofexperience
xyz 2009 london 2
xyz 2010 london 2
xy 2009 paris 1
xx 2009 newyork 1
xx 2010 mumbai 1
y 2009 sydney 1
Can anyone help?
Answer 0 (score: 4)
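If anyone is interested, here is a data.table solution (arguably tidier) that should be faster on large datasets. `.N` is the number of rows in each group, and `:=` adds it as a new column by reference.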
require(data.table)
setDT(df)[, yearsofexperience := .N, by = .(A, location)]
df
A year location yearsofexperience
1: xyz 2009 london 2
2: xyz 2010 london 2
3: xy 2009 paris 1
4: xx 2009 newyork 1
5: xx 2010 mumbai 1
6: y 2009 sydney 1
Answer 1 (score: 3)
Using dplyr, you can combine group_by and mutate to get the output listed in your question:
library(dplyr)
df %>%
group_by(A, location) %>%
mutate(yearsofexperience = n()) %>%
ungroup()
If you want to collapse the entries for a given A & location into one row, you can use summarise instead of the mutate statement. This drops the year variable.
df %>%
group_by(A, location) %>%
summarise(yearsofexperience = n()) %>%
ungroup()
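For comparison, the same collapse can be sketched in base R with aggregate(). This is only an illustration, not part of the answer above: it rebuilds the question's data, counts rows per A and location, and renames the count column by hand. The name `collapsed` is made up for this example.

```r
# Rebuild the data from the question so the sketch is self-contained
A <- c("xyz", "xyz", "xy", "xx", "xx", "y")
year <- c(2009, 2010, 2009, 2009, 2010, 2009)
location <- c("london", "london", "paris", "newyork", "mumbai", "sydney")
df <- data.frame(A, year, location)

# Count the rows in each (A, location) group; one output row per group
collapsed <- aggregate(year ~ A + location, data = df, FUN = length)

# aggregate() keeps the name 'year' for the count, so rename it
names(collapsed)[names(collapsed) == "year"] <- "yearsofexperience"
collapsed
```

As with summarise, the result has one row per A and location combination (five here) and no longer carries the individual years.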
Answer 2 (score: 2)
You can use n_distinct() to count the unique years for each person and location combination. This should work for you:
library(dplyr)
df %>% group_by(A, location) %>% mutate(yoe = n_distinct(year))
# Source: local data frame [6 x 4]
# Groups: A, location [5]
# A year location yoe
# <fctr> <dbl> <fctr> <int>
#1 xyz 2009 london 2
#2 xyz 2010 london 2
#3 xy 2009 paris 1
#4 xx 2009 newyork 1
#5 xx 2010 mumbai 1
#6 y 2009 sydney 1
You can also use data.table syntax, where the corresponding function is uniqueN():
library(data.table)
setDT(df)[, yoe := uniqueN(year), .(A, location)]
Answer 3 (score: 1)
We can use ave from base R:
df$yearsofexperience <- with(df, ave(year, location, A, FUN = length))
df
# A year location yearsofexperience
#1 xyz 2009 london 2
#2 xyz 2010 london 2
#3 xy 2009 paris 1
#4 xx 2009 newyork 1
#5 xx 2010 mumbai 1
#6 y 2009 sydney 1
If the count should instead be based on the length of the unique elements (distinct years only):
df$yearsofexperience <- with(df, ave(year, location, A, FUN = function(x) length(unique(x))))
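On the question's data, counting rows and counting distinct years give the same result. The two only differ when a person has more than one row for the same year at the same location. A minimal sketch, using a small made-up data frame (`dup` is hypothetical and does not come from the question):

```r
# Hypothetical data: the 2009 london row for 'xyz' appears twice
dup <- data.frame(
  A = c("xyz", "xyz", "xyz"),
  year = c(2009, 2009, 2010),
  location = c("london", "london", "london")
)

# Row count per (location, A) group: the duplicate year is counted twice
rows_per_group <- with(dup, ave(year, location, A, FUN = length))

# Distinct-year count per (location, A) group: the duplicate is ignored
years_per_group <- with(dup, ave(year, location, A,
                                 FUN = function(x) length(unique(x))))

rows_per_group   # 3 3 3
years_per_group  # 2 2 2
```

If the data can contain repeated years, the distinct-year version is probably the one that matches "total number of years spent".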