Let's say I have data on two firms, each of which is expanding into new cities. For each city, I have the date each additional store was added:
df <- data.frame(
  firm = c(rep(1, 4), rep(2, 4)),
  date = as.Date(c('2017-01-01', '2017-03-01', '2017-05-01',
                   '2017-06-01', '2017-02-01', '2017-04-01',
                   '2017-05-01', '2017-06-01')),
  city = c('New York', 'DC', 'New York', 'Atlanta', 'DC', 'DC',
           'Chicago', 'Atlanta'),
  numStores = c(1, 1, 2, 1, 1, 2, 1, 1))
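I want to create an integer column that tells me, for each firm, the order in which it entered each city, based on the date variable. The result should look like this: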
df$cityOrder = c(1,2,1,3,1,1,2,3)
Answer 0 (score: 4)
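Using dplyr:
library(dplyr)
df %>%
  left_join(df %>%
              filter(numStores == 1) %>%              # first store in a city marks the entry
              group_by(firm) %>%
              mutate(cityOrder = row_number(date)) %>% # rank entries by date within each firm
              select(-numStores, -date),
            by = c("firm", "city"))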
firm date city numStores cityOrder
1 1 2017-01-01 New York 1 1
2 1 2017-03-01 DC 1 2
3 1 2017-05-01 New York 2 1
4 1 2017-06-01 Atlanta 1 3
5 2 2017-02-01 DC 1 1
6 2 2017-04-01 DC 2 1
7 2 2017-05-01 Chicago 1 2
8 2 2017-06-01 Atlanta 1 3
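A note on order(): it returns the positions that would sort its input, not the rank of each element, so mutate(cityOrder = order(date)) only yields entry ranks when the dates are already in chronological order within each firm, as they happen to be in this example. The small base-R illustration below uses hypothetical dates chosen to be out of order; dplyr's row_number(x) is documented as equivalent to rank(x, ties.method = "first").

```r
# Hypothetical entry dates for one firm, deliberately NOT in chronological order
d <- as.Date(c("2017-06-01", "2017-01-01", "2017-03-01"))

order(d)                        # indices that would sort d
# [1] 2 3 1
rank(d, ties.method = "first")  # chronological rank of each element
# [1] 3 1 2
```

Only the second result answers "which city was entered first, second, third".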
Answer 1 (score: 2)
One way:
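library(dplyr)
first_entry <- df %>%
  group_by(firm, city) %>%
  summarize(first = min(date)) %>%         # date each firm first entered each city
  group_by(firm) %>%
  mutate(order = row_number(first)) %>%    # rank cities by entry date within the firm
  select(-first)
left_join(df, first_entry, by = c("firm", "city"))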
firm date city numStores order
1 1 2017-01-01 New York 1 1
2 1 2017-03-01 DC 1 2
3 1 2017-05-01 New York 2 1
4 1 2017-06-01 Atlanta 1 3
5 2 2017-02-01 DC 1 1
6 2 2017-04-01 DC 2 1
7 2 2017-05-01 Chicago 1 2
8 2 2017-06-01 Atlanta 1 3
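The same first-entry idea can also be written without a separate summary table and join: compute each city's first date with a grouped mutate(), then rank that date within the firm. This is a sketch, not part of the original answer; it swaps in dplyr's dense_rank() (so two cities entered on the same day would share a number) and rebuilds df so the block runs on its own.

```r
library(dplyr)

df <- data.frame(
  firm = c(rep(1, 4), rep(2, 4)),
  date = as.Date(c("2017-01-01", "2017-03-01", "2017-05-01", "2017-06-01",
                   "2017-02-01", "2017-04-01", "2017-05-01", "2017-06-01")),
  city = c("New York", "DC", "New York", "Atlanta",
           "DC", "DC", "Chicago", "Atlanta"),
  numStores = c(1, 1, 2, 1, 1, 2, 1, 1)
)

df_new <- df %>%
  group_by(firm, city) %>%
  mutate(first_date = min(date)) %>%               # date the firm first entered this city
  group_by(firm) %>%                               # regroup by firm only
  mutate(cityOrder = dense_rank(first_date)) %>%   # 1 = earliest city; ties share a rank
  ungroup() %>%
  select(-first_date)

df_new$cityOrder
# [1] 1 2 1 3 1 1 2 3
```

Because mutate() keeps every row in its original position, no join is needed to put the order back next to each store.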
Answer 2 (score: 1)
You can use tapply with factors in base R (provided your dates are sorted within each firm, as in this example).
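A minimal sketch of the tapply-plus-factor idea follows. It is a reconstruction, not necessarily the answerer's code. It assumes rows are grouped by firm and sorted by date within each firm, because unlist() simply concatenates the per-firm results in firm order; the block sorts df first to make that assumption hold.

```r
df <- data.frame(
  firm = c(rep(1, 4), rep(2, 4)),
  date = as.Date(c("2017-01-01", "2017-03-01", "2017-05-01", "2017-06-01",
                   "2017-02-01", "2017-04-01", "2017-05-01", "2017-06-01")),
  city = c("New York", "DC", "New York", "Atlanta",
           "DC", "DC", "Chicago", "Atlanta"),
  numStores = c(1, 1, 2, 1, 1, 2, 1, 1)
)

# Assumption: rows must be ordered by firm, then by date, for unlist() to line up
df <- df[order(df$firm, df$date), ]

# Within each firm, factor levels taken in order of first appearance
# give the order in which the firm entered each city
df$cityOrder <- unlist(tapply(
  as.character(df$city), df$firm,
  function(x) as.integer(factor(x, levels = unique(x)))
))

unname(df$cityOrder)
# [1] 1 2 1 3 1 1 2 3
```

unlist() names the result after the firm and position (e.g. "11", "12"), hence unname() for display.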
Answer 3 (score: 0)
A dplyr solution:
library(dplyr)
# create a summary by firm by city
df_summary <- df %>%
  # sequence by earliest date by firm and city
  group_by(firm, city) %>%
  summarize(min_date = min(date)) %>%
  # number (using row_number) *within* the firm group, arranged by date
  arrange(firm, min_date) %>%
  group_by(firm) %>%
  mutate(cityOrder = row_number()) %>%
  # drop the date column
  select(firm, city, cityOrder)
print(df_summary)
# A tibble: 6 x 3
# Groups: firm [2]
# firm city cityOrder
# <dbl> <fct> <int>
# 1 1. New York 1
# 2 1. DC 2
# 3 1. Atlanta 3
# 4 2. DC 1
# 5 2. Chicago 2
# 6 2. Atlanta 3
## join the summary city order to the original
df_new <- df %>%
  left_join(df_summary, by = c(firm = "firm", city = "city"))
print(df_new$cityOrder)
# [1] 1 2 1 3 1 1 2 3