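Create a new variable in R sequentially, based on the values of another variable

Time: 2018-03-31 22:33:42

Tags: r dataframe

Let's say I have data on two firms, each of which is expanding into new cities. I have the date on which each additional store was added in each city: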

df <- data.frame(
    firm = c(rep(1,4), rep(2,4)),
    date = as.Date(c('2017-01-01', '2017-03-01', '2017-05-01',
                     '2017-06-01', '2017-02-01', '2017-04-01',
                     '2017-05-01', '2017-06-01')),
    city = c('New York', 'DC', 'New York', 'Atlanta', 'DC', 'DC',
             'Chicago', 'Atlanta'),
    numStores = c(1, 1, 2, 1, 1, 2, 1, 1))

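I want to create an integer column that tells me the order in which each firm entered each city, based on the date variable. The result should look like this: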

df$cityOrder = c(1,2,1,3,1,1,2,3)

4 Answers:

Answer 0: (score: 4)

Using dplyr ...

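library(dplyr)
# Keep each firm's first store in a city (assumes that row has numStores == 1),
# rank those rows by date within the firm, then join the rank back by firm and city.
# min_rank() gives the chronological rank even when rows are not sorted by date.
df %>% left_join(df %>% filter(numStores==1) %>% 
                        group_by(firm) %>% 
                        mutate(cityOrder=min_rank(date)) %>% 
                        select(-numStores,-date),
                 by = c("firm", "city"))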

  firm       date     city numStores cityOrder
1    1 2017-01-01 New York         1         1
2    1 2017-03-01       DC         1         2
3    1 2017-05-01 New York         2         1
4    1 2017-06-01  Atlanta         1         3
5    2 2017-02-01       DC         1         1
6    2 2017-04-01       DC         2         1
7    2 2017-05-01  Chicago         1         2
8    2 2017-06-01  Atlanta         1         3
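
A note on ranking dates within a group: order() returns the positions that would sort a vector, which is not the same as each element's rank unless the rows are already in date order. A small sketch with made-up dates (the variable names here are only for illustration):

```r
# Made-up dates that are NOT in chronological order
dates <- as.Date(c("2017-06-01", "2017-01-01", "2017-03-01"))

# Positions that would sort the vector
ord <- order(dates)
print(ord)
# [1] 2 3 1

# Chronological rank of each element
rnk <- rank(as.numeric(dates))
print(rnk)
# [1] 3 1 2
```

In the example data the dates are already sorted within each firm, so both give the same numbers there.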

Answer 1: (score: 2)

One way:

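library(dplyr)
# Earliest date on which each firm entered each city
first_entry <- df %>% 
      group_by(firm, city) %>% 
      summarize(first = min(date)) %>% 
      # Rank those first-entry dates within each firm
      group_by(firm) %>% 
      mutate(order = min_rank(first)) %>% 
      select(-first)

# Attach the per-city order to every row of the original data
left_join(df, first_entry, by = c("firm", "city"))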

  firm       date     city numStores order
1    1 2017-01-01 New York         1     1
2    1 2017-03-01       DC         1     2
3    1 2017-05-01 New York         2     1
4    1 2017-06-01  Atlanta         1     3
5    2 2017-02-01       DC         1     1
6    2 2017-04-01       DC         2     1
7    2 2017-05-01  Chicago         1     2
8    2 2017-06-01  Atlanta         1     3

Answer 2: (score: 1)

You can use tapply with factors in base R (provided the dates are sorted within each firm, as they are in this example).
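
A minimal sketch of that idea, assuming the rows are grouped by firm and sorted by date within each firm. The exact call is an illustration, not necessarily the answer's original code, and city_order is a name introduced here:

```r
df <- data.frame(
    firm = c(rep(1, 4), rep(2, 4)),
    date = as.Date(c('2017-01-01', '2017-03-01', '2017-05-01',
                     '2017-06-01', '2017-02-01', '2017-04-01',
                     '2017-05-01', '2017-06-01')),
    city = c('New York', 'DC', 'New York', 'Atlanta', 'DC', 'DC',
             'Chicago', 'Atlanta'),
    numStores = c(1, 1, 2, 1, 1, 2, 1, 1))

# Within each firm, turn city into a factor whose levels follow the order of
# first appearance; the integer codes are then the entry order.
city_order <- unlist(
    tapply(as.character(df$city), df$firm,
           function(x) as.integer(factor(x, levels = unique(x)))),
    use.names = FALSE)

# unlist() concatenates the per-firm results, so this assignment is only
# correct when the rows are already grouped by firm.
df$cityOrder <- city_order
print(df$cityOrder)
# [1] 1 2 1 3 1 1 2 3
```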

Answer 3: (score: 0)

A dplyr solution.

library(dplyr)

# create a summary by firm by city 
df_summary <- df %>%
  # sequence by earliest date by firm and city
  group_by(firm, city) %>%
  summarize(min_date = min(date)) %>%
  # number (using row_number)  *within* the firm group, arranged by date
  arrange(firm, min_date) %>%
  group_by(firm) %>%
  mutate(cityOrder = row_number()) %>%
  # drop the date column
  select(firm, city, cityOrder)

print(df_summary)
# A tibble: 6 x 3
# Groups:   firm [2]
# firm city     cityOrder
# <dbl> <fct>        <int>
# 1    1. New York         1
# 2    1. DC               2
# 3    1. Atlanta          3
# 4    2. DC               1
# 5    2. Chicago          2
# 6    2. Atlanta          3

## join the summary city order to the original
df_new <- df %>%
  left_join(df_summary, by = c("firm", "city"))
print(df_new$cityOrder)
# [1] 1 2 1 3 1 1 2 3
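
The same result can probably also be reached without a separate summary table and join. A sketch, assuming dplyr's dense_rank() and a temporary helper column firstDate (a name introduced here only for illustration):

```r
library(dplyr)

df <- data.frame(
    firm = c(rep(1, 4), rep(2, 4)),
    date = as.Date(c('2017-01-01', '2017-03-01', '2017-05-01',
                     '2017-06-01', '2017-02-01', '2017-04-01',
                     '2017-05-01', '2017-06-01')),
    city = c('New York', 'DC', 'New York', 'Atlanta', 'DC', 'DC',
             'Chicago', 'Atlanta'),
    numStores = c(1, 1, 2, 1, 1, 2, 1, 1))

df_ranked <- df %>%
  # earliest date on which the firm entered this city, repeated on every row
  group_by(firm, city) %>%
  mutate(firstDate = min(date)) %>%
  # rank those first-entry dates within the firm; equal dates share a rank
  group_by(firm) %>%
  mutate(cityOrder = dense_rank(firstDate)) %>%
  ungroup() %>%
  # drop the helper column
  select(-firstDate)

print(df_ranked$cityOrder)
# [1] 1 2 1 3 1 1 2 3
```

Because dense_rank() works on the dates themselves, this variant does not depend on the rows being sorted beforehand.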