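I want to create a new data frame keyed on "District", with the counts within each district grouped by year, by "Property.Type", and by whether the property is old or new.
I tried the aggregate function, but it drops the values of the other variables. Here is the dataset: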
  Property.Type Old.New Town.City District             County         Date
1 D             N       BARKING   BARKING AND DAGENHAM GREATER LONDON 2012
2 D             Y       BARKING   BARKING AND DAGENHAM GREATER LONDON 2012
3 D             N       BARKING   BARKING AND DAGENHAM GREATER LONDON 2012
4 D             N       DAGENHAM  BARKING AND DAGENHAM GREATER LONDON 2012
5 D             N       DAGENHAM  BARKING AND DAGENHAM GREATER LONDON 2012
I want to rearrange the data so that District is my ID, with a separate data frame for each breakdown, for example:
by year
District 2012 2013 2014 2015
Barking 100 500 700 800
by Old.New and year
District New Old
Barking 50 70
by property type and year
District New2012 Old2012
Barking 50 70
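Answer 0 (score: 0)
It is hard to help without the full data frame, but the code below shows how to aggregate the data with the tidyverse libraries.
First, recreate a data frame from the data provided: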
# dplyr (part of the tidyverse) provides %>%, group_by() and summarise()
library(dplyr)
Property.Type <- c("D","D","D","D","D")
Old.New <- c("N","Y","N","N","N")
Town.City <- c("BARKING","BARKING","BARKING","DAGENHAM","DAGENHAM")
District <- c("BARKING AND DAGENHAM","BARKING AND DAGENHAM","BARKING AND DAGENHAM","BARKING AND DAGENHAM","BARKING AND DAGENHAM")
County <- c("GREATER LONDON","GREATER LONDON","GREATER LONDON","GREATER LONDON","GREATER LONDON")
Date <- c(2012,2012,2012,2012,2012)
df <- data.frame(Property.Type,Old.New,Town.City,District,County,Date)
Then summarise by one or more columns:
> df %>% group_by(Town.City) %>% summarise(n = n())
# A tibble: 2 x 2
Town.City n
<fct> <int>
1 BARKING 3
2 DAGENHAM 2
>
> df %>% group_by(Date, Town.City) %>% summarise(n = n())
# A tibble: 2 x 3
# Groups: Date [?]
Date Town.City n
<dbl> <fct> <int>
1 2012 BARKING 3
2 2012 DAGENHAM 2
>
> df %>% group_by(Property.Type, Date) %>% summarise(n = n())
# A tibble: 1 x 3
# Groups: Property.Type [?]
Property.Type Date n
<fct> <dbl> <int>
1 D 2012 5
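The summaries above are in long format, whereas the question asks for one row per District with the categories spread across columns. A minimal sketch of that reshaping follows, assuming dplyr and tidyr (1.0.0 or later, for pivot_wider) are installed. The names by_year and by_old_new_year are made up for illustration, and the column names it produces (for example N_2012) follow the pivot_wider defaults rather than the exact labels in the question:

```r
library(dplyr)
library(tidyr)

# Same sample data as above; length-1 values are recycled by data.frame()
df <- data.frame(
  Property.Type = c("D", "D", "D", "D", "D"),
  Old.New       = c("N", "Y", "N", "N", "N"),
  Town.City     = c("BARKING", "BARKING", "BARKING", "DAGENHAM", "DAGENHAM"),
  District      = "BARKING AND DAGENHAM",
  County        = "GREATER LONDON",
  Date          = 2012
)

# One row per District, one column per year
by_year <- df %>%
  count(District, Date) %>%
  pivot_wider(names_from = Date, values_from = n,
              values_fill = list(n = 0L))

# One row per District, one column per Old.New/year combination
by_old_new_year <- df %>%
  count(District, Old.New, Date) %>%
  pivot_wider(names_from = c(Old.New, Date), values_from = n,
              values_fill = list(n = 0L))
```

With the five sample rows, by_year has a single column named 2012, and by_old_new_year has columns N_2012 and Y_2012. Replacing Old.New with Property.Type in the second pipeline would give the breakdown by property type in the same shape.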
For further reference, see this link.