I am still learning R, so I apologize for my lack of knowledge.
My data covers 192 countries and looks similar to this:
# Given some data which resemble the original data
cars_produced <- data.frame(countries = c("US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"US",
"France",
"France",
"France",
"France",
"France",
"France",
"France",
"France",
"Norway",
"Norway",
"Norway",
"Norway",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany",
"Germany"
),
manufacturer = c( "Mercedes",
"Mercedes",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"BMW",
"General motors",
"General motors",
"General motors",
"General motors",
"General motors",
"Ford",
"Ford",
"Ford",
"Toyota",
"Toyota",
"Toyota",
"Mercedes",
"Mercedes",
"Mercedes",
"Mercedes",
"BMW",
"BMW",
"BMW",
"Toyota",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"BMW",
"BMW",
"BMW",
"BMW",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Volkswagen",
"Mercedes",
"Mercedes",
"Mercedes",
"Mercedes"
),
model=c("GLK",
"M",
"Passat",
"Golf",
"Caddy",
"M4",
"Hammer",
"Pontiac",
"Chevrolet",
"Corvette",
"Cadillac",
"KA",
"Fiesta",
"Taurus",
"Yaris",
"Carina",
"Briska",
"GLK",
"M",
"GL",
"C",
"M4",
"X5",
"i8",
"Carina",
"Passat",
"Golf",
"Caddy",
"Sharan",
"Polo",
"M4",
"X5",
"i8",
"E9",
"Passat",
"Golf",
"Caddy",
"Sharan",
"GLK",
"M",
"GL",
"C")
)
> cars_produced
countries manufacturer model
#1 US Mercedes GLK
#2 US Mercedes M
#3 US Volkswagen Passat
#4 US Volkswagen Golf
#5 US Volkswagen Caddy
#6 US BMW M4
#7 US General motors Hammer
#8 US General motors Pontiac
#9 US General motors Chevrolet
#10 US General motors Corvette
#11 US General motors Cadillac
#12 US Ford KA
#13 US Ford Fiesta
#14 US Ford Taurus
#15 US Toyota Yaris
#16 France Toyota Carina
#17 France Toyota Briska
#18 France Mercedes GLK
#19 France Mercedes M
#20 France Mercedes GL
#21 France Mercedes C
#22 France BMW M4
#23 France BMW X5
#24 Norway BMW i8
#25 Norway Toyota Carina
#26 Norway Volkswagen Passat
#27 Norway Volkswagen Golf
#28 Germany Volkswagen Caddy
#29 Germany Volkswagen Sharan
#30 Germany Volkswagen Polo
#31 Germany BMW M4
#32 Germany BMW X5
#33 Germany BMW i8
#34 Germany BMW E9
#35 Germany Volkswagen Passat
#36 Germany Volkswagen Golf
#37 Germany Volkswagen Caddy
#38 Germany Volkswagen Sharan
#39 Germany Mercedes GLK
#40 Germany Mercedes M
#41 Germany Mercedes GL
#42 Germany Mercedes C
My question is:
How many models does each country (and each manufacturer) typically produce?
For this, I tried to use
library(dplyr)
For question one, I tried the following:
count_by_manufacturer<- cars_produced[,-1] %>% group_by(manufacturer) %>% summarise(count = n())
For the most popular models, I tried the following, but I don't know how to get the corresponding manufacturer:
Countries_by_models<- cars_produced[,-2] %>% group_by(model) %>% summarise(count = n())
Answer 0: (score: 1)
The following may help:
countries <- table(cars_produced$countries)
sort(countries, T)
Germany US France Norway
15 15 8 4
It is also worth pointing out the cross-tabulation of countries by manufacturer:
country_manufac <- with(cars_produced, table(countries, manufacturer ))
country_manufac
manufacturer
countries BMW Ford General motors Mercedes Toyota Volkswagen
France 2 0 0 4 2 0
Germany 4 0 0 4 0 7
Norway 1 0 0 0 1 2
US 1 3 5 2 1 3
If that is too verbose, try
apply(country_manufac, 1, which.max)
France Germany Norway US
4 6 6 3
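This gives you, for each country, the column index of the most popular car brand. For example, France favors brand number 4, which is Mercedes. Make sure you understand what happens when there are ties, though: which.max returns only the first maximum. A good place to start is ?which.min. You may also want to look at ?ftable.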
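To make the point about ties concrete, here is a minimal sketch in base R. The small `counts` table and the country names "A" and "B" are made up for illustration (they are not the question's data), and comparing each row against its own maximum is just one possible way to keep every tied brand.

```r
# Hypothetical counts for illustration only: country "A" has a tie
# between BMW and Mercedes, country "B" has a single winner.
counts <- matrix(c(4, 4, 1,
                   0, 2, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(countries = c("A", "B"),
                                 manufacturer = c("BMW", "Mercedes", "Toyota")))

# which.max() reports only the first maximum in each row,
# so the tie in country "A" is hidden.
first_max <- apply(counts, 1, which.max)

# Comparing each row against its own maximum keeps every tied manufacturer.
# sapply(..., simplify = FALSE) returns a list named by country.
all_max <- sapply(rownames(counts), function(r) {
  colnames(counts)[counts[r, ] == max(counts[r, ])]
}, simplify = FALSE)

print(first_max)
print(all_max)
```

The result is a list rather than a vector because countries can have different numbers of tied brands.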
Answer 1: (score: 0)
As for your first question:
ag <- aggregate(model~countries+manufacturer, cars_produced, length) # cars_produced is the data frame from the question
ag[order(ag$countries),] # just in case you want to see them sorted
# countries manufacturer model
# 1 France BMW 2
# 7 France Mercedes 4
# 10 France Toyota 2
# 2 Germany BMW 4
# 8 Germany Mercedes 4
# 13 Germany Volkswagen 7
# 3 Norway BMW 1
# 11 Norway Toyota 1
# 14 Norway Volkswagen 2
# 4 US BMW 1
# 5 US Ford 3
# 6 US General motors 5
# 9 US Mercedes 2
# 12 US Toyota 1
# 15 US Volkswagen 3
Or, equivalently, as a cross-tabulation:
table(cars_produced$countries, cars_produced$manufacturer)
# BMW Ford General motors Mercedes Toyota Volkswagen
# France 2 0 0 4 2 0
# Germany 4 0 0 4 0 7
# Norway 1 0 0 0 1 2
# US 1 3 5 2 1 3
This gives you the number of distinct models per country:
aggregate(model~countries, cars_produced, function(x) length(unique(x)))
# countries model
# 1 France 8
# 2 Germany 13
# 3 Norway 4
# 4 US 15
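The difference between counting rows and counting distinct models is why Germany shows 13 here even though it has 15 rows in the data. A small sketch of that distinction, using a made-up `toy` data frame (hypothetical, not the question's data) in which one model appears twice for the same country:

```r
# Hypothetical toy data: "Golf" is listed twice for Germany.
toy <- data.frame(
  countries = c("Germany", "Germany", "Germany", "Norway"),
  model     = c("Golf", "Golf", "Polo", "Golf"),
  stringsAsFactors = FALSE
)

# length() counts every row in the group, duplicates included.
rows_per_country <- aggregate(model ~ countries, toy, length)

# length(unique()) counts each model name once per country.
distinct_per_country <- aggregate(model ~ countries, toy,
                                  function(x) length(unique(x)))

print(rows_per_country)
print(distinct_per_country)
```

Which of the two is the right answer depends on whether repeated rows in the real data represent separate production lines or plain duplicates.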
Your second question is not clear. What do you mean by most/least popular? And what about ties?
Answer 2: (score: 0)
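You can use dplyr to produce the desired results.
For the first and second results, we don't need to deselect the columns that are not being grouped. Instead, to find the count of models by countries, group_by countries and summarise: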
library(dplyr)
cars_produced %>% group_by(countries) %>% summarise(count=n())
### A tibble: 4 x 2
## countries count
## <fctr> <int>
##1 France 8
##2 Germany 15
##3 Norway 4
##4 US 15
To find the count of models by manufacturer, group_by manufacturer:
cars_produced %>% group_by(manufacturer) %>% summarise(count=n())
### A tibble: 6 x 2
## manufacturer count
## <fctr> <int>
##1 BMW 8
##2 Ford 3
##3 General motors 5
##4 Mercedes 10
##5 Toyota 4
##6 Volkswagen 12
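If the goal is the number of models for each manufacturer within each country, the same pattern can group by both columns at once. This is only a sketch: the `toy` data frame below is a made-up subset for illustration, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data in the same shape as cars_produced.
toy <- data.frame(
  countries    = c("US", "US", "US", "France", "France"),
  manufacturer = c("Ford", "Ford", "BMW", "BMW", "Toyota"),
  model        = c("KA", "Fiesta", "M4", "M4", "Carina"),
  stringsAsFactors = FALSE
)

# One row per (country, manufacturer) pair, with the number of rows in it.
by_both <- toy %>%
  group_by(countries, manufacturer) %>%
  summarise(count = n()) %>%
  ungroup()

print(by_both)
```

This is the long-format counterpart of the country by manufacturer cross-tabulation, except that pairs with zero rows are simply absent.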
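To find the most popular model (and its manufacturer), first group_by model and create a column containing the count per model. Then ungroup and filter to keep only the rows where the count equals max(count). Finally, group_by manufacturer and model, and summarise the count: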
cars_produced %>% group_by(model) %>% mutate(count=n()) %>%
ungroup %>% filter(count==max(count)) %>%
group_by(manufacturer, model) %>% summarise(count=first(count))
##Source: local data frame [6 x 3]
##Groups: manufacturer [?]
##
## manufacturer model count
## <fctr> <fctr> <int>
##1 BMW M4 3
##2 Mercedes GLK 3
##3 Mercedes M 3
##4 Volkswagen Caddy 3
##5 Volkswagen Golf 3
##6 Volkswagen Passat 3
To find the least popular models, do the same, except filter to keep only the rows where the count equals min(count):
cars_produced %>% group_by(model) %>% mutate(count=n()) %>%
ungroup %>% filter(count==min(count)) %>%
group_by(manufacturer, model) %>% summarise(count=first(count))
##Source: local data frame [12 x 3]
##Groups: manufacturer [?]
##
## manufacturer model count
## <fctr> <fctr> <int>
##1 BMW E9 1
##2 Ford Fiesta 1
##3 Ford KA 1
##4 Ford Taurus 1
##5 General motors Cadillac 1
##6 General motors Chevrolet 1
##7 General motors Corvette 1
##8 General motors Hammer 1
##9 General motors Pontiac 1
##10 Toyota Briska 1
##11 Toyota Yaris 1
##12 Volkswagen Polo 1
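Since the two pipelines differ only in the filter step, the counts can also be computed once and reused. The sketch below is a variation, not the method above: it counts per (manufacturer, model) pair instead of per model, which gives the same result only under the assumption that each model name belongs to a single manufacturer. The `toy` data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: M4 and Golf each appear twice, X5 and KA once.
toy <- data.frame(
  manufacturer = c("BMW", "BMW", "BMW", "Ford", "Volkswagen", "Volkswagen"),
  model        = c("M4", "M4", "X5", "KA", "Golf", "Golf"),
  stringsAsFactors = FALSE
)

# Count each (manufacturer, model) pair once, then reuse the result.
model_counts <- toy %>%
  group_by(manufacturer, model) %>%
  summarise(count = n()) %>%
  ungroup()

# Ties are kept: every row that reaches the extreme count is returned.
most_popular  <- model_counts %>% filter(count == max(count))
least_popular <- model_counts %>% filter(count == min(count))

print(most_popular)
print(least_popular)
```

Because filter compares against the overall max or min, all tied models are returned, as in the outputs above.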