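How to find the most common factor level?

Time: 2016-11-01 17:28:54

Tags: r

I am still learning R, so apologies for my lack of knowledge.

My data covers 192 countries and looks similar to this: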

    # Given some data which resemble the original data
    cars_produced <- data.frame(countries =  c("US", 
                                     "US",
                                     "US", 
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "US",
                                     "France",
                                     "France",
                                     "France",
                                     "France",
                                     "France",
                                     "France",
                                     "France",
                                     "France",
                                     "Norway",
                                     "Norway",
                                     "Norway",
                                     "Norway",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany",
                                     "Germany"
    ),
    manufacturer   =   c(  "Mercedes",
                           "Mercedes",
                           "Volkswagen",
                           "Volkswagen",
                           "Volkswagen",
                           "BMW",
                           "General motors",
                           "General motors",
                           "General motors",
                           "General motors",
                           "General motors",
                           "Ford",
                           "Ford",
                           "Ford",
                           "Toyota",
                           "Toyota",
                           "Toyota",
                           "Mercedes",
                           "Mercedes",
                           "Mercedes",
                           "Mercedes",
                           "BMW",
                           "BMW",
                           "BMW",
                           "Toyota",
                           "Volkswagen",
                           "Volkswagen",
                           "Volkswagen",
                           "Volkswagen",
                           "Volkswagen",
                           "BMW",
                           "BMW",
                           "BMW",
                           "BMW",
                           "Volkswagen",
                           "Volkswagen",
                           "Volkswagen",
                           "Volkswagen",
                           "Mercedes",
                           "Mercedes",
                           "Mercedes",
                           "Mercedes"

    ),

    model=c("GLK",
             "M",
             "Passat",
             "Golf",
             "Caddy",
             "M4",
             "Hammer",
             "Pontiac",
             "Chevrolet",
             "Corvette",
             "Cadillac",
             "KA",
             "Fiesta",
             "Taurus",
             "Yaris",
             "Carina",
             "Briska",
             "GLK",
             "M",
             "GL",
             "C",
             "M4",
             "X5",
             "i8",
             "Carina",
             "Passat",
             "Golf",
             "Caddy",
             "Sharan",
             "Polo",
             "M4",
             "X5",
             "i8",
              "E9",
             "Passat",
             "Golf",
             "Caddy",
             "Sharan",
             "GLK",
             "M",
             "GL",
             "C")
    )




    > cars_produced
    countries   manufacturer     model
    #1         US       Mercedes       GLK
    #2         US       Mercedes         M
    #3         US     Volkswagen    Passat
    #4         US     Volkswagen      Golf
    #5         US     Volkswagen     Caddy
    #6         US            BMW        M4
    #7         US General motors    Hammer
    #8         US General motors   Pontiac
    #9         US General motors Chevrolet
    #10        US General motors  Corvette
    #11        US General motors  Cadillac
    #12        US           Ford        KA
    #13        US           Ford    Fiesta
    #14        US           Ford    Taurus
    #15        US         Toyota     Yaris
    #16    France         Toyota    Carina
    #17    France         Toyota    Briska
    #18    France       Mercedes       GLK
    #19    France       Mercedes         M
    #20    France       Mercedes        GL
    #21    France       Mercedes         C
    #22    France            BMW        M4
    #23    France            BMW        X5
    #24    Norway            BMW        i8
    #25    Norway         Toyota    Carina
    #26    Norway     Volkswagen    Passat
    #27    Norway     Volkswagen      Golf
    #28   Germany     Volkswagen     Caddy
    #29   Germany     Volkswagen    Sharan
    #30   Germany     Volkswagen      Polo
    #31   Germany            BMW        M4
    #32   Germany            BMW        X5
    #33   Germany            BMW        i8
    #34   Germany            BMW        E9
    #35   Germany     Volkswagen    Passat
    #36   Germany     Volkswagen      Golf
    #37   Germany     Volkswagen     Caddy
    #38   Germany     Volkswagen    Sharan
    #39   Germany       Mercedes       GLK
    #40   Germany       Mercedes         M
    #41   Germany       Mercedes        GL
    #42   Germany       Mercedes         C        

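My questions are:

  1. How many models do the countries (and manufacturers) produce overall?
  2. How can I select the most popular and the least popular models worldwide (with the corresponding manufacturers)?

To do this, I tried to use

    library(dplyr)

For question one, I tried the following:

    count_by_manufacturer <- cars_produced[, -1] %>% group_by(manufacturer) %>% summarise(count = n())

For question two, I tried the following, but I don't know how to get the corresponding manufacturer:

    Countries_by_models <- cars_produced[, -2] %>% group_by(model) %>% summarise(count = n())
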

3 answers:

Answer 0: (score: 1)

The following may help:

    countries <- table(cars_produced$countries)
    sort(countries, decreasing = TRUE)
    Germany      US  France  Norway 
         15      15       8       4    

Just to point you further, here is the cross-tabulation of countries and manufacturers:

    country_manufac <- with(cars_produced, table(countries, manufacturer))
    country_manufac
             manufacturer
    countries BMW Ford General motors Mercedes Toyota Volkswagen
      France    2    0              0        4      2          0
      Germany   4    0              0        4      0          7
      Norway    1    0              0        0      1          2
      US        1    3              5        2      1          3

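If that is too verbose, try

    apply(country_manufac, 1, which.max)
     France Germany  Norway      US 
          4       6       6       3 

For each country, this gives you the index of the most popular car brand. For example, France favors brand number 4, which is Mercedes. Make sure you understand what happens when there are ties, though. A good starting point is ?which.min. You may also want to look at ?ftable.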
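
As a minimal sketch of the tie behaviour mentioned above: which.max() returns only the first maximum, so a tied brand is silently dropped. The toy data frame below is made up for illustration (it is not the asker's data), and first_max and all_max are hypothetical names.

```r
# Toy data (hypothetical): country "A" has a tie between BMW and Ford.
toy <- data.frame(
  countries    = c("A", "A", "A", "A", "B", "B", "B"),
  manufacturer = c("BMW", "BMW", "Ford", "Ford", "BMW", "Ford", "Ford")
)
tab <- table(toy$countries, toy$manufacturer)

# which.max() reports only the first maximum in each row,
# so for country "A" it points at BMW and hides Ford.
first_max <- apply(tab, 1, which.max)

# Comparing each row against its own maximum keeps every tied manufacturer.
all_max <- sapply(
  rownames(tab),
  function(r) colnames(tab)[tab[r, ] == max(tab[r, ])],
  simplify = FALSE
)

print(first_max)
print(all_max)
```

Returning a list here is a deliberate choice, since the number of tied brands can differ from country to country.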

Answer 1: (score: 0)

As for your first question:

    ag <- aggregate(model ~ countries + manufacturer, cars_produced, length)
    ag[order(ag$countries), ] # just in case you want to see them sorted

    #    countries   manufacturer model
    # 1     France            BMW     2
    # 7     France       Mercedes     4
    # 10    France         Toyota     2
    # 2    Germany            BMW     4
    # 8    Germany       Mercedes     4
    # 13   Germany     Volkswagen     7
    # 3     Norway            BMW     1
    # 11    Norway         Toyota     1
    # 14    Norway     Volkswagen     2
    # 4         US            BMW     1
    # 5         US           Ford     3
    # 6         US General motors     5
    # 9         US       Mercedes     2
    # 12        US         Toyota     1
    # 15        US     Volkswagen     3

Or, equivalently, as a cross-tabulation:

    table(cars_produced$countries, cars_produced$manufacturer)

    #           BMW Ford General motors Mercedes Toyota Volkswagen
    #   France    2    0              0        4      2          0
    #   Germany   4    0              0        4      0          7
    #   Norway    1    0              0        0      1          2
    #   US        1    3              5        2      1          3

This gives you the number of distinct models per country:

    aggregate(model ~ countries, cars_produced, function(x) length(unique(x)))

    #   countries model
    # 1    France     8
    # 2   Germany    13
    # 3    Norway     4
    # 4        US    15
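
To make the difference between counting rows and counting distinct models concrete, here is a small sketch. The toy data frame is made up for illustration, and rows_per_country and distinct_per_country are hypothetical names.

```r
# Toy data (hypothetical): "Caddy" appears twice for Germany.
toy <- data.frame(
  countries = c("Germany", "Germany", "Germany", "Norway"),
  model     = c("Caddy", "Caddy", "Polo", "Golf")
)

# Number of rows per country (repeated models are counted every time).
rows_per_country <- tapply(toy$model, toy$countries, length)

# Number of distinct models per country (repeated models are counted once).
distinct_per_country <- tapply(
  toy$model, toy$countries,
  function(x) length(unique(x))
)

print(rows_per_country)
print(distinct_per_country)
```

This is why Germany shows 13 models above even though it has 15 rows in the data.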

Your second question is unclear. What do you mean by most/least popular? And what about ties?

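Answer 2: (score: 0)

You can use dplyr to produce the desired results.

For the first and second results, there is no need to deselect the columns that you are not grouping by. Instead, to count the models made by each of the countries, group_by countries and then summarise:

    library(dplyr)
    cars_produced %>% group_by(countries) %>% summarise(count=n())
    ### A tibble: 4 x 2
    ##  countries count
    ##     <fctr> <int>
    ##1    France     8
    ##2   Germany    15
    ##3    Norway     4
    ##4        US    15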

To count the models by manufacturer, group_by manufacturer instead:

    cars_produced %>% group_by(manufacturer) %>% summarise(count=n())
    ### A tibble: 6 x 2
    ##    manufacturer count
    ##          <fctr> <int>
    ##1            BMW     8
    ##2           Ford     3
    ##3 General motors     5
    ##4       Mercedes    10
    ##5         Toyota     4
    ##6     Volkswagen    12

To find the most popular model (and its manufacturer), first group_by model and use mutate to create a column holding the count for each model. Then ungroup and filter to keep only the rows where count equals max(count). Finally, group_by manufacturer and model and summarise the count:

    cars_produced %>% group_by(model) %>% mutate(count=n()) %>% 
                      ungroup() %>% filter(count==max(count)) %>% 
                      group_by(manufacturer, model) %>% summarise(count=first(count))
    ##Source: local data frame [6 x 3]
    ##Groups: manufacturer [?]
    ##
    ##  manufacturer  model count
    ##        <fctr> <fctr> <int>
    ##1          BMW     M4     3
    ##2     Mercedes    GLK     3
    ##3     Mercedes      M     3
    ##4   Volkswagen  Caddy     3
    ##5   Volkswagen   Golf     3
    ##6   Volkswagen Passat     3

To find the least popular models, do the same, except that the filter keeps only the rows where count equals min(count):

    cars_produced %>% group_by(model) %>% mutate(count=n()) %>% 
                      ungroup() %>% filter(count==min(count)) %>% 
                      group_by(manufacturer, model) %>% summarise(count=first(count))
    ##Source: local data frame [12 x 3]
    ##Groups: manufacturer [?]
    ##
    ##     manufacturer     model count
    ##           <fctr>    <fctr> <int>
    ##1             BMW        E9     1
    ##2            Ford    Fiesta     1
    ##3            Ford        KA     1
    ##4            Ford    Taurus     1
    ##5  General motors  Cadillac     1
    ##6  General motors Chevrolet     1
    ##7  General motors  Corvette     1
    ##8  General motors    Hammer     1
    ##9  General motors   Pontiac     1
    ##10         Toyota    Briska     1
    ##11         Toyota     Yaris     1
    ##12     Volkswagen      Polo     1
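
If your dplyr version provides count() with the name argument, the same idea can probably be written more compactly. This is only a sketch on a made-up toy data frame (not the asker's data), and it counts manufacturer/model pairs, which matches the per-model count only when each model belongs to a single manufacturer.

```r
library(dplyr)

# Toy data (hypothetical, much smaller than the asker's data frame).
toy <- data.frame(
  manufacturer = c("BMW", "BMW", "Volkswagen", "Volkswagen", "Volkswagen", "Ford"),
  model        = c("M4", "M4", "Golf", "Golf", "Polo", "KA")
)

# One row per manufacturer/model pair, with the number of rows in `count`.
model_counts <- toy %>% count(manufacturer, model, name = "count")

# Keep every pair tied for the highest or the lowest count.
most_popular  <- model_counts %>% filter(count == max(count))
least_popular <- model_counts %>% filter(count == min(count))

print(most_popular)
print(least_popular)
```

Because the filter compares against max(count) or min(count), all tied models are kept rather than just the first one.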