Question

Suppose I have a dataset that looks like this:

Country, Sold, Model
China, 100, Toyota
China, 200, Honda
China, 200, Suzuki
USA, 100, Tesla
USA, 50, Shevi
USA, 50, Lambo

I would like to get output like this:

China, Toyota[20%]; Honda[40%]; Suzuki[40%]
USA, Tesla[50%]; Shevi[25%]; Lambo[25%]

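That is, the data should be grouped by country, and each car model's share of that country's sales should appear next to its name. Is this possible in R?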

Answer 1

Edit: Sorry, this is super hacky, but it's the best I could come up with. I'm sure there is a better way, and hopefully someone will show it to you soon.

library(dplyr)
df <- tribble(
  ~Country, ~Sold, ~Model,
  "China", 100, "Toyota",
  "China", 200, "Honda",
  "China", 200, "Suzuki",
  "USA", 100, "Tesla",
  "USA", 50, "Shevi",
  "USA", 50, "Lambo"
)

model_by_country <- df %>% 
  group_by(Country, Model) %>% 
  summarize(Total_Sold = sum(Sold)) %>% 
  group_by(Country) %>% 
  mutate(Percent_Sold = Total_Sold / sum(Total_Sold)) %>% 
  select(-Total_Sold) %>% 
  ungroup()
model_by_country

##   Country Model  Percent_Sold
##   <chr>   <chr>         <dbl>
## 1 China   Honda          0.4 
## 2 China   Suzuki         0.4 
## 3 China   Toyota         0.2 
## 4 USA     Lambo          0.25
## 5 USA     Shevi          0.25
## 6 USA     Tesla          0.5 

# EDITS begin here
format_country_per <- function(country) {
  model_by_country %>% 
    filter(Country == country) %>% 
    mutate(Model_Percent_Sold = paste0(Model, "[", 100 * Percent_Sold, "%]")) %>% 
    .$Model_Percent_Sold %>% 
    paste(., collapse = "; ") %>% 
    paste(country, ., sep = ", ")
}

format_country_per("China")
## [1] "China, Honda[40%]; Suzuki[40%]; Toyota[20%]"
format_country_per("USA")
## [1] "USA, Lambo[25%]; Shevi[25%]; Tesla[50%]"
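
As a possible simplification of the approach above, the whole result can be built in one grouped `summarise()` call. This is only a sketch: it assumes each Country/Model pair appears once in the data (otherwise sum `Sold` per pair first), that rounding to whole percentages is acceptable, and that dplyr 1.0.0 or later is installed for the `.groups` argument. The name `shares` is made up for illustration.

```r
library(dplyr)

df <- tribble(
  ~Country, ~Sold, ~Model,
  "China", 100, "Toyota",
  "China", 200, "Honda",
  "China", 200, "Suzuki",
  "USA", 100, "Tesla",
  "USA", 50, "Shevi",
  "USA", 50, "Lambo"
)

shares <- df %>%
  group_by(Country) %>%
  summarise(
    # within each country, Sold / sum(Sold) is the model's share;
    # collapse = "; " joins all models of the country into one string
    Shares = paste0(Model, "[", round(100 * Sold / sum(Sold)), "%]",
                    collapse = "; "),
    .groups = "drop"
  )

# one line per country, in the "Country, Model[share%]; ..." layout
paste(shares$Country, shares$Shares, sep = ", ")
```

Because nothing is re-sorted inside each group, the models keep the order they have in the input, which matches the order in the requested output.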

Answer 2

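It looks like you want the row percentages of a table of sales by country and model. With the data in a data frame named dat, this gives a table containing every possible combination of the two factors: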

100*prop.table(                # multiply proportions to get percentages
   with(dat, tapply(Sold, list(Country,Model), sum, default=0)),  #apply sum in categories
                    1)         # the "1" indicates these should be row proportions
       Honda  Lambo  Shevi  Suzuki  Tesla  Toyota
China     40      0      0      40      0      20
USA        0     25     25       0     50       0
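
If the string layout from the question is needed, the percentage table can be collapsed row by row. The following is a rough sketch rather than part of the original answer: it assumes `dat` holds the question's data, and the names `pct`, `shares` and `result` are made up for illustration. Note that it drops every model whose share is 0, which would also hide a model that is listed with zero sales.

```r
dat <- data.frame(
  Country = c("China", "China", "China", "USA", "USA", "USA"),
  Sold    = c(100, 200, 200, 100, 50, 50),
  Model   = c("Toyota", "Honda", "Suzuki", "Tesla", "Shevi", "Lambo")
)

# row percentages, as in the answer above (default = 0 needs R >= 3.4.0)
pct <- 100 * prop.table(
  with(dat, tapply(Sold, list(Country, Model), sum, default = 0)),
  1
)

# for each country (row), keep the models with a non-zero share
# and paste them together as "Model[share%]"
shares <- apply(pct, 1, function(row) {
  keep <- row > 0
  paste0(names(row)[keep], "[", round(row[keep]), "%]", collapse = "; ")
})

result <- paste(names(shares), shares, sep = ", ")
result
```

The models come out in the table's column order (alphabetical), not in the order of the input rows.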
