Suppose I have a dataset that looks like this:
Country, Sold, Model
China, 100, Toyota
China, 200, Honda
China, 200, Suzuki
USA, 100, Tesla
USA, 50, Shevi
USA, 50, Lambo
I would like to get output like this:
China, Toyota[20%]; Honda[40%]; Suzuki[40%]
USA, Tesla[50%]; Shevi[25%]; Lambo[25%]
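That is, the data should be grouped by country, and each car model should show its share of that country's sales next to its name. Is this possible in R?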
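Answer 0 (score: 1)
Edit: Sorry, this is super hacky, but it's the best I could come up with. I'm sure there is a better way, and hopefully someone will show you one soon.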
library(dplyr)
# Recreate the data from the question
df <- tribble(
  ~Country, ~Sold, ~Model,
  "China", 100, "Toyota",
  "China", 200, "Honda",
  "China", 200, "Suzuki",
  "USA", 100, "Tesla",
  "USA", 50, "Shevi",
  "USA", 50, "Lambo"
)
model_by_country <- df %>%
  # total units sold per model within each country
  group_by(Country, Model) %>%
  summarize(Total_Sold = sum(Sold)) %>%
  # each model's share of its country's total
  group_by(Country) %>%
  mutate(Percent_Sold = Total_Sold / sum(Total_Sold)) %>%
  select(-Total_Sold) %>%
  ungroup()
model_by_country
## Country Model Percent_Sold
## <chr> <chr> <dbl>
## 1 China Honda 0.4
## 2 China Suzuki 0.4
## 3 China Toyota 0.2
## 4 USA Lambo 0.25
## 5 USA Shevi 0.25
## 6 USA Tesla 0.5
# EDITS begin here
format_country_per <- function(country) {
  model_by_country %>%
    filter(Country == country) %>%
    mutate(Model_Percent_Sold = paste0(Model, "[", 100 * Percent_Sold, "%]")) %>%
    .$Model_Percent_Sold %>%
    paste(., collapse = "; ") %>%
    paste(country, ., sep = ", ")
}
format_country_per("China")
## [1] "China, Honda[40%]; Suzuki[40%]; Toyota[20%]"
format_country_per("USA")
## [1] "USA, Lambo[25%]; Shevi[25%]; Tesla[50%]"
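For what it's worth, the two steps above can probably be folded into a single pipeline. The following is only a sketch, not a tested general solution: it assumes each model appears once per country (otherwise the sales would need to be summed per model first), and the names `shares`, `Models`, and `result` are made up for illustration. Because it skips the `group_by(Country, Model)` step, the models stay in their original row order, which happens to match the order requested in the question.

```r
library(dplyr)

# Same data as in the question
df <- tribble(
  ~Country, ~Sold, ~Model,
  "China", 100, "Toyota",
  "China", 200, "Honda",
  "China", 200, "Suzuki",
  "USA", 100, "Tesla",
  "USA", 50, "Shevi",
  "USA", 50, "Lambo"
)

shares <- df %>%
  group_by(Country) %>%
  summarise(
    # Sold / sum(Sold) is each model's share within its country;
    # collapse glues the per-model labels into one string per country
    Models = paste0(Model, "[", round(100 * Sold / sum(Sold)), "%]",
                    collapse = "; ")
  )

# One "Country, Model[share%]; ..." string per country
result <- paste(shares$Country, shares$Models, sep = ", ")
result
## [1] "China, Toyota[20%]; Honda[40%]; Suzuki[40%]"
## [2] "USA, Tesla[50%]; Shevi[25%]; Lambo[25%]"
```

The `round()` call is there so that shares which do not divide evenly print as whole percentages; drop it or pass a `digits` argument if more precision is wanted.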
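Answer 1 (score: 0)
It looks like you want the row percentages of a Country-by-Model table. Assuming the data is in a data frame named `dat`, this gives a table containing every possible combination of the two factors: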
100*prop.table( # multiply proportions to get percentages
with(dat, tapply(Sold, list(Country,Model), sum, default=0)), #apply sum in categories
1) # the "1" indicates these should be row proportions
      Honda Lambo Shevi Suzuki Tesla Toyota
China    40     0     0     40     0     20
USA       0    25    25      0    50      0
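If the one-line-per-country format from the question is needed, the table can be turned into strings by dropping the zero cells in each row. This is a rough base R sketch rather than a polished solution: `dat` is rebuilt from the question's data so the snippet runs on its own, and `pct`, `out`, and `shares` are names invented here. Note that the models come out in alphabetical order, because `tapply` sorts the factor levels.

```r
# Question's data, rebuilt here so the sketch is self-contained
dat <- data.frame(
  Country = c("China", "China", "China", "USA", "USA", "USA"),
  Sold    = c(100, 200, 200, 100, 50, 50),
  Model   = c("Toyota", "Honda", "Suzuki", "Tesla", "Shevi", "Lambo")
)

# Row percentages of the Country-by-Model table
pct <- 100 * prop.table(
  with(dat, tapply(Sold, list(Country, Model), sum, default = 0)),
  1
)

# For each country (row), keep the models with a non-zero share
# and paste them into "Country, Model[share%]; ..."
out <- vapply(rownames(pct), function(country) {
  shares <- pct[country, ]
  shares <- shares[shares > 0]
  paste0(country, ", ",
         paste0(names(shares), "[", round(shares), "%]", collapse = "; "))
}, character(1))

unname(out)
## [1] "China, Honda[40%]; Suzuki[40%]; Toyota[20%]"
## [2] "USA, Lambo[25%]; Shevi[25%]; Tesla[50%]"
```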