ggplot2: replace legend keys and sort them alphabetically

Asked: 2016-04-28 22:30:30

Tags: r ggplot2

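I am producing a set of charts in two languages with ggplot2 (Hadley Wickham). I could generate them in two separate workflows by renaming the variables in the original dataset. Instead, I would like to modify a single ggplot object: produce the English figure first, then translate its labels into French. How do I change the legend keys inside a ggplot object? And how do I then sort the legend keys?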

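The reason I am exploring this approach is that I want the plot colours and symbols to be identical in the English and French versions, while the legend keys are sorted alphabetically in each language. The problem is that the French and English legend keys do not share the same alphabetical order (Spain vs. Espagne). Compare the legend keys produced by the MWE: they are sorted alphabetically in the English legend but are out of order in the French one.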

[Image: English and French legends produced by the MWE]

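Replacing xlab, ylab and ggtitle, and changing the style of the axis labels (e.g. the number format), is straightforward, so my focus is on the legend keys and the order in which they are listed in the legend.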

The MWE below uses many names to show how tedious it is to repeat them several times with this approach (once for group, again for colour, again for shape, and so on):

    df <- structure(list(year = c("2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007"), country = c("Australia", "Australia", 
    "Austria", "Austria", "Belgium", "Belgium", "Canada", "Canada", 
    "Denmark", "Denmark", "Finland", "Finland", "France", "France", 
    "Germany", "Germany", "Greece", "Greece", "Italy", "Italy", "Japan", 
    "Japan", "Netherlands", "Netherlands", "New Zealand", "New Zealand", 
    "Norway", "Norway", "Portugal", "Portugal", "Spain", "Spain", 
    "Sweden", "Sweden", "Switzerland", "Switzerland", "United Kingdom", 
    "United Kingdom", "United States", "United States"), value = c(33, 
    33, 33, 33, 30, 30, 34, 34, 30, 30, 33, 33, 28, 29, 27, 27, 40, 
    39, 35, 35, 35, 35, 27, 27, 33, 33, 27, 27, 37, 37, 32, 32, 31, 
    31, 32, 31, 32, 32, 33, 33)), .Names = c("year", "country", "value"
    ), row.names = c(NA, -40L), class = "data.frame")

    library("ggplot2")
    ggplot(data = df, aes(x = year, y = value, group = country, colour = country)) + 
        geom_line(size = 0.5) + geom_point(size = 1)
    ggsave(last_plot(), file = "stackoverflow-1.png")

    ggplot(data = df, aes(x = year, y = value,
        group = factor(country, labels = c("Australie", "Autriche", "Belgique",
            "Canada", "Danemark", "Finlande", "France", "Allemagne", "Grèce",
            "Italie", "Japon", "Pays-Bas", "Nouvelle-Zélande", "Norvège",
            "Portugal", "Espagne", "Suède", "Suisse", "Royaume-Uni", "États-Unis")),
        colour = factor(country, labels = c("Australie", "Autriche", "Belgique",
            "Canada", "Danemark", "Finlande", "France", "Allemagne", "Grèce",
            "Italie", "Japon", "Pays-Bas", "Nouvelle-Zélande", "Norvège",
            "Portugal", "Espagne", "Suède", "Suisse", "Royaume-Uni", "États-Unis")))) +
        geom_line(size = 0.5) + geom_point(size = 1) +
        theme(legend.title = element_blank())
    ggsave(last_plot(), file = "stackoverflow-2.png")

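I would like a method that does not break if I use only a subset of the variables (the countries, in this example). The most convenient and least error-prone option would be to define a mapping like this: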

    list("A Cuckoo Land" = "Un Pays Idyllique", # This mapping is not used
         "Australia" = "Australie",
         "Austria" = "Autriche",
         "Belgium" = "Belgique",
         "Canada" = "Canada",
         "Denmark" = "Danemark",
         "Finland" = "Finlande",
         "France" = "France",
         "Germany" = "Allemagne",
         "Greece" = "Grèce",
         "Italy" = "Italie",
         "Japan" = "Japon",
         "Netherlands" = "Pays-Bas",
         "New Zealand" = "Nouvelle-Zélande",
         "Norway" = "Norvège",
         "Portugal" = "Portugal",
         "Spain" = "Espagne",
         "Sweden" = "Suède",
         "Switzerland" = "Suisse",
         "United Kingdom" = "Royaume-Uni",
         "United States" = "États-Unis")

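and to replace every occurrence of a left-hand side in the legend keys with the corresponding right-hand side. (Even better if the method can handle three languages, e.g. with a mapping like "Belgium" = c("Belgique", "Bélgica").)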

2 Answers:

Answer 0 (score: 1)

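I would probably do this by creating a list of data frames that share the same column names but hold the country names in different languages. Building that list could take a bit of work if there are many data frames, but I'm fairly sure it will be less hassle than manipulating grobs and gtables. An example: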

    key <- unlist(list("A Cuckoo Land" = "Un Pays Idyllique", # This mapping is not used
                       "Australia" = "Australie",
                       "Austria" = "Autriche",
                       "Belgium" = "Belgique",
                       "Canada" = "Canada",
                       "Denmark" = "Danemark",
                       "Finland" = "Finlande",
                       "France" = "France",
                       "Germany" = "Allemagne",
                       "Greece" = "Grèce",
                       "Italy" = "Italie",
                       "Japan" = "Japon",
                       "Netherlands" = "Pays-Bas",
                       "New Zealand" = "Nouvelle-Zélande",
                       "Norway" = "Norvège",
                       "Portugal" = "Portugal",
                       "Spain" = "Espagne",
                       "Sweden" = "Suède",
                       "Switzerland" = "Suisse",
                       "United Kingdom" = "Royaume-Uni",
                       "United States" = "États-Unis"))
    df_eng <- df
    df_fra <- df
    # Look up the French name for each row by its English name;
    # as.character() guards against country being a factor
    df_fra$country <- unname(key[as.character(df_eng$country)])

    dfs <- list('english' = df_eng, 'french' = df_fra)

    library("ggplot2")
    # Now you can create one "default" plot...
    p <- ggplot(data = dfs[['english']],
                aes(x = year, y = value,
                    group = country, colour = country)) +
      geom_line(size = 0.5) +
      geom_point(size = 1)
    print(p)

    # And simply swap out the data frame...
    p %+% dfs[['french']]
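A variant of the same idea leaves the data untouched and passes the lookup vector to the colour scale instead. This is only a minimal sketch: the toy data frame, the three hard-coded colours and the names used, fr_labels and fr_breaks are assumptions made for illustration. It relies on the values, breaks and labels arguments of scale_colour_manual().

```r
library(ggplot2)

# Toy stand-in for the question's `df` (three countries only)
toy <- data.frame(
  year    = rep(c("2006", "2007"), times = 3),
  country = rep(c("Finland", "Germany", "Spain"), each = 2),
  value   = c(33, 33, 27, 27, 32, 32),
  stringsAsFactors = FALSE
)

# English -> French lookup; entries not present in the data are ignored
key <- c("A Cuckoo Land" = "Un Pays Idyllique",
         "Finland" = "Finlande", "Germany" = "Allemagne", "Spain" = "Espagne")

# Colours are tied to the English key, so they are the same in both plots
cols <- c("Finland" = "#E41A1C", "Germany" = "#377EB8", "Spain" = "#4DAF4A")

used      <- sort(unique(toy$country))   # English keys present in the data
fr_labels <- key[used]                   # French labels, still named in English
fr_breaks <- names(sort(fr_labels))      # English keys in French alphabetical order

p <- ggplot(toy, aes(x = year, y = value, group = country, colour = country)) +
  geom_line() +
  geom_point()

# English legend: default (alphabetical) order, fixed colours
p_en <- p + scale_colour_manual(values = cols)

# French legend: same colours, keys relabelled and reordered
p_fr <- p + scale_colour_manual(values = cols,
                                breaks = fr_breaks,
                                labels = unname(key[fr_breaks]),
                                name = NULL)
```

Printing p_en and p_fr should give two plots in which each country keeps its colour, while the French legend lists Allemagne, Espagne, Finlande in that order.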

Answer 1 (score: 0)

In answering my own question, I want to document some further adjustments based on joran's answer, for the record and/or for further discussion.

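To summarise, the aim is to produce sets of figures in two languages with consistent colours, shapes, linetypes, etc. across both sets. The difficulty is that ggplot orders the legend by factor levels, and the level labels sort differently in the two languages. For example, one expects "Spain" to appear towards the end of the English list because it starts with S, but near the beginning of the French list because "Espagne" starts with E.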
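The ordering issue can be seen without any plotting. The snippet below is a small base R illustration using three of the country names from this thread; the vectors en and fr are made up for the demonstration.

```r
# Same three countries, same positions, two languages
en <- c("Finland", "Germany", "Spain")
fr <- c("Finlande", "Allemagne", "Espagne")

# factor() sorts its levels, so each language ends up with its own order
levels(factor(en))   # "Finland" "Germany" "Spain"
levels(factor(fr))   # "Allemagne" "Espagne" "Finlande"

# Position of each country in the French ordering
order(fr)            # 2 3 1
```

Sorting is locale-dependent, so accented initials such as the one in "États-Unis" may land in a different place depending on the collation in use (in the C locale they sort after the unaccented letters).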

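In what follows, I create a country factor whose labels are in English and sorted in English alphabetical order, and a country.fr factor whose labels are in French and sorted in French alphabetical order. The same logic would apply to shapes, linetypes, fills, etc. My code is a bit long-winded, and various shortcuts are undoubtedly possible.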

    ### Create a fixed assignment for colors, shapes, linetypes, etc.
    ### The same for both the English and French versions
    ### Data
    df <- structure(list(year = c("2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007"), country = c("Australia", "Australia", 
    "Austria", "Austria", "Belgium", "Belgium", "Canada", "Canada", 
    "Denmark", "Denmark", "Finland", "Finland", "France", "France", 
    "Germany", "Germany", "Greece", "Greece", "Italy", "Italy", "Japan", 
    "Japan", "Netherlands", "Netherlands", "New Zealand", "New Zealand", 
    "Norway", "Norway", "Portugal", "Portugal", "Spain", "Spain", 
    "Sweden", "Sweden", "Switzerland", "Switzerland", "United Kingdom", 
    "United Kingdom", "United States", "United States"), value = c(33, 
    33, 33, 33, 30, 30, 34, 34, 30, 30, 33, 33, 28, 29, 27, 27, 40, 
    39, 35, 35, 35, 35, 27, 27, 33, 33, 27, 27, 37, 37, 32, 32, 31, 
    31, 32, 31, 32, 32, 33, 33)), .Names = c("year", "country", "value"
    ), row.names = c(NA, -40L), class = "data.frame")

    ## Create a unique country ID and a language map
    key <- read.table(textConnection("
    AUS,Australia,Australie
    AUT,Austria,Autriche
    BEL,Belgium,Belgique
    CAN,Canada,Canada
    CHE,Switzerland,Suisse
    DEU,Germany,Allemagne
    DNK,Denmark,Danemark
    ESP,Spain,Espagne
    FIN,Finland,Finlande
    FRA,France,France
    GBR,United Kingdom,Royaume-Uni
    GRC,Greece,Grèce
    ITA,Italy,Italie
    JPN,Japan,Japon
    NLD,Netherlands,Pays-Bas
    NZL,New Zealand,Nouvelle-Zélande
    NOR,Norway,Norvège
    PRT,Portugal,Portugal
    SWE,Sweden,Suède
    USA,United States,États-Unis"), 
    sep = ',', strip.white = TRUE, stringsAsFactors = FALSE)
    names(key) <- c('country.code', 'country.name', 'country.name.fr')
    ##  Check the types of data
    ##  ! Make sure country is a 'string' not a 'factor' !
    ##  ! otherwise, the 'translation' will be incorrect !
    ##  ! strip.white = TRUE removes stray whitespace around the fields, !
    ##  ! which would otherwise make match() below return NA             !
    str(key)
    ##'data.frame': 20 obs. of  3 variables:
    ## $ country.code   : chr  "AUS" "AUT" "BEL" "CAN" ...
    ## $ country.name   : chr  "Australia" "Austria" "Belgium" "Canada" ...
    ## $ country.name.fr: chr  "Australie" "Autriche" "Belgique" "Canada" ...

    ## Create a unique code variable for each country
    df$country.code <- NA
    matched <- match(df$country, key$country.name)
    df$country.code <- ifelse(is.na(matched), df$country, key$country.code[matched])

    ## translate country name with translation key
    df$country.fr <- NA
    matched <- match(df$country, key$country.name)
    df$country.fr <- ifelse(is.na(matched), NA, key$country.name.fr[matched])

    ## Set the country names to be factors (they are currently strings)
    ## function as.factor orders alphabetically
    # English
    df$country <- as.factor(df$country)
    View(df)
    # French
    df$country.fr <- as.factor(df$country.fr)
    View(df)

    ## Define some colors (here manually combining Set1 and Set3 of RColorBrewer)
    ## The Palette could also have been embedded in the key dataframe earlier...
    colorPalette <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", "#FFFF33", "#A65628", "#F781BF", "#999999","#8DD3C7", "#FFFFB3", "#BEBADA", "#FB8072", "#80B1D3", "#FDB462", "#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD", "#CCEBC5", "#FFED6F")
    length(colorPalette)  # Make sure we have enough colors
    ## [1] 21

    ## Set the colors to each country within the dataframe
    ## There is no need for that, but I felt it was idiot-proof
    names(colorPalette) <- levels(df$country)
    df$colors <- NA
    matched <- match(df$country, names(colorPalette))
    df$colors <- ifelse(is.na(matched), NA, colorPalette[matched])
    str(df)
    ##'data.frame': 40 obs. of  6 variables:
    ## $ year        : chr  "2006" "2007" "2006" "2007" ...
    ## $ country     : Factor w/ 20 levels "Australia","Austria",..: 1 1 2 2 3 3 4 4 5 5 ...
    ## $ value       : num  33 33 33 33 30 30 34 34 30 30 ...
    ## $ country.code: chr  "AUS" "AUS" "AUT" "AUT" ...
    ## $ country.fr  : Factor w/ 20 levels "Allemagne","Australie",..: 2 2 3 3 4 4 5 5 6 6 ...
    ## $ colors      : chr  "#E41A1C" "#E41A1C" "#377EB8" "#377EB8" ...

    ### Make the English plot
    ##  use the country factor to order variables
    library("ggplot2")
    p <- ggplot(data = df, aes(x = year, y = value, 
                    group = country, colour = country)) + 
      geom_line(size = 0.5) + 
      geom_point(size = 1) +
      guides(colour = guide_legend(ncol = 2))
    p

    ### Swap out the colors with custom scheme using scale_colour_manual
    ## To ensure correct mapping, use named vectors in scale_colour_manual
    colors <- df$colors
    names(colors) <- df$country
    str(colors)
    ## Named chr [1:40] "#E41A1C" "#E41A1C" "#377EB8" ...
    ## - attr(*, "names")= chr [1:40] "Australia" "Australia" "Austria" "Austria" ...

    p + scale_colour_manual(name = "country", values = colors)

    ### Make the French plot
    ##  use the country.fr factor to order variables
    colors.fr <- df$colors
    names(colors.fr) <- df$country.fr
    str(colors.fr)
    ##Named chr [1:40] "#E41A1C" "#E41A1C" "#377EB8" ...
    ## - attr(*, "names")= chr [1:40] "Australie" "Australie" "Autriche" "Autriche" ...
    p <- ggplot(data = df, aes(x = year, y = value, 
                    group = country.fr, colour = country.fr)) + 
      geom_line(size = 0.5) + 
      geom_point(size = 1) +
      guides(colour = guide_legend(ncol = 2))
    p

    p + scale_colour_manual(name = "pays", values = colors.fr)

Here are the corresponding legends side by side:

[Image: the English and French legends side by side]
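The key-table idea extends to a third language by adding a column. The following is a minimal base R sketch under stated assumptions: the column names en, fr and es, the three-row table, and the helpers translate_factor() and palette_for() are all hypothetical and not part of ggplot2.

```r
# Hypothetical lookup table: one row per country, one column per language
key <- data.frame(
  en = c("Belgium", "Germany", "Spain"),
  fr = c("Belgique", "Allemagne", "Espagne"),
  es = c("Bélgica", "Alemania", "España"),
  stringsAsFactors = FALSE
)

# Illustrative helper: map English names to `lang` and order the levels
# alphabetically in that language; names missing from the key become NA
translate_factor <- function(x, key, lang) {
  labels <- key[[lang]][match(x, key$en)]
  factor(labels, levels = sort(unique(labels)))
}

# Illustrative helper: one colour per row of the key, named by the
# translated label so it can be passed to scale_colour_manual(values = ...)
palette_for <- function(key, lang, colours) {
  setNames(colours[seq_len(nrow(key))], key[[lang]])
}

x <- c("Spain", "Belgium", "Germany", "Spain")
levels(translate_factor(x, key, "fr"))
levels(translate_factor(x, key, "es"))
palette_for(key, "es", c("#E41A1C", "#377EB8", "#4DAF4A"))
```

Because the colours are assigned by row of the key table, a given country receives the same colour whichever language column supplies its label.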