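Converting a table with multiple header rows to long format

Time: 2018-10-12 18:46:38

Tags: r excel flatten

I am reading an Excel table that has multiple header rows. Reading it with read.csv creates an object like this in R: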

R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))

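How can I now "flatten" this so that I get an R table with the columns "Min Age", "Max Age", "Min Duration", "Max Duration", "Area", "Level" and "Price", where the "Area" column shows "EU" or "US", the "Level" column shows 1, 2 or 3, and the "Price" column shows the corresponding price found in the Excel table?

Without the multiple header rows I would use tidyr's gather function, but I can't get it to work with this data. Any ideas?

The output should contain 36 rows in total, plus the header.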

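1 answer:

Answer 0: (score: 3)

If you skip the first row as akrun suggested, you might end up with data that looks like this (R automatically adds the "X" prefix and the ".1"/".2" suffixes):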

library(tidyverse)

df <- tribble(
    ~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration,  ~X1.1,  ~X2.1,  ~X3.1, ~X1.2, ~X2.2, ~X3.2,
    "18",     "21",           "1",           "3", "0.12", "0.32", "0.67",  "0.80",  "0.90",  "1.01",
    "22",     "25",           "1",           "3", "0.20", "0.40", "0.70",  "0.85",  "0.98",  "1.05",
    "26",     "30",           "1",           "3", "0.25", "0.50", "0.80",  "0.90",  "1.05",  "1.21",
    "18",     "21",           "4",           "5", "0.32", "0.60", "0.95",  "0.99",  "1.30",  "1.40",
    "22",     "25",           "4",           "5", "0.40", "0.70", "1.07",  "1.20",  "1.40",  "1.50",
    "26",     "30",           "4",           "5", "0.55", "0.80", "1.09",  "1.34",  "1.67",  "1.99"
)
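For reference, here is a minimal sketch of how the first header row might be skipped when reading the file. The inline CSV text is a hypothetical stand-in for the exported sheet, and the column positions used for renaming are an assumption based on the layout in the question. The exact auto-generated names depend on how the file is read, so it is worth printing them before reshaping.

```r
# Hypothetical CSV export of the sheet (only two data rows shown).
csv_lines <- c(
  ",,,,EU,EU,EU,US,US,US",
  "Min Age,Max Age,Min Duration,Max Duration,1,2,3,1,2,3",
  "18,21,1,3,0.12,0.32,0.67,0.80,0.90,1.01",
  "22,25,1,3,0.20,0.40,0.70,0.85,0.98,1.05"
)

# skip = 1 drops the region row, so the second row becomes the header.
raw <- read.csv(text = paste(csv_lines, collapse = "\n"), skip = 1)

# Inspect the auto-generated names before relying on them.
auto_names <- names(raw)
print(auto_names)

# Assumption: columns 5-7 are EU levels 1-3 and columns 8-10 are US levels 1-3.
names(raw)[5:10] <- paste(rep(c("EU", "US"), each = 3),
                          rep(1:3, times = 2), sep = "_")
print(names(raw))
```

Base R deduplicates repeated headers by appending suffixes only to the later occurrences, so the names it generates may not match the ones shown above exactly. Explicit names such as EU_1 avoid depending on those suffixes and can be split on "_" later.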

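With this data, you can use gather to collect all headers that start with X into one column and the prices into another. You can then separate the headers into "Level" and "Area". Finally, recode Area and strip the "X" from Level.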

df %>% 
    gather(headers, Price, starts_with("X")) %>% 
    separate(headers, c("Level", "Area")) %>% 
    mutate(Area = if_else(Area == "1", "EU", "US"),
           Level = parse_number(Level))
#> # A tibble: 36 x 7
#>    Min.Age Max.Age Min.Duration Max.Duration Level Area  Price
#>    <chr>   <chr>   <chr>        <chr>        <dbl> <chr> <chr>
#>  1 18      21      1            3                1 EU    0.12 
#>  2 22      25      1            3                1 EU    0.20 
#>  3 26      30      1            3                1 EU    0.25 
#>  4 18      21      4            5                1 EU    0.32 
#>  5 22      25      4            5                1 EU    0.40 
#>  6 26      30      4            5                1 EU    0.55 
#>  7 18      21      1            3                2 EU    0.32 
#>  8 22      25      1            3                2 EU    0.40 
#>  9 26      30      1            3                2 EU    0.50 
#> 10 18      21      4            5                2 EU    0.60 
#> # ... with 26 more rows
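In newer versions of tidyr (1.0.0 and later), gather is superseded by pivot_longer, which can split the header into two columns in the same step. The following is a sketch of that alternative, not part of the original answer. It uses a shortened, hypothetical copy of the data (df_small) with the same column naming scheme, and it also converts Price to numeric.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for df: two id columns, two levels, two areas.
df_small <- tibble::tribble(
  ~Min.Age, ~Max.Age,  ~X1.1,  ~X2.1,  ~X1.2,  ~X2.2,
      "18",     "21", "0.12", "0.32", "0.80", "0.90",
      "22",     "25", "0.20", "0.40", "0.85", "0.98"
)

long <- df_small %>%
  pivot_longer(
    cols = starts_with("X"),
    names_to = c("Level", "Area"),  # "X1.2" -> Level "1", Area "2"
    names_prefix = "X",             # drop the leading "X" added by R
    names_sep = "\\.",              # split on the literal dot
    values_to = "Price"
  ) %>%
  mutate(
    Level = as.integer(Level),
    Area  = if_else(Area == "1", "EU", "US"),
    Price = as.numeric(Price)
  )

print(long)
```

Note that pivot_longer orders the result by original row rather than by column, so the rows come out in a different order than with gather.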

Created on 2018-10-12 by the reprex package (v0.2.1)

P.S. You can find many spreadsheet munging workflows here: https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html
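If both header rows are already inside the data frame, as in table1 from the question, the area names can also be taken from the first header row instead of being recoded by hand. The base R sketch below assumes that the id columns are exactly those whose first-row entry starts with "X". It uses a shortened copy of table1 with two data rows, and the helper names (m, area, label, body, ids) are illustrative only.

```r
# Shortened copy of table1 from the question (two header rows, two data rows).
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
table1 <- as.data.frame(rbind(R1, R2, R3, R4))

m     <- as.matrix(table1)                      # character matrix
area  <- unname(sub("\\.[0-9]+$", "", m[1, ]))  # strip ".1"/".2": "X", "EU", "US"
label <- unname(m[2, ])                         # second header row
body  <- m[-(1:2), , drop = FALSE]              # data rows only

# Assumption: columns whose first header is "X" are the id columns.
is_id <- area == "X"
ids <- as.data.frame(body[, is_id, drop = FALSE], stringsAsFactors = FALSE)
names(ids) <- label[is_id]
rownames(ids) <- NULL

# One block of rows per price column, then stack the blocks.
pieces <- lapply(which(!is_id), function(j) {
  data.frame(ids,
             Area  = area[j],
             Level = as.integer(label[j]),
             Price = as.numeric(body[, j]),
             check.names = FALSE, stringsAsFactors = FALSE)
})
long <- do.call(rbind, pieces)
print(long)
```

With the full eight-row table1, the same steps should give 6 data rows times 6 price columns, that is, the 36 rows the question asks for.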