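Gather the first two rows

Asked: 2018-04-20 17:00:36

Tags: r tidyr reshape2

I have to work with some badly formatted data. It contains two identifiers in the first two rows, followed by the data. The data looks like this: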

     V1       V2       V3
1  Date 12/16/18 12/17/18
2 Equip        a        b
3    x1        1        2
4    x2        3        4
5    x3        5        6

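I would like to gather the data to make it tidy, but gather only works when you have a single row of column names. I have also tried spread. The only solutions I have come up with are very hacky and don't feel right. Is there an elegant way to handle this?

This is what I want: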

      Date Equip metric value
1 12/16/18     a     x1     1
2 12/16/18     a     x2     3
3 12/16/18     a     x3     5
4 12/17/18     b     x1     2
5 12/17/18     b     x2     4
6 12/17/18     b     x3     6

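This approach gets me close, but I don't know how to deal with the bad formatting (no header, no row names). If the formatting were correct, gather should be easy.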

> as.data.frame(t(df))
         V1    V2 V3 V4 V5
V1     Date Equip x1 x2 x3
V2 12/16/18     a  1  3  5
V3 12/17/18     b  2  4  6

Here is the dput:

structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c("12/16/18", 
"a", "1", "3", "5"), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c("V1", 
"V2", "V3"), row.names = c(NA, -5L))

3 answers:

Answer 0 (score: 6)

Thanks for posting a nicely reproducible question. Here is some gentle tidyr / dplyr massaging.

library(tidyverse)

df <- structure(
    list(
        V1 = c("Date", "Equip", "x1", "x2", "x3"), 
        V2 = c("12/16/18", "a", "1", "3", "5"), 
        V3 = c("12/17/18", "b", "2", "4", "6")
    ), 
    class = "data.frame", 
    .Names = c("V1", "V2", "V3"), 
    row.names = c(NA, -5L)
)

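df %>%
    # Stack V2 and V3 into long form, keeping V1 as the row label
    gather(key = measure, value = value, -V1) %>%
    # Turn the V1 labels (Date, Equip, x1, x2, x3) into column names
    spread(key = V1, value = value) %>%
    # The original column names (V2, V3) are no longer needed
    select(-measure) %>%
    # Stack the metric columns, leaving Date and Equip as identifiers
    gather(key = metric, value = value, x1:x3) %>%
    arrange(Date, Equip, metric)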
#>       Date Equip metric value
#> 1 12/16/18     a     x1     1
#> 2 12/16/18     a     x2     3
#> 3 12/16/18     a     x3     5
#> 4 12/17/18     b     x1     2
#> 5 12/17/18     b     x2     4
#> 6 12/17/18     b     x3     6

Created on 2018-04-20 by the reprex package (v0.2.0).
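For readers on a newer tidyr, the same three-step reshape can be sketched with pivot_longer() and pivot_wider(), which tidyr 1.0.0 and later offer in place of gather() and spread(). This is only an illustrative sketch, not part of the original answer: it assumes tidyr >= 1.0.0 is installed, and the names `tidy` and `measure` are chosen here for illustration.

```r
library(tidyr)
library(dplyr)

# Same data as in the question, rebuilt with data.frame() for brevity
df <- data.frame(
  V1 = c("Date", "Equip", "x1", "x2", "x3"),
  V2 = c("12/16/18", "a", "1", "3", "5"),
  V3 = c("12/17/18", "b", "2", "4", "6"),
  stringsAsFactors = FALSE
)

tidy <- df %>%
  # Long form: one row per (V1 label, original column)
  pivot_longer(-V1, names_to = "measure", values_to = "value") %>%
  # Wide form: the V1 labels become the column names
  pivot_wider(names_from = V1, values_from = value) %>%
  select(-measure) %>%
  # Long form again, this time only over the metric columns
  pivot_longer(x1:x3, names_to = "metric", values_to = "value") %>%
  arrange(Date, Equip, metric)

print(tidy)
```

As in the answer above, every column stays character because the identifiers and the measurements shared columns in the input.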

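Answer 1 (score: 2)

You can use reshape:

library(reshape)
# Use the first column (Date, Equip, x1, ...) as row names, then drop it
row.names(df) = df$V1
df$V1 = NULL
# Transpose so the identifiers become columns, then melt the metric columns
df = melt(data.frame(t(df)), id.vars = c('Date', 'Equip'))
df[order(df$Date), ]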
      Date Equip variable value
1 12/16/18     a       x1     1
3 12/16/18     a       x2     3
5 12/16/18     a       x3     5
2 12/17/18     b       x1     2
4 12/17/18     b       x2     4
6 12/17/18     b       x3     6
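Since the question is also tagged reshape2, here is a rough equivalent using that package. It is a sketch under assumptions rather than the answerer's code: it assumes reshape2 is installed, and `wide` and `long` are illustrative names. reshape2's melt() accepts a `variable.name` argument, so the column can be called metric directly.

```r
library(reshape2)

df <- data.frame(
  V1 = c("Date", "Equip", "x1", "x2", "x3"),
  V2 = c("12/16/18", "a", "1", "3", "5"),
  V3 = c("12/17/18", "b", "2", "4", "6"),
  stringsAsFactors = FALSE
)

# Transpose everything except the label column, then use the labels as names
wide <- as.data.frame(t(df[, -1]), stringsAsFactors = FALSE)
names(wide) <- df$V1

# Melt the metric columns, keeping Date and Equip as identifiers
long <- melt(wide, id.vars = c("Date", "Equip"), variable.name = "metric")
long <- long[order(long$Date), ]
print(long)
```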

Answer 2 (score: 1)

Here is another way using gather, starting from your approach. We can replace the headers with the first row and then drop the first row, which allows a single gather and may be more intuitive.

library(tidyverse)

df <- structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c(
  "12/16/18",
  "a", "1", "3", "5"
), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c(
  "V1",
  "V2", "V3"
), row.names = c(NA, -5L))

df %>%
  t() %>%
  `colnames<-`(.[1, ]) %>%
  `[`(-1, ) %>%
  as_tibble() %>%
  gather("metric", "value", x1:x3) %>%
  arrange(Date, Equip, metric)
#> # A tibble: 6 x 4
#>   Date     Equip metric value
#>   <chr>    <chr> <chr>  <chr>
#> 1 12/16/18 a     x1     1
#> 2 12/16/18 a     x2     3
#> 3 12/16/18 a     x3     5
#> 4 12/17/18 b     x1     2
#> 5 12/17/18 b     x2     4
#> 6 12/17/18 b     x3     6

Created on 2018-04-20 by the reprex package (v0.2.0).
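All of the approaches above return character (or factor) columns, because the identifiers and the measurements shared columns in the raw data. If typed columns are needed, a conversion step along these lines could follow. This is a sketch that assumes the dates are month/day/two-digit-year, which the sample data suggests but the question does not state; `tidy` is an illustrative stand-in for the reshaped result.

```r
# Illustrative stand-in for the reshaped result (all character columns)
tidy <- data.frame(
  Date   = c("12/16/18", "12/16/18", "12/17/18"),
  Equip  = c("a", "a", "b"),
  metric = c("x1", "x2", "x1"),
  value  = c("1", "3", "2"),
  stringsAsFactors = FALSE
)

# Assumption: dates are month/day/two-digit year
tidy$Date  <- as.Date(tidy$Date, format = "%m/%d/%y")
tidy$value <- as.numeric(tidy$value)

str(tidy)
```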