Reshaping data with tidyr

Time: 2018-10-29 22:15:34

Tags: r reshape tidyr spread

My data is in the format:

  sample  height  width  weight
1 a       h1      w1     p1    
2 a       h2      w2     p2    
3 b       h3      w3     p3    
4 b       h4      w4     p4

where h1 and h2 are replicate measurements of the height of sample "a", h3 and h4 are replicate measurements of sample "b", and so on.

I need to put the replicate measurements side by side:

  sample height1 height2 width1 width2 weight1 weight2
1 a      h1      h2      w1     w2     p1      p2
2 b      h3      h4      w3     w4     p3      p4

I have been fiddling with gather and spread but haven't gotten what I want. Any help?

Thanks!

2 answers:

Answer 0 (score: 1)

After creating a sequence column by group, we can gather into "long" format and then spread back to "wide" format:

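library(tidyverse)
df1 %>%
  gather(key, val, height:weight) %>%   # long: one row per sample/variable/value
  group_by(sample, key) %>% 
  mutate(n = row_number()) %>%          # replicate index within each sample/variable
  unite(keyn, key, n, sep = "") %>%     # "height" + 1 -> "height1"
  spread(keyn, val)                     # back to wide format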
# A tibble: 2 x 7
# Groups:   sample [2]
#   sample height1 height2 weight1 weight2 width1 width2
#  <chr>  <chr>   <chr>   <chr>   <chr>   <chr>  <chr> 
#1 a      h1      h2      p1      p2      w1     w2    
#2 b      h3      h4      p3      p4      w3     w4    
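In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with pivot_wider(), assuming tidyr >= 1.0.0 and dplyr are installed (the name wide is only for illustration). Because pivot_wider() accepts several value columns at once, the intermediate long format is not needed:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  sample = c("a", "a", "b", "b"),
  height = c("h1", "h2", "h3", "h4"),
  width  = c("w1", "w2", "w3", "w4"),
  weight = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  group_by(sample) %>%
  mutate(n = row_number()) %>%   # replicate index within each sample: 1, 2
  ungroup() %>%
  pivot_wider(
    names_from  = n,                          # replicate index becomes the suffix
    values_from = c(height, width, weight),   # several value columns at once
    names_sep   = ""                          # "height" + "1" -> "height1"
  )

print(wide)
```

Setting names_sep = "" glues the value column name directly to the replicate index, giving height1 rather than the default height_1.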

Or another option with the tidyverse:

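df1 %>%
    group_by(sample) %>%
    nest() %>% 
    mutate(data = map(data, ~ 
                       unlist(.x) %>%    # height1, height2, width1, ... for this sample
                       as.list() %>%
                       as_tibble())) %>% 
    unnest(cols = data)                  # `cols` is required in tidyr >= 1.0.0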
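The nest() option relies on how unlist() names the elements of a named list (a nested tibble is such a list): when an element has more than one value, each value gets the element name plus a numeric suffix. A small base R illustration, using a hypothetical list grp that mimics the nested data for sample "a":

```r
# One nested group (sample "a") as a plain named list
grp <- list(height = c("h1", "h2"), width = c("w1", "w2"), weight = c("p1", "p2"))

flat <- unlist(grp)
print(flat)
# names(flat): "height1" "height2" "width1" "width2" "weight1" "weight2"

# as.list() turns the flat vector into a one-row, six-column record
str(as.list(flat))
```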

Or we can use reshape() from base R:

# within-sample replicate index: 1, 2, 1, 2
df1$ind <- with(df1, ave(seq_along(sample), sample, FUN = seq_along))
reshape(df1, idvar = "sample", timevar = "ind", direction = "wide")
#   sample height.1 width.1 weight.1 height.2 width.2 weight.2
#1      a       h1      w1       p1       h2      w2       p2
#3      b       h3      w3       p3       h4      w4       p4
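reshape() separates the variable name and the time index with a dot (height.1). If you want the exact names from the question (height1), one option is to strip the dot afterwards with sub(). A minimal sketch; the regular expression and the name wide are illustrative choices, not part of reshape() itself:

```r
df1 <- data.frame(
  sample = c("a", "a", "b", "b"),
  height = c("h1", "h2", "h3", "h4"),
  width  = c("w1", "w2", "w3", "w4"),
  weight = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

# Within-sample replicate index: 1, 2, 1, 2
df1$ind <- with(df1, ave(seq_along(sample), sample, FUN = seq_along))

wide <- reshape(df1, idvar = "sample", timevar = "ind", direction = "wide")

# Drop the "." before the trailing index: "height.1" -> "height1"
names(wide) <- sub("\\.(\\d+)$", "\\1", names(wide))
rownames(wide) <- NULL

print(wide)
```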

Data

df1 <- structure(list(sample = c("a", "a", "b", "b"), height = c("h1", 
 "h2", "h3", "h4"), width = c("w1", "w2", "w3", "w4"), weight = c("p1", 
 "p2", "p3", "p4")), class = "data.frame", row.names = c(NA, -4L
  ))

Answer 1 (score: 1)

Although you asked for tidyr::spread, I'm using dcast from data.table instead.
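A minimal sketch of how dcast() can do this reshape (not necessarily the answerer's exact code), assuming data.table >= 1.9.8 for rowid() and multiple value.var columns; n and wide are hypothetical names used only for illustration:

```r
library(data.table)

df1 <- data.frame(
  sample = c("a", "a", "b", "b"),
  height = c("h1", "h2", "h3", "h4"),
  width  = c("w1", "w2", "w3", "w4"),
  weight = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df1)
dt[, n := rowid(sample)]   # replicate index within each sample: 1, 2, 1, 2

wide <- dcast(dt, sample ~ n,
              value.var = c("height", "width", "weight"),
              sep = "")    # "height" + "1" -> "height1" (default sep is "_")

print(wide)
```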