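Using gather() to gather two (or more) groups of columns into two (or more) key-value pairs

Date: 2017-04-08 12:13:27

Tags: r reshape tidyr keyvaluepair

I would like to gather two separate groups of columns into two key-value pairs. Here is some example data: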

library(dplyr)
library(tidyr)
ID = c(1:5)
measure1 = c(1:5)
measure2 = c(6:10)
letter1 = c("a", "b", "c", "d", "e")
letter2 = c("f", "g", "h", "i", "j")

df = data.frame(ID, measure1, measure2, letter1, letter2)
df = as_tibble(df)  # tbl_df() is deprecated in current dplyr; as_tibble() is the replacement
df$letter1 <- as.character(df$letter1)
df$letter2 <- as.character(df$letter2)

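I want the values of the two measure columns (measure1 and measure2) in a single column, with a key column next to it (a key-value pair). I want the same for letter1 and letter2. I figured I could use select() to create two separate data sets, gather each one separately, and then join them (this works):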

df_measure = df %>% 
  select(ID, measure1, measure2) %>% 
  gather(measure_time, measure, -ID) %>% 
  mutate(id.extra = c(1:10))
df_letter = df %>% 
  select(ID, letter1, letter2) %>% 
  gather(letter_time, letter, -ID) %>% 
  mutate(id.extra = c(1:10))
df_long = df_measure %>% 
  left_join(df_letter, by = "id.extra")

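So this works (in this case), but I suspect it can be done more elegantly (without splitting the data or creating something like 'id.extra'). So please shed some light on it!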
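The join on id.extra lines up correctly only because both gather() calls stack their columns in the same order: all of the "1" columns first, then all of the "2" columns. Here is a minimal base-R sketch of that same positional alignment, with no splitting and no join. It assumes exactly two groups of equal length and is only an illustration of the idea, not one of the answers below.

```r
# Example data from the question, built as a plain data.frame
df <- data.frame(ID = 1:5, measure1 = 1:5, measure2 = 6:10,
                 letter1 = c("a", "b", "c", "d", "e"),
                 letter2 = c("f", "g", "h", "i", "j"),
                 stringsAsFactors = FALSE)

# Stack each column group in the same order (all of *1, then all of *2).
# Row i of the stacked measures therefore belongs with row i of the
# stacked letters, which is the correspondence id.extra encodes in the join.
df_long <- data.frame(
  ID      = rep(df$ID, times = 2),
  time    = rep(1:2, each = nrow(df)),
  measure = c(df$measure1, df$measure2),
  letter  = c(df$letter1, df$letter2),
  stringsAsFactors = FALSE
)
df_long
```

This only works because each group is concatenated in a fixed column order. The answers below avoid hard-coding that order by matching column names instead.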

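2 Answers:

Answer 0 (score: 3)

You can use the following. I'm not sure your current approach produces exactly the output you want, because it seems to contain a lot of redundant information (for example, the join leaves two copies of ID, ID.x and ID.y).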

df %>%
  gather(val, var, -ID) %>%
  extract(val, c("value", "time"), regex = "([a-z]+)([0-9]+)") %>%
  spread(value, var)
# # A tibble: 10 × 4
#       ID  time letter measure
# *  <int> <chr>  <chr>   <chr>
# 1      1     1      a       1
# 2      1     2      f       6
# 3      2     1      b       2
# 4      2     2      g       7
# 5      3     1      c       3
# 6      3     2      h       8
# 7      4     1      d       4
# 8      4     2      i       9
# 9      5     1      e       5
# 10     5     2      j      10
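In current tidyr, gather() and spread() are superseded by pivot_longer(). The following is a minimal sketch of the same reshape in a single call, assuming tidyr >= 1.0.0. The gather() route above coerces every value to character (note the <chr> measure column). The ".value" sentinel in names_to instead splits each name into an output column name and a time key, so measure stays integer.

```r
library(tidyr)  # assumes tidyr >= 1.0.0, where pivot_longer() is available

df <- data.frame(ID = 1:5, measure1 = 1:5, measure2 = 6:10,
                 letter1 = c("a", "b", "c", "d", "e"),
                 letter2 = c("f", "g", "h", "i", "j"),
                 stringsAsFactors = FALSE)

# ".value": the first regex group ("measure" / "letter") names the output
# value column; the second group (the trailing digit) goes into "time".
df_long <- pivot_longer(df, cols = -ID,
                        names_to = c(".value", "time"),
                        names_pattern = "([a-z]+)([0-9]+)")
df_long
```

Here time comes back as character ("1", "2"). Convert it with as.integer() afterwards if a numeric key is needed.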

This is easily done with melt + patterns from "data.table":

library(data.table)
melt(as.data.table(df), measure.vars = patterns("measure", "letter"))
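In the melt call, each regex passed to patterns() selects one group of columns, and each group is stacked into its own value column. The next block is a rough base-R sketch of that idea. melt_patterns() is a hypothetical helper written for illustration and is not part of data.table. Here variable is simply the position within each group, stored as a plain integer.

```r
# Hypothetical helper (NOT part of data.table): a base-R sketch of stacking
# several regex-selected column groups in parallel.
melt_patterns <- function(df, id, pats) {
  # One vector of matching column positions per regex
  groups <- lapply(pats, function(p) grep(p, names(df)))
  n_times <- length(groups[[1]])
  stopifnot(all(lengths(groups) == n_times))  # every group must be the same size
  # Repeat the id columns once per position; variable = position in each group
  out <- df[rep(seq_len(nrow(df)), times = n_times), id, drop = FALSE]
  out$variable <- rep(seq_len(n_times), each = nrow(df))
  # Stack each group's columns into one value column (value1, value2, ...)
  for (k in seq_along(groups)) {
    out[[paste0("value", k)]] <- unlist(df[groups[[k]]], use.names = FALSE)
  }
  rownames(out) <- NULL
  out
}

df <- data.frame(ID = 1:5, measure1 = 1:5, measure2 = 6:10,
                 letter1 = c("a", "b", "c", "d", "e"),
                 letter2 = c("f", "g", "h", "i", "j"),
                 stringsAsFactors = FALSE)
res <- melt_patterns(df, id = "ID", pats = c("measure", "letter"))
res
```

Like the real patterns(), this depends on the matched columns appearing in the same order within each group (measure1 before measure2, letter1 before letter2).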

Or you can go old school and just use base R's reshape. Note, however, that base R's reshape doesn't like "tibbles", so you have to convert the data with as.data.frame() first.

reshape(as.data.frame(df), direction = "long", idvar = "ID", 
        varying = 2:ncol(df), sep = "")
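With sep = "", reshape() guesses the stubs ("measure", "letter") and times (1, 2) by splitting each column name at the point where a letter meets a digit. The following is a minimal sketch of a more explicit variant, assuming the same column names as in the question. It passes varying as a list of column groups and names the stacked columns with v.names, so nothing depends on that name guessing.

```r
df <- data.frame(ID = 1:5, measure1 = 1:5, measure2 = 6:10,
                 letter1 = c("a", "b", "c", "d", "e"),
                 letter2 = c("f", "g", "h", "i", "j"),
                 stringsAsFactors = FALSE)

# Each list element is one group of wide columns, in time order;
# v.names[k] names the long column built from varying[[k]].
df_long <- reshape(df, direction = "long", idvar = "ID",
                   varying = list(c("measure1", "measure2"),
                                  c("letter1", "letter2")),
                   v.names = c("measure", "letter"),
                   timevar = "time", times = 1:2)
df_long
```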

Answer 1 (score: 1)

We can use melt from data.table, which accepts several patterns in measure.vars at once:

library(data.table)
melt(setDT(df), measure.vars = patterns("^measure", "^letter"),
     value.name = c("measure", "letter"))
#     ID variable measure letter
#  1:  1        1       1      a
#  2:  2        1       2      b
#  3:  3        1       3      c
#  4:  4        1       4      d
#  5:  5        1       5      e
#  6:  1        2       6      f
#  7:  2        2       7      g
#  8:  3        2       8      h
#  9:  4        2       9      i
# 10:  5        2      10      j
