Convert a single-row table to a long data frame

Time: 2019-03-06 20:58:29

Tags: r dplyr data.table

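I have a wide table with only one row. Every column has a different name. I want to combine 3 columns to form a single "date" column, and then reshape the data into a long table. My tables also differ from one another: for example, one table might have only 2 "ernMvx" columns while another has 20, so I am using grep.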

#data
dput(x)
structure(list(ernDate1 = "1/29/2019", ernDate2 = "11/1/2018", 
    ernDate3 = "7/31/2018", ernMv1 = 6.8335, ernMv2 = -6.6331, 
    ernMv3 = 5.891, ernStraPct1 = 6.8304, ernStraPct2 = 7.074, 
    ernStraPct3 = 5.2632), row.names = c(NA, -1L), class = "data.frame")

print(x)
   ernDate1  ernDate2  ernDate3 ernMv1  ernMv2 ernMv3 ernStraPct1 ernStraPct2 ernStraPct3
1 1/29/2019 11/1/2018 7/31/2018 6.8335 -6.6331  5.891      6.8304       7.074      5.2632

library(dplyr)

date = x %>% select(grep("ernDate", names(x)))
ernMv = x %>% select(grep("ernMv",names(x)))
ernStraPct = x%>% select(grep("ernStra",names(x)))

new.data = as.data.frame(cbind(unlist(date), unlist(ernMv), unlist(ernStraPct)))
names(new.data) = c("date", "ernMv", "ernStraPct")
rownames(new.data) = c(1:length(new.data$date))
print(new.data)
             date   ernMv ernStraPct
      1 1/29/2019  6.8335     6.8304
      2 11/1/2018 -6.6331      7.074
      3 7/31/2018   5.891     5.2632

This is the output I want, but the approach seems very tedious. I tried reshape2::melt, but I could not get it to work on a one-row table. Thanks.

2 answers:

Answer 0: (score: 3)

Here is a quick data.table option that uses the patterns function to match the column names:

library(data.table)
melt(
    as.data.table(x),
    measure = patterns("ernDate", "ernMv", "ernStraPct"),
    value.name = c("date", "ernMv", "ernStraPct"))
#   variable      date   ernMv ernStraPct
#1:        1 1/29/2019  6.8335     6.8304
#2:        2 11/1/2018 -6.6331     7.0740
#3:        3 7/31/2018  5.8910     5.2632 

Or, more concisely (thanks to @markus):

cols <- unique(sub("\\d+$", "", names(x)))
melt(as.data.table(x), measure.vars = patterns(cols), value.name = cols)
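Because the question mentions tables with up to 20 "ernMvx" columns, it is worth checking how the prefix list behaves once the numeric suffix has two digits. The following is a minimal base R sketch; the vector `nms` holds hypothetical column names that are not in the original data.

```r
# Hypothetical column names with one- and two-digit suffixes
nms <- c("ernDate1", "ernDate10", "ernMv1", "ernMv10")

# Stripping a single trailing digit leaves "ernDate1" and "ernMv1" behind
one_digit <- unique(sub("\\d$", "", nms))

# Stripping the whole run of trailing digits yields only the clean prefixes
all_digits <- unique(sub("\\d+$", "", nms))

print(one_digit)
print(all_digits)
```

With names like these, only the second form produces one prefix per measure, which is what both patterns() and value.name expect.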

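Or a tidyverse option that uses a regular-expression lookahead in separate to split each column name into its name part and its row number: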

library(tidyverse)
x %>%
    gather(k, v) %>%
    separate(k, c("col", "row"), sep = "(?<=\\D)(?=\\d)") %>%
    spread(col, v)
#  row   ernDate   ernMv ernStraPct
#1   1 1/29/2019  6.8335     6.8304
#2   2 11/1/2018 -6.6331      7.074
#3   3 7/31/2018   5.891     5.2632
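For readers on a newer tidyr (1.0.0 or later, released after this answer was written), the same reshape can probably be expressed in one step with pivot_longer. This is a sketch under that assumption, not part of the original answer; the column name `id` and the regular expression are illustrative choices. The special name ".value" tells pivot_longer to use the first capture group as the output column name, so each measure keeps its own type.

```r
library(tidyr)

# Same data as in the question
x <- data.frame(
  ernDate1 = "1/29/2019", ernDate2 = "11/1/2018", ernDate3 = "7/31/2018",
  ernMv1 = 6.8335, ernMv2 = -6.6331, ernMv3 = 5.891,
  ernStraPct1 = 6.8304, ernStraPct2 = 7.074, ernStraPct3 = 5.2632,
  stringsAsFactors = FALSE
)

# Split each name into a prefix (becomes a column) and a trailing number
# (becomes the id); \\d+ captures multi-digit suffixes such as "10" in full.
long <- pivot_longer(
  x,
  cols = everything(),
  names_to = c(".value", "id"),
  names_pattern = "^(.*\\D)(\\d+)$"
)

# The id is extracted as text; convert it if a numeric id is wanted
long$id <- as.integer(long$id)
print(long)
```

Unlike the gather/spread route, the numeric columns are not coerced to character along the way.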

Answer 1: (score: 1)

I assume that every column name ends in a number that can be interpreted as the record ID.

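library(tidyverse)

# Anchor the pattern so the whole trailing number (e.g. the "10" in
# "ernMv10") becomes the id, not just its last digit.
x %>%
  gather(name, value) %>%
  mutate(id = gsub('^(.*[^0-9])([0-9]+)$', '\\2', name),
         name = gsub('^(.*[^0-9])([0-9]+)$', '\\1', name)) %>%
  spread(name, value)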
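The same assumption (a trailing number identifies the record) is also what base R's reshape() relies on when it is asked to guess the split with sep = "". The following is a sketch of that alternative without extra packages; it is not from the original answers, and it uses reshape()'s default output names `time` and `id`.

```r
# Same data as in the question
x <- data.frame(
  ernDate1 = "1/29/2019", ernDate2 = "11/1/2018", ernDate3 = "7/31/2018",
  ernMv1 = 6.8335, ernMv2 = -6.6331, ernMv3 = 5.891,
  ernStraPct1 = 6.8304, ernStraPct2 = 7.074, ernStraPct3 = 5.2632,
  stringsAsFactors = FALSE
)

# With sep = "", reshape() splits each name where a letter is followed
# by a digit: the left part becomes the column, the right part the time.
long <- reshape(
  x,
  direction = "long",
  varying = names(x),
  sep = ""
)

# Drop the composite row names ("1.1", "1.2", ...) for a cleaner print
rownames(long) <- NULL
print(long)
```

Here `time` plays the role of the record ID from the trailing number, while `id` identifies the original row, which is always 1 for a one-row table.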