Question

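I have a rail freight dataset. Currently, each company (a row) has one number (its weekly carloads) listed for each week (a column) over a span of several years, which makes more than 100 columns. I want to gather this into just two columns: Date and Load.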

It currently looks like this:

3/29/2017  4/5/2017  4/12/2017  4/19/2017
32.7       31.6      32.3       32.5
20.5       21.8      22.0       22.3
24.1       24.1      23.6       23.4
24.9       24.7      24.8       26.5

I am looking for:

Date        Load
3/29/2017   32.7
3/29/2017   20.5
3/29/2017   24.1
3/29/2017   24.9
4/5/2017    31.6

I have been trying various versions of the following:

rail3 <- rail2 %>% 
  gather(`3/29/2017`:`1/24/2018`, key = "date", value = "loads")

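When I run this, it creates a dataset called rail3, but it does not create the new columns I want. It only makes the dataset 44 times as long as the original. It also gives me the following message: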

Warning message:
attributes are not identical across measure variables;
they will be dropped

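I think this is because the date columns are currently encoded as factors. But I am also not sure how to convert 100+ columns from factor to numeric. I tried the following, along with various other approaches: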

rail2["3/29/2017":"1/24/2018"] <- lapply(rail2["3/29/2017":"1/24/2018"], as.numeric)

None of these worked. If you have any suggestions, please let me know. Thanks!

Answer 1

Here is one way to do it:

rail2<-read.table(header=TRUE, text="3/29/2017  4/5/2017  4/12/2017  4/19/2017
32.7       31.6      32.3       32.5
20.5       21.8      22.0       22.3
24.1       24.1      23.6       23.4
24.9       24.7      24.8       26.5", check.names=FALSE)

library(tidyr)
rail3<-rail2 %>% gather(key="date", value="load")

rail3
#        date load
#1  3/29/2017 32.7
#2  3/29/2017 20.5
#3  3/29/2017 24.1
#4  3/29/2017 24.9
#5   4/5/2017 31.6
#6   4/5/2017 21.8
#7 ...
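
The example above reads the columns in as numeric, so it sidesteps the factor problem raised in the question. As a minimal base R sketch, assuming the week columns really are factors whose labels are the numbers: calling as.numeric() directly on a factor returns its internal level codes, so the conversion goes through as.character() first. The two-column rail2 below is a hypothetical stand-in for the real data, and stack() is used here in place of gather() only to keep the sketch free of package dependencies.

```r
# Hypothetical stand-in for the asker's data: week columns stored as factors.
rail2 <- data.frame(
  `3/29/2017` = factor(c("32.7", "20.5", "24.1", "24.9")),
  `4/5/2017`  = factor(c("31.6", "21.8", "24.1", "24.7")),
  check.names = FALSE
)

# as.numeric() on a factor yields level codes, not the labelled values,
# so convert to character first. `rail2[] <-` keeps the data frame shape
# and column names while replacing every column.
rail2[] <- lapply(rail2, function(x) as.numeric(as.character(x)))

# stack() reshapes wide to long in base R; it returns `values` and `ind`,
# where `ind` is a factor holding the original column names.
stacked <- stack(rail2)

rail3 <- data.frame(
  date = as.Date(as.character(stacked$ind), format = "%m/%d/%Y"),
  load = stacked$values
)
```

The same lapply() line should work unchanged on 100+ columns, provided every column in the data frame is a week column; if there are identifier columns as well, restrict the assignment to the week columns.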

Answer 2

If you want to avoid the warning when gathering, and want proper dates and numbers in the final df, you can do the following:

library(tidyr)
library(hablar)

# Data from above but with factors
rail2<-read.table(header=TRUE, text="3/29/2017  4/5/2017  4/12/2017  4/19/2017
32.7       31.6      32.3       32.5
20.5       21.8      22.0       22.3
24.1       24.1      23.6       23.4
24.9       24.7      24.8       26.5", check.names=FALSE) %>% 
  as_tibble() %>% 
  convert(fct(everything()))

# Code
rail2 %>% 
  convert(num(everything())) %>% 
  gather("date", "load") %>% 
  convert(dte(date, .args = list(format = "%m/%d/%Y")))

This gives:

# A tibble: 16 x 2
   date        load
   <date>     <dbl>
 1 2017-03-29  32.7
 2 2017-03-29  20.5
 3 2017-03-29  24.1
 4 2017-03-29  24.9
 5 2017-04-05  31.6
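
For readers on a newer tidyr (1.0.0 or later), pivot_longer() is the function the tidyr documentation now recommends over gather(). The following is a rough sketch of the same reshape, not part of either answer; it assumes the columns are already numeric (factor columns would need converting first) and uses a shortened two-row copy of the sample data. The name rail_long is made up for illustration.

```r
library(tidyr)

# Shortened copy of the sample data; check.names = FALSE keeps the
# date-like column names intact instead of mangling them to X3.29.2017.
rail2 <- read.table(header = TRUE, check.names = FALSE,
                    text = "3/29/2017  4/5/2017  4/12/2017  4/19/2017
32.7       31.6      32.3       32.5
20.5       21.8      22.0       22.3")

# Move every column into a name/value pair of columns.
rail_long <- pivot_longer(rail2, cols = everything(),
                          names_to = "date", values_to = "load")

# Column names arrive as character strings; parse them into Date values.
rail_long$date <- as.Date(rail_long$date, format = "%m/%d/%Y")

# pivot_longer() walks the data row by row, so sort by date to get the
# week-by-week layout that gather() produces.
rail_long <- rail_long[order(rail_long$date), ]
```

If the real data also has an identifier column such as the company name, excluding it from `cols` keeps it as a regular column next to date and load.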

Gathering multiple columns of data that are currently stored as factors
