Question

I am working in the tidyverse.

I read several CSV files (all with the same columns) using read_csv

df <- read_csv("data.csv")

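which gives me a series of data frames. After a bunch of data cleaning and calculations, I want to merge all the data frames.

There are a dozen data frames, each with a few hundred rows and a few hundred columns. A minimal example is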

DF1
ID             name   costcentre start  stop  date
  <chr>          <chr>  <chr>      <time> <tim> <chr>    
1 R_3PMr4GblKPV~ Geo    Prizm      01:00  03:00 25/12/2019 
2 R_s6IDep6ZLpY~ Chevy  Malibu        NA     NA NA       
3 R_238DgbfO0hI~ Toyota Corolla    08:00  11:00 25/12/2019 


DF2
ID                  name   costcentre start stop   date
<chr>               <chr>  <chr>      <lgl> <time> <chr>
1 R_3PMr4GblKPV1OYd Geo    Prizm      NA       NA  NA   
2 R_s6IDep6ZLpYvUeR Chevy  Malibu     NA    03:00  12/12/2019
3 R_238DgbfO0hItPxZ Toyota Corolla    NA       NA  NA

According to my cleaning rules (start == NA, stop != NA), some of the NA values in start must become 00:00. I can put a zero into those cells:

df <- within(df, start[is.na(df$start) & !is.na(df$stop)] <- 0)

This results in

DF1
ID             name   costcentre start  stop  date
  <chr>          <chr>  <chr>      <time> <tim> <chr>    
1 R_3PMr4GblKPV~ Geo    Prizm      01:00  03:00 25/12/2019 
2 R_s6IDep6ZLpY~ Chevy  Malibu        NA     NA NA       
3 R_238DgbfO0hI~ Toyota Corolla    08:00  11:00 25/12/2019 


DF2
ID                  name   costcentre start stop   date
<chr>               <chr>  <chr>      <dbl> <time> <chr>
1 R_3PMr4GblKPV1OYd Geo    Prizm      NA       NA  NA   
2 R_s6IDep6ZLpYvUeR Chevy  Malibu       0   03:00  12/12/2019
3 R_238DgbfO0hItPxZ Toyota Corolla    NA       NA  NA

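I run into a problem when merging: start is sometimes a double (because I did a replacement), sometimes logical (because every value was NA and nothing was replaced), and sometimes a time (if times were present in the original data when it was read).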

merged_df <- bind_rows(DF1, DF2,...)

gives me the error Error: Column `start` can't be converted from hms, difftime to numeric

How can I force the start column to be of type time so that I can merge the data?

Answer 1

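I think the important point is that the time-type columns start and stop appear to be based on the hms package. I wonder why and when this shows up, because I had not heard of this class before.

As far as I can see, these columns actually have the classes hms and difftime. Objects of this class are not stored in minutes (as the printed tibble might suggest) but in seconds. You can see this if you look at the data with View(df). Interestingly, when we print the data, the variable type is shown as time.

To solve your problem, you have to convert all start and stop columns consistently into hms difftime columns, as in the example below.

Minimal reproducible example: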
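To make the point about storage concrete, here is a small sketch (assuming the hms package is installed; the variable name x is only for illustration) that builds one time value and inspects its classes and its underlying number:

```r
library(hms)

# One hour, given as a number of seconds
x <- as_hms(3600)

class(x)       # "hms" "difftime"
units(x)       # "secs"
as.numeric(x)  # 3600: the underlying value is in seconds
format(x)      # "01:00:00": only the printed form looks like a clock time
```

This is also why a plain 0 can stand for 00:00 once the column is converted to hms.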

library(dplyr)
library(hms)

df1 <- tibble(id = 1:3, 
              start = as_hms(as.difftime(c(1*60,NA,8*60), units = "mins")),
              stop = as_hms(as.difftime(c(3*60,NA,11*60), units = "mins")))
df2 <- tibble(id = 4:6, 
              start = c(NA,NA,NA), 
              stop = as_hms(as.difftime(c(NA,3*60,NA), units = "mins")))

Or more simply (although it prints slightly differently from the data in the question)

df1 <- tibble(id = 1:3, 
              start = as_hms(c(1*60,NA,8*60)),
              stop = as_hms(c(3*60,NA,11*60)))
df2 <- tibble(id = 4:6, 
              start = c(NA,NA,NA), 
              stop = as_hms(c(NA,3*60,NA)))

Solving the problem:

class(df1$start) # In df1 start has class hms and difftime
class(df2$start) # In df2 start has class logical

# We set start=0 if stop is not missing and turn the whole column into an hms object
# (NA_real_ keeps the result numeric even when no stop value is present)
df2 <- df2 %>% mutate(start = new_hms(ifelse(!is.na(stop), 0, NA_real_)))

# Now that column types are consistent across tibbles we can easily bind them together
df <- bind_rows(df1, df2)
df
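Since the question mentions a dozen data frames, the same idea can be wrapped in a small helper and applied to each data frame before binding. This is only a sketch: fix_times is a hypothetical name, and it assumes every data frame has start and stop columns that are logical, double (interpreted as seconds), or hms.

```r
library(dplyr)
library(hms)

# Example data: df1 has hms columns, df2 has an all-NA logical start column
df1 <- tibble(id = 1:3,
              start = as_hms(c(3600, NA, 28800)),
              stop  = as_hms(c(10800, NA, 39600)))
df2 <- tibble(id = 4:6,
              start = c(NA, NA, NA),
              stop  = as_hms(c(NA, 10800, NA)))

# Hypothetical helper: coerce start and stop to hms, whatever type they arrived as
fix_times <- function(df) {
  # as.numeric() turns logical NA into NA_real_ and hms values into seconds
  start_secs <- as.numeric(df$start)
  stop_secs  <- as.numeric(df$stop)

  # Cleaning rule from the question: missing start with a known stop becomes 00:00
  start_secs[is.na(start_secs) & !is.na(stop_secs)] <- 0

  df$start <- as_hms(start_secs)
  df$stop  <- as_hms(stop_secs)
  df
}

# Apply the helper to every data frame, then bind the results
merged_df <- bind_rows(lapply(list(df1, df2), fix_times))
merged_df
```

Converting through seconds first means the helper does not need a separate branch for each incoming column type.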
