Convert multiple columns into a single column and match it with another column's information

Date: 2017-07-03 12:58:15

Tags: r match multiple-columns data-cleaning

I am working with a complicated matrix (complicated for me, at least...).

It looks like this:

      Invoice.1   Invoice.2   Invoice.3               mtime
1   21605000182 21605000183          NA 2017-01-16 19:51:33
2   21605000182 21605000183          NA 2017-01-16 19:51:33
3   21605000182 21605000183          NA 2017-01-16 19:51:33
4   21605000182 21605000183          NA 2017-01-16 19:51:33
5   21510000669 21602000125 21608000366 2017-01-20 13:28:36
6   21609000856          NA          NA 2017-01-20 13:28:36
7   21606000405 21608000354 21608000356 2017-01-20 13:28:36
8   21610000133          NA          NA 2017-01-20 13:28:36
9   21604000592 21605000604 21605000608 2017-01-20 13:28:36
10  21609001012          NA          NA 2017-01-20 13:28:36

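I want to collapse all of the Invoice columns into a single column, dropping the NA values and the duplicates, while keeping each invoice matched with the date in the last column (mtime), which is the declaration date.

Something like this: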

      Invoice          mtime
1   21605000182 2017-01-16 19:51:33
2   21605000182 2017-01-16 19:51:33
3   21605000182 2017-01-16 19:51:33
4   21605000182 2017-01-16 19:51:33
5   21510000669 2017-01-20 13:28:36
6   21609000856 2017-01-20 13:28:36
7   21606000405 2017-01-20 13:28:36
8   21610000133 2017-01-20 13:28:36
9   21604000592 2017-01-20 13:28:36
10  21609001012 2017-01-20 13:28:36
11  21605000183 2017-01-16 19:51:33
12  21605000183 2017-01-16 19:51:33
13  21605000183 2017-01-16 19:51:33
14  21605000183 2017-01-16 19:51:33
15  21602000125 2017-01-20 13:28:36
16  21608000354 2017-01-20 13:28:36

2 answers:

Answer 0: (score: 0)

An example using data.table (this should be faster than the alternatives):

library(data.table)

# Small stand-in for the real data: two invoice columns plus mtime
DT <- data.table(Invoice.1 = 1:3, Invoice.2 = c(1L,4L,5L), mtime = 11:13)
DT

   Invoice.1 Invoice.2 mtime
1:         1         1    11
2:         2         4    12
3:         3         5    13

rez <- melt(DT, measure.vars = paste0("Invoice.", 1:2),
            value.name = "Invoice")
rez[, variable := NULL]
rez

   mtime Invoice
1:    11       1
2:    12       2
3:    13       3
4:    11       1
5:    12       4
6:    13       5

rez <- unique(rez)
rez

   mtime Invoice
1:    11       1
2:    12       2
3:    13       3
4:    12       4
5:    13       5
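
The toy example above has no missing values, so here is a minimal sketch of the same melt-then-unique idea on a few rows shaped like the question's data. It assumes the invoice numbers are stored as character strings and that mtime is kept as plain text; the names `invoice_cols` and `long` are mine, not from the question. `na.rm = TRUE` is an argument of data.table's `melt` that drops rows whose melted value is NA.

```r
library(data.table)

# A few rows re-typed by hand from the question (assumption: character columns)
DT <- data.table(
  Invoice.1 = c("21605000182", "21605000182", "21510000669", "21609000856"),
  Invoice.2 = c("21605000183", "21605000183", "21602000125", NA),
  Invoice.3 = c(NA, NA, "21608000366", NA),
  mtime     = c("2017-01-16 19:51:33", "2017-01-16 19:51:33",
                "2017-01-20 13:28:36", "2017-01-20 13:28:36")
)

# Pick up every Invoice.* column without hard-coding how many there are
invoice_cols <- grep("^Invoice", names(DT), value = TRUE)

# Wide to long; na.rm = TRUE drops the rows whose invoice is NA
long <- melt(DT,
             id.vars = "mtime",
             measure.vars = invoice_cols,
             value.name = "Invoice",
             na.rm = TRUE)

# The source column name is not needed, then drop duplicated (mtime, Invoice) pairs
long[, variable := NULL]
long <- unique(long)
long
```

Every invoice stays on the same row as its original mtime because mtime is the id column of the melt. If you want to keep repeated invoices, as in the question's expected output, skip the `unique()` step.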

Answer 1: (score: 0)

The gather function from the tidyr package does what you need. gather converts a data.frame from wide format to long format.

library(tidyr)
library(readr)

# Create a temp file to store the example data
data_file <- tempfile()

cat(
"Invoice.1,Invoice.2,Invoice.3,mtime
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21605000182,21605000183,NA,2017-01-16 19:51:33
21510000669,21602000125,21608000366,2017-01-20 13:28:36
21609000856,NA,NA,2017-01-20 13:28:36
21606000405,21608000354,21608000356,2017-01-20 13:28:36
21610000133,NA,NA,2017-01-20 13:28:36
21604000592,21605000604,21605000608,2017-01-20 13:28:36
21609001012,NA,NA,2017-01-20 13:28:36",
file = data_file,
append = FALSE)

# Read the data from the temp file into a data.frame called `invoices`
invoices <-
  readr::read_csv(file = data_file, col_types = "cccT")

# View the data
invoices
# # A tibble: 10 x 4
#      Invoice.1   Invoice.2   Invoice.3               mtime
#          <chr>       <chr>       <chr>              <dttm>
#  1 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  2 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  3 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  4 21605000182 21605000183        <NA> 2017-01-16 19:51:33
#  5 21510000669 21602000125 21608000366 2017-01-20 13:28:36
#  6 21609000856        <NA>        <NA> 2017-01-20 13:28:36
#  7 21606000405 21608000354 21608000356 2017-01-20 13:28:36
#  8 21610000133        <NA>        <NA> 2017-01-20 13:28:36
#  9 21604000592 21605000604 21605000608 2017-01-20 13:28:36
# 10 21609001012        <NA>        <NA> 2017-01-20 13:28:36

# use the gather function from the tidyr package to transform the data from the
# wide format to a long format.

tidyr::gather(invoices, key = key, value = Invoice, -mtime, na.rm = TRUE) %>% print(n = Inf)
# # A tibble: 20 x 3
#                  mtime       key     Invoice
#  *              <dttm>     <chr>       <chr>
#  1 2017-01-16 19:51:33 Invoice.1 21605000182
#  2 2017-01-16 19:51:33 Invoice.1 21605000182
#  3 2017-01-16 19:51:33 Invoice.1 21605000182
#  4 2017-01-16 19:51:33 Invoice.1 21605000182
#  5 2017-01-20 13:28:36 Invoice.1 21510000669
#  6 2017-01-20 13:28:36 Invoice.1 21609000856
#  7 2017-01-20 13:28:36 Invoice.1 21606000405
#  8 2017-01-20 13:28:36 Invoice.1 21610000133
#  9 2017-01-20 13:28:36 Invoice.1 21604000592
# 10 2017-01-20 13:28:36 Invoice.1 21609001012
# 11 2017-01-16 19:51:33 Invoice.2 21605000183
# 12 2017-01-16 19:51:33 Invoice.2 21605000183
# 13 2017-01-16 19:51:33 Invoice.2 21605000183
# 14 2017-01-16 19:51:33 Invoice.2 21605000183
# 15 2017-01-20 13:28:36 Invoice.2 21602000125
# 16 2017-01-20 13:28:36 Invoice.2 21608000354
# 17 2017-01-20 13:28:36 Invoice.2 21605000604
# 18 2017-01-20 13:28:36 Invoice.3 21608000366
# 19 2017-01-20 13:28:36 Invoice.3 21608000356
# 20 2017-01-20 13:28:36 Invoice.3 21605000608
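
The gathered result above still carries the key column and the repeated rows. As a possible follow-up, here is a sketch that drops both, written with `pivot_longer`, which newer tidyr releases (1.0.0 and later) provide as the successor to `gather`. It assumes dplyr is installed in addition to tidyr, re-types only a few rows of the data, and keeps mtime as plain text for brevity; the name `result` is mine.

```r
library(tidyr)
library(dplyr)

# A few rows re-typed by hand from the question (assumption: character columns)
invoices <- tibble(
  Invoice.1 = c("21605000182", "21605000182", "21510000669", "21609000856"),
  Invoice.2 = c("21605000183", "21605000183", "21602000125", NA),
  Invoice.3 = c(NA, NA, "21608000366", NA),
  mtime     = c("2017-01-16 19:51:33", "2017-01-16 19:51:33",
                "2017-01-20 13:28:36", "2017-01-20 13:28:36")
)

result <- invoices %>%
  # Wide to long over every Invoice.* column, dropping NA invoices
  pivot_longer(cols = starts_with("Invoice"),
               names_to = "key",
               values_to = "Invoice",
               values_drop_na = TRUE) %>%
  # Keep only the two columns asked for, then remove duplicated pairs
  select(Invoice, mtime) %>%
  distinct()

result
```

Note that `pivot_longer` returns the rows grouped by original row rather than by source column, so the order differs from the gather output; sort with `arrange()` if the order matters.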