in SQL

Time: 2017-04-04 12:55:33

Tags: r dataframe

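I am a beginner in R. Could someone help me work out how to do the following in R?

I have connected R to a Redshift (AWS) database and I am performing some operations on Redshift tables.

From the source table orders I created a data frame that lists every possible combination in which an order can be placed. It has an id column that identifies each unique combination (it is just the row number, since every row holds a unique combination).

The data frame contains the following values: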

amt  order_time  order_day  hour_day table_no  item_grp     id
  2      1             2       14       16         1        1
  1      2             1       18        12        2        2

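In total, the data frame contains 1500 rows (meaning 1500 possible combinations).

I want this data frame to act as a lookup table for the SQL table orders, which contains order_id.

Orders table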

order_id amt order_time order_day hour_day table_no item_grp
123 2 1 2 14 16 1
321 2 1 2 14 16 1
456 1 2 1 18 12 2

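How can I pass the values in the data frame to the WHERE condition of a SQL statement? In other words, I want to read each row of my data frame, fetch the rows of the orders table that satisfy that row's conditions, and list them in the format shown below.

The output table looks like this:

order_id amt order_time order_day hour_day table_no item_grp id
123 2 1 2 14 16 1 1
321 2 1 2 14 16 1 1
456 1 2 1 18 12 2 2

And so on...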

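1 Answer:

Answer 0: (score: 0)

Here is one solution. It uses the left_join() function from the dplyr package to combine the two data frames on their shared columns.

For details, see the dplyr documentation: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html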

library(dplyr)         # %>% , left_join()
library(purrr)         # map_df() to remove factors from structure()

#sample data
order_details <-
        dput(
                structure(
                        list(
                                order_id = structure(1:3, .Label = c("123", "321",
                                                                     "456"), class = "factor"),
                                amt = structure(c(2L, 2L, 1L), .Label = c("1",
                                                                          "2"), class = "factor"),
                                order_time = structure(c(1L, 1L, 2L), .Label = c("1",
                                                                                 "2"), class = "factor"),
                                order_day = structure(c(2L, 2L, 1L), .Label = c("1",
                                                                                "2"), class = "factor"),
                                hour_day = structure(c(1L, 1L, 2L), .Label = c("14",
                                                                               "18"), class = "factor"),
                                table_no = structure(c(2L, 2L, 1L), .Label = c("12",
                                                                               "16"), class = "factor"),
                                item_grp = structure(c(1L, 1L, 2L), .Label = c("1",
                                                                               "2"), class = "factor")
                        ),
                        .Names = c(
                                "order_id",
                                "amt",
                                "order_time",
                                "order_day",
                                "hour_day",
                                "table_no",
                                "item_grp"
                        ),
                        row.names = c(NA,
                                      -3L), class = "data.frame"))

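# Convert factor -> character -> integer; calling as.integer() directly on a
# factor would return the internal level codes rather than the labels.
order_details <- purrr::map_df(purrr::map_df(order_details, as.character), as.integer)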

#sample data contd.
orders <-
        dput(structure(
                list(
                        amt = c(2L, 1L),
                        order_time = 1:2,
                        order_day = c(2L,
                                      1L),
                        hour_day = c(14L, 18L),
                        table_no = c(16L, 12L),
                        item_grp = 1:2,
                        id = 1:2
                ),
                .Names = c(
                        "amt",
                        "order_time",
                        "order_day",
                        "hour_day",
                        "table_no",
                        "item_grp",
                        "id"
                ),
                row.names = c(NA,-2L),
                class = "data.frame"
        ))

# lookup order id
orders_augm <- orders %>%
        left_join(
                order_details,
                by = c(
                        "amt",
                        "order_time",
                        "order_day",
                        "hour_day",
                        "table_no",
                        "item_grp"
                )
        )

Result:

orders_augm
# A tibble: 3 × 8
    amt order_time order_day hour_day table_no item_grp    id order_id
  <int>      <int>     <int>    <int>    <int>    <int> <int>    <int>
1     2          1         2       14       16        1     1      123
2     2          1         2       14       16        1     1      321
3     1          2         1       18       12        2     2      456
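
For readers who would rather avoid extra packages, the same lookup can be sketched in base R with merge(). This is only an illustration under stated assumptions: the names lookup, orders_tbl and key_cols are made up here, and the values are the sample rows from the question. Note that merge() sorts by the key columns by default, so the rows are re-sorted explicitly afterwards.

```r
# Lookup table of combinations (sample values from the question)
lookup <- data.frame(
  amt = c(2L, 1L), order_time = c(1L, 2L), order_day = c(2L, 1L),
  hour_day = c(14L, 18L), table_no = c(16L, 12L), item_grp = c(1L, 2L),
  id = c(1L, 2L)
)

# Orders table (sample values from the question)
orders_tbl <- data.frame(
  order_id = c(123L, 321L, 456L),
  amt = c(2L, 2L, 1L), order_time = c(1L, 1L, 2L), order_day = c(2L, 2L, 1L),
  hour_day = c(14L, 14L, 18L), table_no = c(16L, 16L, 12L),
  item_grp = c(1L, 1L, 2L)
)

# Columns shared by both tables; these are the join keys
key_cols <- c("amt", "order_time", "order_day",
              "hour_day", "table_no", "item_grp")

# all.x = TRUE keeps every lookup row, like a left join
merged <- merge(lookup, orders_tbl, by = key_cols, all.x = TRUE)

# Sort by id, then put order_id first and id last
merged <- merged[order(merged$id, merged$order_id),
                 c("order_id", key_cols, "id")]
rownames(merged) <- NULL
merged
```
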

Columns reordered:

orders_augm %>% 
        select(order_id, amt, 
               order_time, order_day, hour_day, 
               table_no, item_grp,    id )

Result:

# A tibble: 3 × 8
  order_id   amt order_time order_day hour_day table_no item_grp    id
     <int> <int>      <int>     <int>    <int>    <int>    <int> <int>
1      123     2          1         2       14       16        1     1
2      321     2          1         2       14       16        1     1
3      456     1          2         1       18       12        2     2
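
The question literally asks how to pass each data frame row to a WHERE condition. A minimal sketch of that approach, in base R, is below. The helper build_query() is hypothetical, the table name "orders" is taken from the question, and the code assumes every key column holds integers, so no quoting or escaping is attempted.

```r
# Lookup table of combinations (sample values from the question)
lookup <- data.frame(
  amt = c(2L, 1L), order_time = c(1L, 2L), order_day = c(2L, 1L),
  hour_day = c(14L, 18L), table_no = c(16L, 12L), item_grp = c(1L, 2L),
  id = c(1L, 2L)
)

key_cols <- c("amt", "order_time", "order_day",
              "hour_day", "table_no", "item_grp")

# Hypothetical helper: turn one lookup row into a SELECT statement.
# Assumes integer key columns only; string values would need proper
# quoting or a parameterized query instead.
build_query <- function(row, key_cols, table = "orders") {
  conds <- sprintf("%s = %d", key_cols, as.integer(unlist(row[key_cols])))
  sprintf("SELECT order_id, %s, %d AS id FROM %s WHERE %s",
          paste(key_cols, collapse = ", "),
          as.integer(row[["id"]]),
          table,
          paste(conds, collapse = " AND "))
}

# One query string per lookup row
queries <- vapply(seq_len(nrow(lookup)),
                  function(i) build_query(lookup[i, ], key_cols),
                  character(1))
cat(queries, sep = "\n")
```

Each string could then be sent to the database, for example with DBI::dbGetQuery(con, q), where con stands for an already open connection (not shown here). With 1500 lookup rows this means 1500 round trips, so uploading the lookup table and joining inside the database, or joining in R as in the answer above, is likely to be more practical.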