By dbplyr

Time: 2019-10-03 01:05:07

Tags: r dplyr dbplyr

This question arose because I wanted to write a function for my own convenience:

as.numeric_psql <- function(x) {

   return(as.numeric(as.integer(x)))
}

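This function converts boolean values in a remote postgres table to numeric. The intermediate conversion to integer is needed because:

There is no direct cast defined between numeric and boolean. You can use integer as the middle ground. (https://stackoverflow.com/a/19290671/2109289)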

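Of course, the conversion works as expected when written inline on the remote table, and the function itself works as expected locally: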

copy_to(con_psql, cars, 'tmp_cars')

tmp_cars_sdf <-
    tbl(con_psql, 'tmp_cars')


tmp_cars_sdf %>%
    mutate(low_dist = dist < 5) %>%
    mutate(low_dist = as.numeric(as.integer(low_dist)))

# # Source:   lazy query [?? x 3]
# # Database: postgres 9.5.3
#     speed  dist low_dist
#     <dbl> <dbl>    <dbl>
# 1     4     2        1
# 2     4    10        0
# 3     7     4        1
# 4     7    22        0
# 5     8    16        0

cars %>%
    mutate(low_dist = dist < 5) %>%
    mutate(low_dist = as.numeric_psql(low_dist)) %>%
    head(5)

#   speed dist low_dist
# 1     4    2        1
# 2     4   10        0
# 3     7    4        1
# 4     7   22        0
# 5     8   16        0

However, the function does not work on the remote data frame: as.numeric_psql is not in the list of sql translations, so it is passed verbatim to the query:

> tmp_cars_sdf %>%
+     mutate(low_dist = dist < 5) %>%
+     mutate(low_dist = as.numeric_psql(low_dist))
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  syntax error at or near "as"
LINE 1: SELECT "speed", "dist", as.numeric_psql("low_dist") AS "low_...
                                ^
)
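The pass-through behaviour can be inspected without a live database. The following is a minimal sketch, assuming dbplyr's translate_sql() together with the simulated connection from simulate_postgres(); the column name x is a placeholder and not part of the original example.

```r
library(dbplyr)

# A simulated postgres connection, so no real database is required
# (assumption: a dbplyr version that provides simulate_postgres()).
con <- simulate_postgres()

# Functions with known translations are turned into SQL casts.
known <- translate_sql(as.numeric(as.integer(x)), con = con)

# A function without a translation is emitted under its own name,
# which postgres then rejects as a syntax error.
unknown <- translate_sql(as.numeric_psql(x), con = con)

print(known)
print(unknown)
```

Comparing the two printed expressions should show that only the second still contains the R function name.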

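My question is whether there is a simple way (i.e. without defining a custom sql translation) to make dplyr understand that the function as.numeric_psql is composed of functions that already have sql translations, and to use those translations instead.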

1 answer:

Answer 0: (score: 1)

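One way to avoid the error is to write the function so that it operates on the data frame, rather than being called inside mutate. For example: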

copy_to(con_psql, cars, 'tmp_cars')

tmp_cars_sdf <-
    tbl(con_psql, 'tmp_cars')

as.numeric_psql <- function(data, x) {
    # Apply the translatable casts to column x of the data frame itself
    return(data %>% mutate({{x}} := as.numeric(as.integer({{x}}))))
}

tmp_cars_sdf %>%
    mutate(low_dist = dist < 5) %>%
    as.numeric_psql(low_dist)

#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.30.1 [:memory:]
#>    speed  dist low_dist
#>    <dbl> <dbl>    <dbl>
#>  1     4     2        1
#>  2     4    10        0
#>  3     7     4        1
#>  4     7    22        0
#>  5     8    16        0
#>  6     9    10        0
#>  7    10    18        0
#>  8    10    26        0
#>  9    10    34        0
#> 10    11    17        0
#> # … with more rows

Note that in your example, the database version of low_dist is already encoded as an integer when it is created, rather than as a logical as it would be in a standard R data frame.