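I am doing some data manipulation on large tables stored in SQL Server, and creating an index sometimes cuts down the time some of my R scripts need to run. I tried to create a new column of consecutive numbers with the mutate function from dplyr, and then use that idx column as the index. But mutate does not seem to work and keeps giving me this error:
> tbl(channel,'tbl_iris') %>% mutate(idx=1:n())
Error in from:to : NA/NaN argument
In addition: Warning message:
In 1:n() : NAs introduced by coercion
Right now I am using a workaround that looks very dumb to me, just to get around the error message above:
iris <- tbl(channel,'tbl_iris') %>%
  collect %>%
  mutate(idx=1:n())
try(db_drop_table(channel,'##iris'))
copy_to(channel,iris,'##iris',temporary=FALSE)
db_create_index(channel,'##iris',columns='idx')
Is there a better way to do this? Thanks!
Update: the approach suggested by @Moody_Mudskipper seems to work. Before that, I tried mutate(idx = row_number()) as suggested by @Phil; it did not work and gave the following error messages: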
> tbl(channel,'##iris') %>%
+ mutate(idx=row_number())
Error: <SQL> 'SELECT TOP 10 "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", row_number() OVER () AS "idx"
FROM "##iris"'
nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]The function 'row_number' must have an OVER clause with ORDER BY.
> tbl(channel,'##iris') %>%
+ arrange(Species) %>%
+ mutate(idx=row_number())
Error: <SQL> 'SELECT TOP 10 "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", row_number() OVER (ORDER BY "Species") AS "idx"
FROM (SELECT *
FROM "##iris"
ORDER BY "Species") "kwtundzona"'
nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
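Both errors above come down to where the ORDER BY ends up: SQL Server requires it inside the OVER clause, and arrange() also places one in a subquery, which SQL Server rejects. One possible way around this, assuming a dbplyr version that provides window_order(), is to set the window ordering without sorting the query itself. The sketch below uses dbplyr's simulated SQL Server backend, so it only renders SQL and never touches a real database; the two-column lazy_frame is a hypothetical stand-in for the ##iris table. Note that this only yields a lazy query with an idx column, not a stored column that can be indexed.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for tbl(channel, '##iris'): a lazy table on a
# simulated SQL Server connection, used only to inspect the generated SQL.
iris_lazy <- lazy_frame(
  Sepal.Length = 5.1,
  Species = "setosa",
  con = simulate_mssql()
)

# window_order() sets the ordering used by window functions such as
# row_number(), without adding an ORDER BY to the query itself.
query <- iris_lazy %>%
  window_order(Species) %>%
  mutate(idx = row_number())

# Render the SQL that would be sent to the server.
sql <- as.character(sql_render(query))
cat(sql, "\n")
```

With a real connection, the same two verbs could be applied to tbl(channel, '##iris'); whether the resulting numbering is stable depends on the ordering columns being unique.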
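I will try to modify my script and see whether the approach suggested by @Moody_Mudskipper gives a performance gain similar to my earlier, dumber approach.
Things went as planned, I hope, apart from the warning message shown below.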
> try(db_drop_table(channel,'##iris'))
[1] 0
> copy_to(channel,iris,'##iris',temporary=FALSE)
> tbl(channel,'##iris') %>% head(.,1)
# Source: lazy query [?? x 5]
# Database: Microsoft SQL Server 11.00.6251[dbo@WCDCHCMS9999\CMSAH_DC7_999/data_xx_yyy]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.10 3.50 1.40 0.200 setosa
>
> DBI::dbSendQuery(channel,"ALTER TABLE ##iris ADD idx INT IDENTITY(1,1) NOT NULL")
<OdbcResult>
SQL ALTER TABLE ##iris ADD idx INT IDENTITY(1,1) NOT NULL
ROWS Fetched: 0 [complete]
Changed: 0
> db_create_index(channel,'##iris',columns='idx')
[1] 0
Warning message:
In new_result(connection@ptr, statement) : Cancelling previous query
> tbl(channel,'##iris') %>% head(.,5)
# Source: lazy query [?? x 6]
# Database: Microsoft SQL Server 11.00.6251[dbo@WCDCHCMS9999\CMSAH_DC7_999/data_xx_yyy]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species idx
<dbl> <dbl> <dbl> <dbl> <chr> <int>
1 5.10 3.50 1.40 0.200 setosa 1
2 4.90 3.00 1.40 0.200 setosa 2
3 4.70 3.20 1.30 0.200 setosa 3
4 4.60 3.10 1.50 0.200 setosa 4
5 5.00 3.60 1.40 0.200 setosa 5
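Answer 0 (score: 0)
As far as I know, you cannot add a column to an existing table on the server side with dbplyr, but for a query this simple it is just as easy to use DBI::dbSendQuery to get the desired effect. The following line creates an ID column:
DBI::dbSendQuery(channel, "ALTER TABLE tbl_iris ADD ID INT IDENTITY(1,1) NOT NULL")
You can then create the index with dplyr::db_create_index, or send another query:
DBI::dbSendQuery(channel, "CREATE INDEX id ON tbl_iris (id);")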
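The "Cancelling previous query" warning in the question's transcript most likely appears because DBI::dbSendQuery returns a result object that stays open until DBI::dbClearResult is called on it. For statements that return no rows, DBI::dbExecute runs the statement and releases the result in one step. Below is a minimal sketch of that pattern. As an assumption for illustration, it uses an in-memory SQLite database through the RSQLite package in place of SQL Server, so the IDENTITY column is replaced by numbering the rows in R, and the index name idx_tbl_iris is made up.

```r
library(DBI)

# Assumption: in-memory SQLite (RSQLite package) stands in for the
# SQL Server connection called `channel` in the question.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# SQLite has no IDENTITY(1,1), so the consecutive numbers are added in R
# before the table is written.
iris_idx <- iris
iris_idx$idx <- seq_len(nrow(iris_idx))
dbWriteTable(con, "tbl_iris", iris_idx)

# dbExecute() runs the statement and clears the result immediately,
# so no open result is left behind for the next call to cancel.
dbExecute(con, "CREATE INDEX idx_tbl_iris ON tbl_iris (idx)")

# Confirm that the index exists.
indexes <- dbGetQuery(
  con,
  "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'tbl_iris'"
)
print(indexes)

dbDisconnect(con)
```

Against SQL Server, the same idea would be to pass the ALTER TABLE and CREATE INDEX statements above to DBI::dbExecute(channel, ...) instead of DBI::dbSendQuery.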