Parsing by delimiter in sqldf

Time: 2019-03-11 21:19:02

Tags: r sqldf

I have a data frame like this:

                 Col1     Col2
123,bnh12,1242,mdmdmd        8
0923,3mdn42,76,ieieie       10

How can I split this data set on the comma (,) and get the expected output below using sqldf?

                 Col1     Col2    NewCol    NewCol2   
123,bnh12,1242,mdmdmd        8       123       1242
0923,3mdn42,76,ieieie       10      0923         76

I was able to get the first number for NewCol, but I can't figure out NewCol2:

df1 <- sqldf("SELECT *, SUBSTR([Col1], 1, INSTR([Col1],',')-1) [NewCol] FROM df")
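To make the question reproducible, here is a minimal sketch that rebuilds `df` from the sample rows shown above (the `data.frame()` construction is an assumption; the post does not show how `df` was created) and runs the query for `NewCol`. `INSTR([Col1], ',')` returns the 1-based position of the first comma, so taking `INSTR(...) - 1` characters starting at position 1 yields the first field.

```r
library(sqldf)

# Rebuild the sample data shown in the question (assumed construction)
df <- data.frame(
  Col1 = c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie"),
  Col2 = c(8, 10),
  stringsAsFactors = FALSE
)

# INSTR gives the 1-based position of the first comma;
# SUBSTR then keeps every character before it
df1 <- sqldf("SELECT *, SUBSTR([Col1], 1, INSTR([Col1],',')-1) [NewCol] FROM df")
print(df1)
```

Because SUBSTR returns text, the leading zero in `0923` is preserved.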

2 answers:

Answer 0 (score: 1)

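For NewCol1, use the code from the question. For NewCol2, use strFilter (one of the SQLite extension functions that sqldf loads through RSQLite) to remove every character that is not a comma or a digit. Then trim digits from both ends, then trim commas from both ends. Finally, trim digits from the left once more and then trim the leading comma. This relies on the second field containing at least one digit, as it does in both sample rows.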

library(sqldf)

sqldf("select *,
 SUBSTR(Col1, 1, INSTR([Col1], ',') - 1) NewCol1,
 ltrim(ltrim(trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ','), 
   '0123456789'), ',') NewCol2
 from df")

This gives:

                   Col1 Col2 NewCol1 NewCol2
1 123,bnh12,1242,mdmdmd    8     123    1242
2 0923,3mdn42,76,ieieie   10    0923      76
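To see how the nested expression arrives at NewCol2, here is a sketch that selects each intermediate stage as its own column. The column names `s1` to `s5` are hypothetical and chosen only for illustration, and the `df` construction is assumed from the sample rows in the question.

```r
library(sqldf)

# Sample data from the question (assumed construction)
df <- data.frame(
  Col1 = c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie"),
  Col2 = c(8, 10),
  stringsAsFactors = FALSE
)

# Each column adds one more layer of the expression used for NewCol2:
# s1 keeps only commas/digits, s2 trims digits from both ends,
# s3 trims commas from both ends, s4 trims leading digits, s5 trims the leading comma
stages <- sqldf("
  SELECT Col1,
    strFilter(Col1, ',0123456789') AS s1,
    trim(strFilter(Col1, ',0123456789'), '0123456789') AS s2,
    trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ',') AS s3,
    ltrim(trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ','),
      '0123456789') AS s4,
    ltrim(ltrim(trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ','),
      '0123456789'), ',') AS s5
  FROM df")
print(stages)
```

For the first row the stages are `123,12,1242,`, `,12,1242,`, `12,1242`, `,1242` and finally `1242`.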

H2 database

The above uses the default RSQLite backend. If you use the RH2 backend instead, more string-manipulation functions are available:

library(sqldf)
library(RH2)  # sqldf will notice this is loaded and use it

sqldf("SELECT *, 
       regexp_replace(Col1, ',.*', '') NewCol1,
       regexp_replace(Col1, '^[^,]*,[^,]*,|,[^,]*$', '') NewCol2
       FROM df")
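If Java or RH2 is not available, the two regular expressions can be checked in base R with `sub()` and `gsub()`. This is a sketch that assumes H2's `regexp_replace` replaces every match, as `gsub()` does. Note that the second pattern assumes each value has exactly four comma-separated fields: it deletes the first two fields with their commas and the last field with its comma.

```r
# Sample values from the question (assumed construction)
x <- c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie")

# ',.*' deletes the first comma and everything after it, leaving the first field
new_col1 <- sub(",.*", "", x)

# Delete "field1,field2," at the start and ",field4" at the end, leaving field 3
new_col2 <- gsub("^[^,]*,[^,]*,|,[^,]*$", "", x)

print(data.frame(x, new_col1, new_col2))
```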

Answer 1 (score: 0)

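library(sqldf)

# 1. First field: everything before the first comma
df <- sqldf("SELECT *, SUBSTR([Col1], 1, INSTR([Col1],',')-1) [NewCol] FROM df")

# 2. Remove the first field (note: replace() removes every occurrence of it)
df <- sqldf("SELECT *, replace([Col1], [NewCol], '') [Removal of NewCol] from df")

# 3. Drop the leading comma
df <- sqldf("select *, substr([Removal of NewCol], 2) as [Removal of NewCol without comma] from df")

# 4. Second field: everything before the next comma (not needed in the result)
df <- sqldf("SELECT *, SUBSTR([Removal of NewCol without comma], 1, INSTR([Removal of NewCol without comma],',')-1) [Middle_UnImportant] FROM df")

# 5. Remove the second field, leaving ",third,fourth"
df <- sqldf("SELECT *, replace([Removal of NewCol without comma], [Middle_UnImportant], '') [Anything After] from df")

# 6. Drop the leading comma
df <- sqldf("select *, substr([Anything After], 2) as [Anything After without comma] from df")

# 7. Third field: everything before the next comma, i.e. NewCol2
df <- sqldf("SELECT *, SUBSTR([Anything After without comma], 1, INSTR([Anything After without comma],',')-1) [NewCol2] FROM df")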
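The seven steps above can be collapsed into a single query by taking the text after each comma in a nested subquery, instead of deleting the preceding field with `replace()`. This is a sketch: `rest1` and `rest2` are hypothetical helper column names, the `df` construction is assumed from the sample rows, and it assumes every value contains at least three commas. Skipping `replace()` also avoids its side effect of removing every occurrence of the first field, not just the leading one.

```r
library(sqldf)

# Sample data from the question (assumed construction)
df <- data.frame(
  Col1 = c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie"),
  Col2 = c(8, 10),
  stringsAsFactors = FALSE
)

res <- sqldf("
  SELECT Col1, Col2, NewCol,
         SUBSTR(rest2, 1, INSTR(rest2, ',') - 1) AS NewCol2  -- third field
  FROM (
    SELECT *, SUBSTR(rest1, INSTR(rest1, ',') + 1) AS rest2  -- after 2nd comma
    FROM (
      SELECT *,
             SUBSTR(Col1, 1, INSTR(Col1, ',') - 1) AS NewCol, -- first field
             SUBSTR(Col1, INSTR(Col1, ',') + 1) AS rest1      -- after 1st comma
      FROM df
    )
  )")
print(res)
```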