I have a data frame like this:
Col1 Col2
123,bnh12,1242,mdmdmd 8
0923,3mdn42,76,ieieie 10
How can I split this data on the commas and get the expected output shown below using sqldf?
Col1 Col2 NewCol NewCol2
123,bnh12,1242,mdmdmd 8 123 1242
0923,3mdn42,76,ieieie 10 0923 76
I was able to get the first number for NewCol, but I can't figure out NewCol2:
df1 <- sqldf("SELECT *, SUBSTR([Col1], 1, INSTR([Col1],',')-1) [NewCol] FROM df")
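Answer 0 (score: 1)
For NewCol1, use the code in the question. For NewCol2, use strFilter to remove every character that is not a comma or a digit. Then trim digits from both ends, then trim commas from both ends. Then trim digits from the left once more, and finally trim commas from the left.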
library(sqldf)
sqldf("select *,
SUBSTR(Col1, 1, INSTR([Col1], ',') - 1) NewCol1,
ltrim(ltrim(trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ','),
'0123456789'), ',') NewCol2
from df")
giving:
Col1 Col2 NewCol1 NewCol2
1 123,bnh12,1242,mdmdmd 8 123 1242
2 0923,3mdn42,76,ieieie 10 0923 76
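To see why the chain of trims ends on the third field, the intermediate strings can be inspected one step at a time. This is a minimal sketch, assuming df is rebuilt from the two sample rows in the question; the column names s1 to s4 are made up for illustration.

```r
library(sqldf)

# Sample data reconstructed from the question
df <- data.frame(
  Col1 = c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie"),
  Col2 = c(8, 10),
  stringsAsFactors = FALSE
)

# Each column applies one more step of the chain used for NewCol2
steps <- sqldf("select Col1,
  strFilter(Col1, ',0123456789') s1,                                -- keep only commas and digits
  trim(strFilter(Col1, ',0123456789'), '0123456789') s2,            -- strip digits from both ends
  trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ',') s3, -- strip commas from both ends
  ltrim(trim(trim(strFilter(Col1, ',0123456789'), '0123456789'), ','),
        '0123456789') s4                                            -- strip leading digits again
  from df")
print(steps)
```

For the first row the strings should go from "123,12,1242," to ",12,1242," to "12,1242" to ",1242", and the last ltrim removes the remaining comma. Note that the approach appears to rely on the second field containing at least one digit: if it had none, the second and third steps would already strip everything in front of the third field, and the later ltrim on digits would then remove the third field itself.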
The above uses the default RSQLite backend, but with the RH2 backend more string manipulation functions are available:
library(sqldf)
library(RH2) # sqldf will notice this is loaded and use it
sqldf("SELECT *,
regexp_replace(Col1, ',.*', '') NewCol1,
regexp_replace(Col1, '^[^,]*,[^,]*,|,[^,]*$', '') NewCol2
FROM df")
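The two patterns can be checked without a database by applying them with base R's sub and gsub. This is only a sketch for understanding the regular expressions, not part of the sqldf solution; the vector x and the names new_col1 and new_col2 are made up for illustration, and perl = TRUE is used on the assumption that it behaves like H2's regex engine for these simple patterns.

```r
# Sample Col1 values from the question
x <- c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie")

# ',.*' matches from the first comma to the end, leaving the first field
new_col1 <- sub(",.*", "", x, perl = TRUE)

# First alternative: the first two fields plus their commas, anchored at the start.
# Second alternative: the last comma and the field after it, anchored at the end.
new_col2 <- gsub("^[^,]*,[^,]*,|,[^,]*$", "", x, perl = TRUE)

print(new_col1)
print(new_col2)
```

The second pattern leaves exactly the third field only when Col1 has four comma-separated fields; with five fields it would leave the third and fourth joined by a comma.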
Answer 1 (score: 0)
# 1. First field: everything before the first comma
df <- sqldf("SELECT *, SUBSTR([Col1], 1, INSTR([Col1],',')-1) [NewCol] FROM df")
# 2. Remove the first field's text from Col1, then drop the leading comma
df <- sqldf("SELECT *, replace([Col1], [NewCol], '') [Removal of NewCol] from df")
df <- sqldf("select *, substr([Removal of NewCol], 2) as [Removal of NewCol without comma] from df")
# 3. Second field (not needed in the final output)
df <- sqldf("SELECT *, SUBSTR([Removal of NewCol without comma], 1, INSTR([Removal of NewCol without comma],',')-1) [Middle_UnImportant] FROM df")
# 4. Remove the second field's text, then drop the leading comma
df <- sqldf("SELECT *, replace([Removal of NewCol without comma], [Middle_UnImportant], '') [Anything After] from df")
df <- sqldf("select *, substr([Anything After], 2) as [Anything After without comma] from df")
# 5. Third field: everything before the next comma
df <- sqldf("SELECT *, SUBSTR([Anything After without comma], 1, INSTR([Anything After without comma],',')-1) [NewCol2] FROM df")
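The same idea can probably be written as a single query with nested subqueries, cutting by position with SUBSTR and INSTR instead of replace. This is a sketch under the assumption that df holds the sample rows from the question; rest1 and rest2 are made-up helper column names. Cutting by position also avoids a possible pitfall of replace, which removes every occurrence of the field's text rather than only the leading one.

```r
library(sqldf)

# Sample data reconstructed from the question
df <- data.frame(
  Col1 = c("123,bnh12,1242,mdmdmd", "0923,3mdn42,76,ieieie"),
  Col2 = c(8, 10),
  stringsAsFactors = FALSE
)

res <- sqldf("
  SELECT Col1, Col2, NewCol,
         SUBSTR(rest2, 1, INSTR(rest2, ',') - 1) AS NewCol2  -- third field
  FROM (
    SELECT *, SUBSTR(rest1, INSTR(rest1, ',') + 1) AS rest2  -- text after the second comma
    FROM (
      SELECT *,
             SUBSTR(Col1, 1, INSTR(Col1, ',') - 1) AS NewCol, -- first field
             SUBSTR(Col1, INSTR(Col1, ',') + 1)    AS rest1   -- text after the first comma
      FROM df
    )
  )
")
print(res)
```

This leaves no helper columns in the result and keeps the original df untouched.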