I am trying to convert a large .CSV file to an .Xdf file using the rxImport() function, with the following code:
rxImport(inData = "/poc/revor/data/ext_roll36_chrg_vol.csv",
         outFile = "/poc/revor/data/ext_roll36_chrg_vol.xdf",
         overwrite = TRUE, rowsPerRead = 100000,
         colClasses = c(SE_NO = "character",
                        HIER_ROLLUP_CD = "character",
                        CUR_MO_CT = "numeric",
                        CUR_MO_AM = "numeric",
                        AD_LINE_1_TX = "character",
                        AD_LINE_2_TX = "character",
                        SUBMIT_DT = "character",
                        UPDT_TS = "character"),
         transforms = list(SUBMIT_DT = as.Date(SUBMIT_DT, format = "%d%b%Y")))
However, the file contains many records like this one:
0200001097,SS,625,236899.000,"KRAV MAGA WORLDWIDE, INC.","KRAV MAGA WORLDWIDE, INC.",01MAY2014,07JUN2014:01:08:57.000000
As you can see, the AD_LINE_1_TX and AD_LINE_2_TX columns contain commas inside double quotes.
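To make the expected parse concrete, here is a minimal sketch in base R (not RevoScaleR) that reads the sample record above with `read.csv()`. It assumes the file has no header row and eight columns in the order listed in `colClasses`; the variable names `sample_line`, `col_names`, `col_classes` and `parsed` are illustrative only. It shows the target result: eight fields, the quoted commas kept inside the address columns, and SE_NO kept as a character string with its leading zero.

```r
# Base R illustration only; rxImport() is not used here.
sample_line <- paste0(
  '0200001097,SS,625,236899.000,',
  '"KRAV MAGA WORLDWIDE, INC.","KRAV MAGA WORLDWIDE, INC.",',
  '01MAY2014,07JUN2014:01:08:57.000000'
)

# Column names and types, in the order assumed from the question's colClasses.
col_names <- c("SE_NO", "HIER_ROLLUP_CD", "CUR_MO_CT", "CUR_MO_AM",
               "AD_LINE_1_TX", "AD_LINE_2_TX", "SUBMIT_DT", "UPDT_TS")
col_classes <- c("character", "character", "numeric", "numeric",
                 "character", "character", "character", "character")

# read.csv() treats double quotes as the quoting character by default,
# so commas inside quoted fields are not treated as separators.
parsed <- read.csv(text = sample_line, header = FALSE,
                   col.names = col_names, colClasses = col_classes,
                   stringsAsFactors = FALSE)

print(ncol(parsed))          # number of fields after parsing
print(parsed$SE_NO)          # read as character, leading zero kept
print(parsed$AD_LINE_1_TX)   # comma stays inside the field
```

This only shows what a correct parse of one record looks like; it is not a substitute for rxImport() on a large file.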
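I have tried the type = "text" argument, but then the SE_NO column is read as numeric even though its type is reported as character. The same problem affects every numeric-looking field that I want to read as character.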
If I use the transforms argument to convert the column to character:
transforms = list(SE_NO = as.character(as.numeric(SE_NO)))
then the value in the SE_NO column changes from 0200001097 to 0200001000 as it is converted from numeric (exponential notation, 2.000011e+08) to character.
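A small base R sketch of why a round trip through numeric cannot restore the original identifier. It does not use rxImport(), so the exact value reported above (which came out of the import itself) is not reproduced here; it only shows that the leading zero is lost once the string has become a number. The variable names and the fixed width of 10 used for re-padding are assumptions made for illustration.

```r
# Base R illustration only; names here are hypothetical.
se_no <- "0200001097"

# Character -> numeric -> character: the leading zero is not part of the number.
round_trip <- as.character(as.numeric(se_no))
print(round_trip)

# The round trip does not give back the original string.
print(identical(round_trip, se_no))

# Re-padding works only if every identifier is known to have a fixed width
# (10 digits is an assumption here) and fits in an integer.
repadded <- sprintf("%010d", as.integer(round_trip))
print(repadded)
```

Re-padding is fragile, which is why reading the column as character from the start is the safer route.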
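So is there another way to ignore the commas inside double quotes without affecting the other columns?
Please let me know if any further information is needed.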
Answer 0 (score: 0)
This should do what you need:
input_file <- "/poc/revor/data/ext_roll36_chrg_vol.csv"
output_file <- "/poc/revor/data/ext_roll36_chrg_vol.xdf"
my_colInfo <- list(list(index = 1, type = "character", newName = "SE_NO"),
                   list(index = 2, type = "character", newName = "HIER_ROLLUP_CD"),
                   list(index = 3, type = "numeric", newName = "CUR_MO_CT"),
                   list(index = 4, type = "numeric", newName = "CUR_MO_AM"),
                   list(index = 5, type = "character", newName = "AD_LINE_1_TX"),
                   list(index = 6, type = "character", newName = "AD_LINE_2_TX"),
                   list(index = 7, type = "character", newName = "SUBMIT_DT"),
                   list(index = 8, type = "character", newName = "UPDT_TS"))
input_source <- RxTextData(file = input_file,
                           colInfo = my_colInfo,
                           delimiter = ",",
                           quotedDelimiters = TRUE,
                           useFastRead = TRUE)
rxImport(inData = input_source,
         outFile = output_file,
         overwrite = TRUE, rowsPerRead = 100000,
         transforms = list(SUBMIT_DT = as.Date(SUBMIT_DT, format = "%d%b%Y")))