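I need to read a CSV file and process its contents according to the file type, that is, whether it is comma-separated or tab-separated. I am using the code below, but it is inefficient: if the input file is comma-separated, I have to read the file twice. The code I am using is: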
readFile <- function(fileName){
  portData <- read.csv(fileName, sep="\t")
  if(length(portData) == 1){
    print("comma separated file")
    executeCommaSepFile(fileName)
  } else {
    print("tab separated file")
    # code to process the tab separated file
  }
}
executeCommaSepFile <- function(fileName){
  csvData <- read.csv(file=fileName, colClasses=c(NA, NA,"NULL",NA,"NULL",NA,"NULL","NULL","NULL"))
  # code to process the comma separated file
}
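Is it possible to determine the file type without reading the entire file? Alternatively, if I pass portData instead of fileName, the data inside executeCommaSepFile() arrives in the following format: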
RUS1000.01.29.1999.21st.Centy.Ins.Group.TW.Z.90130N10.72096.1527.534.0.01.21.188
1 RUS1000,01/29/1999,3com Corp,COMS,88553510,358764,16861.908,0.16,47.000
2 RUS1000,01/29/1999,3m Co,MMM,88579Y10,401346,31154.482,0.29,77.625
3 RUS1000,01/29/1999,A D C Telecommunicat,ADCT,00088630,135114,5379.226,0.05,39.813
4 RUS1000,01/29/1999,Abbott Labs,ABT,00282410,1517621,70474.523,0.66,46.438
Can this be converted into the format produced by read.csv(file=fileName, colClasses=c(NA, NA,"NULL",NA,"NULL",NA,"NULL","NULL","NULL"))? That is, into the following format:
RUS1000 X01.29.1999 TW.Z X72096
1 RUS1000 01/29/1999 COMS 358764
2 RUS1000 01/29/1999 MMM 401346
3 RUS1000 01/29/1999 ADCT 135114
4 RUS1000 01/29/1999 ABT 1517621
Answer 0 (score: 2)
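# check.names = FALSE keeps the first line unmangled as the single column name
portData <- read.csv(fileName, sep="\t", check.names=FALSE, stringsAsFactors=FALSE)
if(length(portData) == 1) {
  print("comma separated file")
  # textConnection() needs a character vector, not a data frame:
  # rebuild the raw lines from the column name plus the column values
  rawLines <- c(names(portData), as.character(portData[[1]]))
  dat <- read.csv(textConnection(rawLines))
  executeCommaSepFile(dat) # pass the data frame, not the filename
} else {
  # "} else {" must share a line, otherwise this fails to parse at top level
  print("tab separated file")
  # code to process the tab separated file
}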
Answer 1 (score: 1)
If you stay in base R, you have at least two options.
Read in only a small part of the file (the nrows argument of read.table and friends):
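# read only the header and one data row to test the separator
portData <- read.csv(fileName, sep="\t", nrows=1)
if(length(portData) == 1) {
  print("comma separated file")
  executeCommaSepFile(fileName)
} else {
  # "} else {" must share a line, otherwise this fails to parse at top level
  print("tab separated file")
  executeTabSepFile(fileName) # run read.table in here
}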
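As a variation on this first option, the delimiter can also be guessed from the first line alone with readLines(), so the file is parsed only once. The following is a minimal sketch and not part of the original answer: detect_sep and readDelimited are hypothetical helper names, and it assumes a file counts as tab-separated whenever its first line contains a tab, which mirrors the length(portData) == 1 test above.

```r
# Hypothetical helper: guess the delimiter by inspecting only the first line.
detect_sep <- function(fileName) {
  first_line <- readLines(fileName, n = 1, warn = FALSE)
  if (length(first_line) == 0) stop("empty file: ", fileName)
  # Assumption: any tab in the first line means the file is tab-separated.
  if (grepl("\t", first_line, fixed = TRUE)) "\t" else ","
}

# Hypothetical wrapper: parse the file once with the detected delimiter.
readDelimited <- function(fileName) {
  read.csv(fileName, sep = detect_sep(fileName), stringsAsFactors = FALSE)
}

# Small demonstration with two temporary files.
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("a,b,c", "1,2,3"), csv_file)
writeLines(c("a\tb\tc", "1\t2\t3"), tsv_file)

csv_data <- readDelimited(csv_file)
tsv_data <- readDelimited(tsv_file)
```

Both calls should yield a data frame with three columns. A file whose comma-separated fields contain literal tabs would be misclassified by this heuristic.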
Read in the whole file and, if that does not work, use textConnection to avoid going back to disk (not very efficient, but it works):
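# check.names = FALSE keeps the first line unmangled as the single column name
portData <- read.csv(fileName, sep="\t", check.names=FALSE, stringsAsFactors=FALSE)
if(length(portData) == 1) {
  print("comma separated file")
  # textConnection() needs a character vector, not a data frame:
  # rebuild the raw lines from the column name plus the column values
  rawLines <- c(names(portData), as.character(portData[[1]]))
  dat <- read.csv(textConnection(rawLines))
  executeCommaSepFile(dat) # pass the data frame, not the filename
} else {
  # "} else {" must share a line, otherwise this fails to parse at top level
  print("tab separated file")
  # code to process the tab separated file
}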
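To get the column selection the question asks for, the same colClasses vector can be applied while re-parsing the in-memory lines. This is a rough sketch rather than the answerer's code: raw_lines is a hypothetical stand-in for the single column of a comma-separated file that was read with sep="\t" (three rows copied from the question), and header = FALSE is an assumption based on the sample having no header row.

```r
# Hypothetical stand-in for the raw lines held in the one-column data frame.
raw_lines <- c(
  "RUS1000,01/29/1999,3com Corp,COMS,88553510,358764,16861.908,0.16,47.000",
  "RUS1000,01/29/1999,3m Co,MMM,88579Y10,401346,31154.482,0.29,77.625",
  "RUS1000,01/29/1999,Abbott Labs,ABT,00282410,1517621,70474.523,0.66,46.438"
)

# Re-parse in memory: "NULL" drops a column, NA lets read.csv guess its type.
csvData <- read.csv(
  text = raw_lines,
  header = FALSE,  # assumption: the file has no header row
  colClasses = c(NA, NA, "NULL", NA, "NULL", NA, "NULL", "NULL", "NULL"),
  stringsAsFactors = FALSE
)

print(csvData)
```

Only columns 1, 2, 4 and 6 survive, which matches the layout requested in the question without reading the file from disk a second time.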