I have some text files (CSV format) in which some text qualifiers are missing, as in the second line below, fifth column (AMM):
"A",4,"","","HIGH STREET, 22","","","L6","3AA"
"B",2957136105,98,"M12ASE7569",AMM",1,,,"F",,20010514,"CR"
"C","T","UNKNOWN","",19000101
"D",4
I managed to find the inconsistent rows by looping over the columns with this code (just save the lines above in a txt file):
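library(plyr)
a <- readLines(path)  # path: the txt file holding the lines above
# Parse each line separately with quoting disabled, then stack the ragged rows
a <- rbind.fill(lapply(a, function(x) read.table(text = x, sep = ",", as.is = TRUE, quote = "")))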
> which(sapply(gregexpr("\"", a[,5]), length)==1 & grepl("\"", a[,5]))
[1] 1 2
But in my files there are commas inside fields (because of addresses), so I also get false positives...
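To make the false positive concrete, here is a small illustration of what happens to the first sample line once quoting is disabled. It uses `strsplit` as a stand-in for the comma splitting that `read.table(..., quote = "")` performs; the variable names `line1` and `fields` are only for this illustration.

```r
# First sample line: the address field contains a comma inside the quotes
line1 <- '"A",4,"","","HIGH STREET, 22","","","L6","3AA"'

# Split on every comma, ignoring quotes (as happens with quote = "")
fields <- strsplit(line1, ",", fixed = TRUE)[[1]]

# The address is cut in two, so column 5 ends up with a single quote
fields[5:6]
```

Column 5 then holds only the opening half of the address, which is why row 1 is reported alongside the genuinely broken row 2.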
Has anyone run into this kind of problem? If so, do you have any ideas?
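One possible direction, sketched here rather than offered as a definitive fix, is to check each raw line as a whole instead of splitting on commas first. The sketch below runs two checks on the sample lines: whether the number of double quotes is odd, and whether every field is either fully quoted or free of quotes and commas. It assumes that fields never contain escaped quotes (such as `""` inside a quoted value) and that no record spans several lines; the names `n_quotes`, `odd_quotes`, `field`, and `malformed` are illustrative.

```r
# Sample lines from above (for a real file: lines <- readLines(path))
lines <- c(
  '"A",4,"","","HIGH STREET, 22","","","L6","3AA"',
  '"B",2957136105,98,"M12ASE7569",AMM",1,,,"F",,20010514,"CR"',
  '"C","T","UNKNOWN","",19000101',
  '"D",4'
)

# Check 1: a well-formed line has an even number of double quotes
n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))
odd_quotes <- which(n_quotes %% 2 == 1)

# Check 2: each field is either "..." or has no quote and no comma
# (assumption: no escaped quotes inside quoted fields)
field <- '("[^"]*"|[^",]*)'
well_formed <- grepl(paste0("^", field, "(,", field, ")*$"), lines, perl = TRUE)
malformed <- which(!well_formed)

odd_quotes  # line numbers with an unbalanced quote count
malformed   # line numbers that fail the field pattern
```

On the four sample lines both checks flag only line 2, and the comma inside "HIGH STREET, 22" no longer triggers a report. The quote count alone would miss a line where two qualifiers are missing, which is why the stricter pattern check is included as well.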