Question

I have a text file that looks like this:

"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"
"9531""203128827"18"Salta"
"9541""272477419"9"Entre Ríos"
"9571""273065780"2"Buenos Aires"
"6331""233703594"7"Córdoba"
"6351""272442465"5"Chaco"

I am trying to read it with:

prov_nos <- read.table("C:/.../prov_demo.txt",
                       header = TRUE, quote = "\"")

But I get the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 doesn't have 4 elements

Answer 1

As I sketched out in my comment, you can read the file as raw lines and split each one on the quote character yourself. Some variation on this:

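# Read the file as raw lines, with no parsing yet
l <- readLines("~/Desktop/scratch/no_delim.txt")
# Split one line on double quotes and drop the empty pieces
foo <- function(line){
    line <- strsplit(line, "\"", fixed = TRUE)[[1]]
    line[nchar(line) > 0]
}
l <- lapply(l, foo)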

> setNames(as.data.frame(do.call(rbind,l[-1])),l[[1]])
  cd_solicitud   nu_cuit cd_provincia tx_provincia
1         9531 203128827           18        Salta
2         9541 272477419            9   Entre Ríos
3         9571 273065780            2 Buenos Aires
4         6331 233703594            7      Córdoba
5         6351 272442465            5        Chaco

I say "some variation" because if there are other odd characters, odd quoting or other gotchas in your file you may need to adjust the splitting and cleanup to handle those.
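
As one possible variation (a sketch, not the only way to do it), the fields can be matched instead of split. The helper name `split_fields` and the sample lines below are made up for illustration. The pattern takes either a quoted field or a bare run of non-quote characters, so unlike dropping zero-length pieces it keeps an empty quoted field (`""`) as an empty string:

```r
# Hypothetical helper: extract each field, whether quoted or bare
split_fields <- function(line) {
    # Match either a quoted field ("...") or a run of non-quote characters
    tokens <- regmatches(line, gregexpr('"[^"]*"|[^"]+', line))[[1]]
    # Strip the quotes and any stray surrounding whitespace
    trimws(gsub('"', "", tokens, fixed = TRUE))
}

# Sample lines copied from the question; a real file would come from readLines()
lines <- c('"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"',
           '"9531""203128827"18"Salta"',
           '"9541""272477419"9"Entre Ríos"')

fields <- lapply(lines, split_fields)
# First element is the header, the rest are data rows
df <- setNames(as.data.frame(do.call(rbind, fields[-1]),
                             stringsAsFactors = FALSE),
               fields[[1]])
```

This assumes no field contains an embedded quote character; if yours do, the pattern would need adjusting.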

Answer 2

You can hack it together if you read it in with readLines and then use strsplit to separate the elements of each row. It's not pretty, but then neither is the data's format:

the_text <- '"cd_solicitud""nu_cuit""cd_provincia""tx_provincia"
             "9531""203128827"18"Salta"
             "9541""272477419"9"Entre Ríos"
             "9571""273065780"2"Buenos Aires"
             "6331""233703594"7"Córdoba"
             "6351""272442465"5"Chaco"'
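the_text <- readLines(textConnection(the_text))
# Split on runs of one or more quotes: data rows first, then the header line
df <- data.frame(do.call(rbind, strsplit(the_text[-1], '"+')))
names(df) <- strsplit(the_text[1], '"+')[[1]]
# Each line starts with a quote, so the first piece is empty (or just indentation); drop that column
df[,1] <- NULL
df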
#    cd_solicitud   nu_cuit cd_provincia tx_provincia
# 1          9531 203128827           18        Salta
# 2          9541 272477419            9   Entre Ríos
# 3          9571 273065780            2 Buenos Aires
# 4          6331 233703594            7      Córdoba
# 5          6351 272442465            5        Chaco
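
Because everything comes out of strsplit as text, the columns of the result are character (or factor in R versions before 4.0), not numbers. A minimal sketch of converting them afterwards with `type.convert`, using a small stand-in data frame rather than the real file:

```r
# Stand-in for the data frame built from the split lines: all columns are text
df <- data.frame(cd_solicitud = c("9531", "9541"),
                 nu_cuit      = c("203128827", "272477419"),
                 cd_provincia = c("18", "9"),
                 tx_provincia = c("Salta", "Entre Ríos"),
                 stringsAsFactors = FALSE)

# Let R guess a type per column; as.is = TRUE keeps text as character, not factor
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```

Identifier-like columns such as nu_cuit may be better left as character if leading zeros or very long values matter; in that case convert only the columns you want.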

Read Table: quotes in R
