Question

I am a beginner in R, and I have a large txt file that looks like this:

1:
123,3,2002-09-06
456,2,2005-08-13
789,4,2001-09-20
2:
123,5,2003-05-08
321,1,2004-06-15
432,3,2001-09-11

The number before each colon (1:, 2:) is the itemID, and the lines that follow it are UserID, Quantity and Date.

I want to read it into a data.frame like this:

itemID UserID Quantity Date
  1      123   3      2002-09-06
  1      456   2      2005-08-13  
  1      789   4      2001-09-20
  2      123   5      2003-05-08
  2      321   1      2004-06-15
  2      432   3      2001-09-11

Is this possible with read.csv? If not, how can I read this file conditionally?

Any help would be appreciated.

Answer 1

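read.table() cannot read this easily. R expects most data to be clean and rectangular.

You can read the data in as a bunch of lines, manipulate those lines into a more regular format, and then parse the result with read.table. For example: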

# Read your data file
# xx <- readLines("mydatafile.txt")
# for the sake of a complete example
xx <- scan(text="1:
123,3,2002-09-06
456,2,2005-08-13
789,4,2001-09-20
2:
123,5,2003-05-08
321,1,2004-06-15
432,3,2001-09-11", what=character())

This reads the lines in as plain strings. You can then split them into groups and prepend the item ID to each row as an extra value:

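# header lines look like "1:"; cumsum() gives every line its block number
item_group <- cumsum(grepl("^\\d+:\\s*$", xx))
clean_rows <- unlist(lapply(split(xx, item_group), function(x) {
    # turn the header "1:" into the prefix "1," and paste it onto each data row
    item_id <- sub(":\\s*$", ",", x[1])
    paste0(item_id, x[-1])
}))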
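
To make the grouping step easier to follow, here is a minimal self-contained sketch that prints the intermediate values. It assumes the same sample lines as above, written out as a character vector, and that every header line consists only of digits followed by a colon; the name is_header is introduced just for this illustration.

```r
# Same sample lines as in the question, as a character vector
xx <- c("1:", "123,3,2002-09-06", "456,2,2005-08-13", "789,4,2001-09-20",
        "2:", "123,5,2003-05-08", "321,1,2004-06-15", "432,3,2001-09-11")

# TRUE only on the "N:" header lines
is_header <- grepl("^\\d+:$", xx)

# Running count of headers seen so far, i.e. the block each line belongs to
item_group <- cumsum(is_header)
print(item_group)
# 1 1 1 1 2 2 2 2

# Within each block, prefix every data row with the header's number
clean_rows <- unlist(lapply(split(xx, item_group), function(x) {
    paste0(sub(":$", ",", x[1]), x[-1])
}), use.names = FALSE)
print(clean_rows[1])
# "1,123,3,2002-09-06"
```

Each block contributes its data rows only; the header line itself is consumed as the prefix.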

Then you can parse the data into a data.frame:

read.table(text=clean_rows, sep=",", col.names=c("itemID","UserID","Quantity","Date"))
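
For a real file, the same idea could be wrapped in a function along these lines. This is only a sketch under a few assumptions: read_item_blocks is a hypothetical helper name (not part of base R), the file starts with a header line such as 1:, and the dates are in yyyy-mm-dd form. It looks up each row's item ID by indexing with the cumulative header count instead of using split()/lapply(), which is a variation on the approach above rather than the same code.

```r
# read_item_blocks() is a hypothetical helper, not a base R function
read_item_blocks <- function(path) {
    xx <- readLines(path)
    xx <- xx[nzchar(trimws(xx))]              # drop blank lines, if any
    is_header <- grepl("^\\d+:\\s*$", xx)     # lines such as "1:"
    # For every line, the ID of the most recent header (assumes line 1 is a header)
    item_id <- sub(":\\s*$", "", xx[is_header])[cumsum(is_header)]
    # Prefix each data row with its item ID, then parse as ordinary CSV text
    rows <- paste(item_id[!is_header], xx[!is_header], sep = ",")
    df <- read.table(text = rows, sep = ",",
                     col.names = c("itemID", "UserID", "Quantity", "Date"),
                     stringsAsFactors = FALSE)
    df$Date <- as.Date(df$Date)               # assumes yyyy-mm-dd dates
    df
}

# Usage on a temporary copy of the sample data
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("1:", "123,3,2002-09-06", "456,2,2005-08-13", "789,4,2001-09-20",
             "2:", "123,5,2003-05-08", "321,1,2004-06-15", "432,3,2001-09-11"),
           tmp_file)
df <- read_item_blocks(tmp_file)
str(df)
```

Converting Date with as.Date is optional; without it the column stays as text.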

Answer 2

Here is one solution. It is fairly manual, and there is a lot to unpack in this example...

separator_pattern <- "^(\\d+):\\s*$"
col_names <- paste0("column", 1:3)
block_text <- out <- NULL
# readLines() on a file path opens and closes the connection itself
for (line in readLines("~/temp/example.txt")) {
    if (grepl(separator_pattern, line)) {
        # new "N:" header: parse the block collected so far, if there is one
        if (!is.null(block_text)) {
            tmp <- cbind("block" = block_no,
                         read.csv(text = block_text, header = FALSE, col.names = col_names))
            out <- rbind(out, tmp)
        }
        # the captured digits become the current block number
        block_no <- as.numeric(sub(separator_pattern, "\\1", line))
        block_text <- character(0)
    } else {
        block_text <- c(block_text, line)
    }
}
# parse the last block, which has no header line after it
tmp <- cbind("block" = block_no,
             read.csv(text = block_text, header = FALSE, col.names = col_names))
out <- rbind(out, tmp)

Obviously, this assumes your file is located at path.expand("~/temp/example.txt").
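
The loop hinges on separator_pattern, so here is a small sketch of what that regular expression does on its own. The sample vector, including the header with trailing spaces, is made up for illustration, as are the names is_sep and block_ids.

```r
separator_pattern <- "^(\\d+):\\s*$"

# Made-up sample: a header, a data row, and a header with trailing spaces
lines <- c("1:", "123,3,2002-09-06", "2:  ")

# Only whole-line "digits + colon" entries match; data rows do not
is_sep <- grepl(separator_pattern, lines)
print(is_sep)
# TRUE FALSE TRUE

# "\\1" keeps just the captured digits, which become the block number
block_ids <- as.numeric(sub(separator_pattern, "\\1", lines[is_sep]))
print(block_ids)
# 1 2
```

Because the pattern is anchored with ^ and $, a data row can never be mistaken for a block header.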

How to read a text file conditionally in R
