Creating multiple plots in R from one text file

Time: 2012-07-18 18:54:06

Tags: r plot text-files batch-processing

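I'm new to R and am trying to generate a large number of plots from one file, in which each dataset is preceded by a header line. I have a tab-delimited plain-text file formatted like this: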

Header: Boston city data
Month    Data1    Data2    Data3
1        1.5      9.1342   8.1231
2        12.3     12.31    1.129
3        (etc...)  

Header: Chicago city data
Month    Data1    Data2    Data3
1        1.5      9.1342   8.1231
2        12.3     12.31    1.129
...

For each city, I want to create plots of Month vs. Data1, Month vs. Data2, and Month vs. Data3.

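I know that in Python I could iterate over each line, do something different if the line starts with 'Header', and otherwise process the numbers somehow. I'd like to do something as simple as this: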

for (data block starting with header) in inf:
    data = read.delim()
    barplot(data, main=header, ylab="Data1", xlab="Month")
    # repeat for Data2, Data3

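But I'm not sure how to actually iterate over the file, or whether I should instead split my file into many small files, one per city, and then loop over the list of small files to read them.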

2 Answers:

Answer 0: (score: 4)

You can use a combination of gsub, grep, and strsplit:

## get city name
nameSet <- function(x) {
    return(gsub(pattern="Header: (.*) city data", replacement="\\1", x=x))
}

## extract monthly numbers
singleSet <- function(x) {
    l <- lapply(x, function(y) {
        ## split single line by spaces
        s <- strsplit(y, "[[:space:]]+")
        ## turn characters into doubles
        return(as.double(s[[1]]))
    })
    ## turn list into a matrix
    m <- do.call(rbind, l)
    return(m) 
}

## read file
con <- file("data.txt", "r")
lines <- readLines(con)
close(con)

## determine header lines and calculate begin/end lines for each dataset
headerLines <- grep(pattern="^Header", x=lines)
beginLines <- headerLines+2
endLines <- c(headerLines[-1]-1, length(lines))

## layout plotting region
par(mfrow=c(length(beginLines), 3))

## loop through all datasets
for (i in seq(along=headerLines)) {
    city <- nameSet(lines[headerLines[i]])
    data <- singleSet(lines[beginLines[i]:endLines[i]])

    for (j in 2:ncol(data)) {
        barplot(data[,j], main=city, xlab="Month", ylab=paste("Data", j-1))
    }
}
par(mfrow=c(1, 1))

[Image: barplots]
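The same header-based splitting can also be written more compactly. The following is only a sketch, not part of the original answer: it assumes the tab-delimited layout from the question, uses an in-memory `lines` vector in place of `readLines("data.txt")`, and swaps in `split()` with `cumsum()` plus `read.delim(text = ...)` for the manual begin/end bookkeeping. The names `isHeader`, `blocks`, `cities`, and `tables` are made up for illustration.

```r
## Sample input mirroring the question's layout (values copied from the question);
## in practice this would come from readLines("data.txt")
lines <- c("Header: Boston city data",
           "Month\tData1\tData2\tData3",
           "1\t1.5\t9.1342\t8.1231",
           "2\t12.3\t12.31\t1.129",
           "",
           "Header: Chicago city data",
           "Month\tData1\tData2\tData3",
           "1\t1.5\t9.1342\t8.1231",
           "2\t12.3\t12.31\t1.129")

## Each header line starts a new group: cumsum() counts the headers seen so far,
## so all lines up to the next header share one group number
isHeader <- grepl("^Header", lines)
blocks <- split(lines, cumsum(isHeader))

## City names come from the header lines themselves
cities <- sub("^Header: (.*) city data$", "\\1", lines[isHeader])

## Drop the header line of each block and parse the rest as a small table;
## read.delim() skips the blank separator lines by default
tables <- lapply(blocks, function(b) read.delim(text = b[-1]))
names(tables) <- cities

str(tables)
```

This assumes the file starts with a header line; any lines before the first header would end up in an extra group numbered 0 and would need to be dropped first.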

Answer 1: (score: 2)

Here is a slightly modified version of the function I mentioned in my comment.

read.funkyfile = function(funkyfile, expression, ...) {
  temp = readLines(funkyfile)
  temp.loc = grep(expression, temp)
  temp.loc = c(temp.loc, length(temp)+1)
  temp.nam = gsub("[[:punct:]][[:space:]]", "", 
                  grep(expression, temp, value=TRUE))
  temp.nam = gsub(expression, "", temp.nam)
  temp.out = vector("list")

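  for (i in seq_along(temp.nam)) {
    # read.table(text = ...) opens and closes its own text connection,
    # so no connection is left open after each block is read
    temp.out[[i]] = read.table(
      text = temp[seq(from = temp.loc[i]+1,
                      to = temp.loc[i+1]-1)],
      ...)
    names(temp.out)[i] = temp.nam[i]
  }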
  temp.out
}

Assuming your file is named "File.txt", load the function and read the data like this. You can pass any arguments you need through to read.table:

temp = read.funkyfile("File.txt", "Header", header=TRUE, sep="\t")

Now, the plots:

# to plot everything on one page (used for this example), uncomment the next line
# par(mfcol = c(length(temp), 1)) 
lapply(names(temp), function(x) barplot(as.matrix(temp[[x]][-1]), 
                                        beside=TRUE, main=x, 
                                        legend=TRUE))
# dev.off() or par(mfcol = c(1, 1)) if par was modified

Here is the result for a small sample dataset, using par(mfcol = c(length(temp), 1)):

[Image: resulting plot]
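Since the question mentions generating a large number of plots, it may be more convenient to write them to a file than to fit them all on one screen. The following is a rough sketch under some assumptions: `temp` is a hand-built stand-in for the list returned by read.funkyfile (so the example runs on its own), the output path `out` is hypothetical, and it draws one bar plot per city and data column, as the question asks, instead of the grouped bars above.

```r
## Stand-in for the list returned by read.funkyfile(): one data frame per city
## (hand-built here; values copied from the question)
city.data <- data.frame(Month = 1:2,
                        Data1 = c(1.5, 12.3),
                        Data2 = c(9.1342, 12.31),
                        Data3 = c(8.1231, 1.129))
temp <- list("Boston city data" = city.data,
             "Chicago city data" = city.data)

## Hypothetical output location; each barplot() call starts a new PDF page
out <- file.path(tempdir(), "city_plots.pdf")
pdf(out, width = 7, height = 5)

for (city in names(temp)) {
  d <- temp[[city]]
  ## every column except the first (Month) gets its own page
  for (col in names(d)[-1]) {
    barplot(d[[col]], names.arg = d$Month,
            main = city, xlab = "Month", ylab = col)
  }
}

## Close the device so the file is flushed to disk
dev.off()
```

With two cities and three data columns, this produces a six-page PDF. Swapping `pdf()` for `png()` with a file name pattern such as "plot%03d.png" should give one image file per plot instead.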