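I'm new to R and am trying to generate a large number of plots from a single file, in which header lines separate the different datasets. I have a tab-delimited plain text file in the following format: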
Header: Boston city data
Month Data1 Data2 Data3
1 1.5 9.1342 8.1231
2 12.3 12.31 1.129
3 (etc...)
Header: Chicago city data
Month Data1 Data2 Data3
1 1.5 9.1342 8.1231
2 12.3 12.31 1.129
...
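For each city, I want to create plots of Month vs. Data1, Month vs. Data2, and Month vs. Data3.
I know that in Python I could iterate over each line, do something different when a line starts with 'Header', and otherwise process the numbers in some way. I'd like to do something as simple as this: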
for (data block starting with header) in inf:
data = read.delim()
barplot(data, main=header, ylab="Data1", xlab="Month")
# repeat for Data2, Data3
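But I'm not sure how to actually iterate over the file, or whether I should instead split my file into many small per-city files and then loop over a list of those files to read them.
Answer 0 (score: 4)
You can use a combination of gsub, grep, and strsplit: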
## get city name
nameSet <- function(x) {
  return(gsub(pattern="Header: (.*) city data", replacement="\\1", x=x))
}
## extract monthly numbers
singleSet <- function(x) {
  l <- lapply(x, function(y) {
    ## split single line by spaces
    s <- strsplit(y, "[[:space:]]+")
    ## turn characters into doubles
    return(as.double(s[[1]]))
  })
  ## turn list into a matrix
  m <- do.call(rbind, l)
  return(m)
}
## read file
con <- file("data.txt", "r")
lines <- readLines(con)
close(con)
## determine header lines and calculate begin/end lines for each dataset
headerLines <- grep(pattern="^Header", x=lines)
beginLines <- headerLines+2
endLines <- c(headerLines[-1]-1, length(lines))
## layout plotting region
par(mfrow=c(length(beginLines), 3))
## loop through all datasets
for (i in seq_along(headerLines)) {
  city <- nameSet(lines[headerLines[i]])
  data <- singleSet(lines[beginLines[i]:endLines[i]])
  for (j in 2:ncol(data)) {
    barplot(data[,j], main=city, xlab="Month", ylab=paste("Data", j-1))
  }
}
par(mfrow=c(1, 1))
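To make the begin/end bookkeeping above concrete, here is a minimal sketch that runs the same index arithmetic on a small in-memory vector instead of data.txt. The sample values are invented for illustration, and the variable names cities and chicago are not part of the answer.

```r
## Toy input held in memory; the numbers are made up for illustration only
lines <- c("Header: Boston city data",
           "Month\tData1\tData2\tData3",
           "1\t1.5\t9.1342\t8.1231",
           "2\t12.3\t12.31\t1.129",
           "Header: Chicago city data",
           "Month\tData1\tData2\tData3",
           "1\t2.5\t3.5\t4.5")

headerLines <- grep("^Header", lines)                  # positions of the header lines
beginLines  <- headerLines + 2                         # skip header and column-name line
endLines    <- c(headerLines[-1] - 1, length(lines))   # last data row of each block

## One column per dataset: where it starts and where it ends
print(rbind(headerLines, beginLines, endLines))

## City names, and the numeric rows of the second block as a matrix
cities  <- gsub("Header: (.*) city data", "\\1", lines[headerLines])
rows    <- strsplit(lines[beginLines[2]:endLines[2]], "[[:space:]]+")
chicago <- do.call(rbind, lapply(rows, as.double))
print(cities)
print(chicago)
```

Each block ends one line before the next header, and the last block ends at the final line of the file, which is why length(lines) is appended to endLines.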
Answer 1 (score: 2)
Here is a slightly modified version of the function mentioned in my comment.
read.funkyfile = function(funkyfile, expression, ...) {
  temp = readLines(funkyfile)
  ## line numbers of the header lines, plus a sentinel one past the last line
  temp.loc = grep(expression, temp)
  temp.loc = c(temp.loc, length(temp)+1)
  ## block names: strip ": " and then the header expression itself
  temp.nam = gsub("[[:punct:]][[:space:]]", "",
                  grep(expression, temp, value=TRUE))
  temp.nam = gsub(expression, "", temp.nam)
  temp.out = vector("list", length(temp.nam))
  for (i in seq_along(temp.nam)) {
    ## text= lets read.table open and close its own connection
    temp.out[[i]] = read.table(text = temp[seq(from = temp.loc[i]+1,
                                               to = temp.loc[i+1]-1)],
                               ...)
  }
  names(temp.out) = temp.nam
  temp.out
}
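The way the function finds block boundaries may be easier to follow on a toy input. The sketch below repeats only the slicing and naming steps on an in-memory vector; the sample values are invented, and blocks is a name introduced here for illustration.

```r
## Toy lines standing in for readLines("File.txt"); values are invented
temp <- c("Header: Boston city data",
          "Month\tData1",
          "1\t1.5",
          "2\t12.3",
          "Header: Chicago city data",
          "Month\tData1",
          "1\t2.5")

## Header positions plus a sentinel one past the last line
temp.loc <- c(grep("Header", temp), length(temp) + 1)

## Block i is everything strictly between consecutive entries of temp.loc
blocks <- lapply(seq_len(length(temp.loc) - 1), function(i) {
  temp[seq(from = temp.loc[i] + 1, to = temp.loc[i + 1] - 1)]
})

## Names: drop ": " first, then the matched expression
temp.nam <- gsub("[[:punct:]][[:space:]]", "",
                 grep("Header", temp, value = TRUE))
temp.nam <- gsub("Header", "", temp.nam)
names(blocks) <- temp.nam
print(blocks)
```

The sentinel entry means the last block needs no special case: it simply runs up to the line before length(temp) + 1.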
Assuming your file is named "File.txt", load the function and read the data like this. You can pass any additional arguments you need on to read.table:
temp = read.funkyfile("File.txt", "Header", header=TRUE, sep="\t")
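To give a rough idea of what temp contains after this call, here is a small sketch that parses a single block by hand with the same header and sep arguments. It uses read.table's text argument in place of a file; the sample values are invented, and boston and temp_demo are names made up for this illustration.

```r
## One block of the file without its "Header" line (toy values)
block <- c("Month\tData1\tData2\tData3",
           "1\t1.5\t9.1342\t8.1231",
           "2\t12.3\t12.31\t1.129")

## Same arguments as the ones forwarded to read.table in the call above
boston <- read.table(text = block, header = TRUE, sep = "\t")

## A named list of data frames, one element per header line,
## is the shape that read.funkyfile is expected to return
temp_demo <- list("Boston city data" = boston)
str(temp_demo)
```

Each element is an ordinary data frame, so columns can be reached by name, for example temp_demo[["Boston city data"]]$Data1.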
Now, the plots:
# to plot everything on one page (used for this example), uncomment the next line
# par(mfcol = c(length(temp), 1))
lapply(names(temp), function(x) barplot(as.matrix(temp[[x]][-1]),
                                        beside=TRUE, main=x,
                                        legend=TRUE))
# dev.off() or par(mfcol = c(1, 1)) if par was modified
Here is the output for the small sample data with par(mfcol = c(length(temp), 1)):