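How do I read a file in R that is not in table format?
Some values in the data are blank, and those blanks need to be kept as values in the result.
"About" and "Name" are the only values that are always present.
For example, the text file looks like this: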
Name
Type
Color
About

Spiderman
Marvel
Red
Swings from webs

Superman
DC

Likes to fly around

Hulk
Marvel
Green
I told you not top make him mad.

Batman

Black
He is a good fighter and detective

Martian Manhunter
DC

He is from Mars

Deadpool

Black Red
Kinda Crazy
The first entry is the header. I want to turn it into a data frame like this:
Name       Type    Color      About
Spiderman  Marvel  Red        Swings from webs
Superman   DC                 Likes to fly around
Hulk       Marvel  Green      I told you not top make him mad.
Batman             Black      He is a good fighter and detective
Mar...ter  DC                 He is from Mars
Deadpool           Black Red  Kinda Crazy
Answer 0 (score: 7)
Use scan in multi-line mode (for irregularly laid out groups of three items separated by blank lines):
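filename <- "myPath/myFile.txt"
# what = a list of three character templates, so scan() fills one
# three-field record at a time; records may span several lines.
inp <- scan(filename, what = as.list(rep("", 3)))
dinp <- as.data.frame(inp, stringsAsFactors = FALSE)
names(dinp) <- unlist(dinp[1, ])  # use the first set as the column names
dinp <- dinp[-1, ]                # then remove it from the data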
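As a rough, self-contained illustration of the scan() call above, the sketch below feeds it a made-up three-field sample through the `text` argument instead of a file. The sample string and the names `txt`, `cols`, `header`, `body` and `df3` are assumptions for illustration only, not part of the original answer.

```r
# Hypothetical sample: groups of three single-word items separated by blank lines.
txt <- "Name\nType\nColor\n\nSpiderman\nMarvel\nRed\n\nSuperman\nDC\nBlue\n"

# A list of three character templates makes scan() return three parallel
# character vectors, one per field position within a record.
cols <- scan(text = txt, what = list("", "", ""), quiet = TRUE)

# The first element of each vector is the header; the rest is data.
header <- vapply(cols, `[`, character(1), 1)
body   <- lapply(cols, `[`, -1)

df3 <- as.data.frame(setNames(body, header), stringsAsFactors = FALSE)
print(df3)
```

Note that scan() splits on any whitespace by default and skips blank lines, so this layout only suits single-word fields with no missing values.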
Second attempt (different question):
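dat <- readLines(filename)
# Each record occupies 5 lines: 4 fields plus a blank separator line.
# Matrices are column-major order, hence the t(). I suppose I could have used byrow=TRUE.
# [-1, -5] drops the header record (row 1) and the blank separator (column 5).
mydf <- as.data.frame(t(matrix(dat, nrow = 5))[-1, -5], stringsAsFactors = FALSE)
names(mydf) <- dat[1:4]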
#-----------------------------
> mydf
               Name   Type     Color                              About
1         Spiderman Marvel       Red                   Swings from webs
2          Superman     DC                          Likes to fly around
3              Hulk Marvel     Green   I told you not top make him mad.
4            Batman            Black He is a good fighter and detective
5 Martian Manhunter     DC                              He is from Mars
6          Deadpool        Black Red                        Kinda Crazy
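The byrow=TRUE variant mentioned in the comment could look roughly like the sketch below. It is self-contained: it writes the sample data to a temporary file first, assuming one field per line, an empty line for a missing field, and an empty line after each record. The names `lines_in`, `path`, `n_fields`, `stride`, `heroes` and `heroes_na` are made up for this example, and turning blanks into NA is only one possible reading of "the blanks need to have a value".

```r
# Hypothetical reconstruction of the input file (assumed layout, see above).
lines_in <- c(
  "Name", "Type", "Color", "About", "",
  "Spiderman", "Marvel", "Red", "Swings from webs", "",
  "Superman", "DC", "", "Likes to fly around", "",
  "Hulk", "Marvel", "Green", "I told you not top make him mad.", "",
  "Batman", "", "Black", "He is a good fighter and detective", "",
  "Martian Manhunter", "DC", "", "He is from Mars", "",
  "Deadpool", "", "Black Red", "Kinda Crazy"
)
path <- tempfile(fileext = ".txt")
writeLines(lines_in, path)

dat <- readLines(path)

n_fields <- 4             # Name, Type, Color, About
stride   <- n_fields + 1  # plus the blank separator line

# Pad with empty strings so the length is a whole number of records,
# in case the final separator line is missing.
target_len <- ceiling(length(dat) / stride) * stride
dat <- c(dat, rep("", target_len - length(dat)))

# byrow = TRUE puts one record per row, so no t() is needed;
# keep only the field columns and drop the separator column.
m <- matrix(dat, ncol = stride, byrow = TRUE)[, seq_len(n_fields), drop = FALSE]

heroes <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(heroes) <- m[1, ]

# Optional (assumption): represent missing fields as NA instead of "".
heroes_na <- heroes
heroes_na[heroes_na == ""] <- NA
print(heroes_na)
```

The fixed stride is what makes this work: a blank line is ambiguous on its own, and only its position within the 5-line record tells a missing field from a separator.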
Answer 1 (score: 0)
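The data you listed should be readable with R's read.table without any extra arguments. By default it splits fields on whitespace (which is the delimiter in your case) and ignores blank lines. So if you have a data file named test.txt containing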
Name Type Color
Spiderman Marvel Red
Superman DC Blue
Hulk Marvel Green
then you would run
> read.table('test.txt', header=TRUE)
       Name   Type Color
1 Spiderman Marvel   Red
2  Superman     DC  Blue
3      Hulk Marvel Green
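Note that read.table is just a wrapper around the scan function, which you can use directly if you need to do something fancier when reading in the data. See http://stat.ethz.ch/R-manual/R-devel/library/base/html/scan.html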
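To make the relationship between the two functions a little more concrete, here is a small sketch that reads the same whitespace-separated sample once with read.table and once with scan. It uses the `text` argument in place of test.txt so that it runs on its own; the names `txt`, `via_read_table`, `cols` and `via_scan` are made up for this example.

```r
# Same content as the test.txt sample, held in a string for convenience.
txt <- "Name Type Color\nSpiderman Marvel Red\nSuperman DC Blue\nHulk Marvel Green\n"

# High-level route: read.table handles the header and builds the data frame.
via_read_table <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Lower-level route: scan() with a named list template, skipping the header
# line by hand and naming the columns explicitly.
cols <- scan(text = txt,
             what = list(Name = "", Type = "", Color = ""),
             skip = 1, quiet = TRUE)
via_scan <- as.data.frame(cols, stringsAsFactors = FALSE)

print(via_read_table)
print(via_scan)
```

With scan you decide yourself how records are laid out and named, which is the extra flexibility the answer alludes to.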