A file named myfile.txt is laid out as follows:
Column1
Column2
Column3
...
Column10
Row1
1
2
3
4
Row2
5
6
7
8
...
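The rows go all the way up to Row100, and I am having trouble with the read.table command. I am not an experienced R user, so I just need to get this solved and move on.
I thought the col.names argument would look like this: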
read.table("myfile.txt", col.names = 1:10)
but that does not work.
Answer 0 (score: 4)
Example myfile.txt:
Column1
Column2
Column3
Column4
Row1
1
2
3
4
Row2
5
6
7
8
Read the file and create a matrix:
lin <- scan("myfile.txt", "") # read whitespace-separated entries as character strings
lin2 <- grep("Column|Row", lin, value = TRUE, invert = TRUE) # keep only the values
matrix(as.numeric(lin2), ncol = sum(grepl("Column", lin)), byrow = TRUE) # create matrix
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
If the first lines are not named Column... but contain the actual column names, you can use the following approach:
lin <- scan("myfile.txt", "") # read whitespace-separated entries as character strings
idx <- grep("^Row", lin) # indices of the entries starting with 'Row'
lin2 <- lin[-(c(seq(1, idx[1] - 1), idx))] # actual values
matrix(as.numeric(lin2), nrow = length(idx),
       dimnames = list(NULL, lin[seq(1, idx[1] - 1)]), byrow = TRUE)
     Column1 Column2 Column3 Column4
[1,]       1       2       3       4
[2,]       5       6       7       8
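One thing to watch for with either variant: matrix() does not necessarily stop with an error when a row in the file is missing a value; it can recycle values to fill the gap. A minimal sketch of a sanity check, assuming the labels really start with Column and Row as in the example. The inline txt string and the names is_col, is_row, vals, and n_col are only illustrative stand-ins, not part of the original answer:

```r
# Hypothetical inline copy of the example file, so the sketch runs without myfile.txt
txt <- "Column1\nColumn2\nColumn3\nColumn4\nRow1\n1\n2\n3\n4\nRow2\n5\n6\n7\n8"

# Read every whitespace-separated entry as a character string
lin <- scan(text = txt, what = "", quiet = TRUE)

# Assumption: header entries start with "Column", row labels start with "Row"
is_col <- grepl("^Column", lin)
is_row <- grepl("^Row", lin)

vals  <- as.numeric(lin[!is_col & !is_row])
n_col <- sum(is_col)

# Each Row label should be followed by exactly n_col values
stopifnot(length(vals) == n_col * sum(is_row))

m <- matrix(vals, ncol = n_col, byrow = TRUE)
m
```

If the check fails, the file probably has a short or overlong row, and it is safer to inspect it than to let matrix() fill in the blanks.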
Answer 1 (score: 3)
Ricardo gave a hint; here is one way to make it work:
x <- read.table(text="Column1
Column2
Column3
Column10
Row1
1
2
3
4
Row2
5
6
7
8")
Now join the entries with newline characters:
(combined <- paste(x[[1]], collapse='\n'))
[1] "Column1\nColumn2\nColumn3\nColumn10\nRow1\n1\n2\n3\n4\nRow2\n5\n6\n7\n8"
Split on Row\d+\n:
(comb.split <- strsplit(combined, 'Row\\d+\\n'))
[[1]]
[1] "Column1\nColumn2\nColumn3\nColumn10\n" "1\n2\n3\n4\n" "5\n6\n7\n8"
Split each of those elements on newlines:
(split.list <- strsplit(comb.split[[1]], '\\n'))
[[1]]
[1] "Column1" "Column2" "Column3" "Column10"
[[2]]
[1] "1" "2" "3" "4"
[[3]]
[1] "5" "6" "7" "8"
Coerce to numeric (if applicable):
(numeric.list <- lapply(split.list[-1], as.numeric))
[[1]]
[1] 1 2 3 4
[[2]]
[1] 5 6 7 8
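Build the result (do.call(rbind, ...) returns a matrix here; wrap it in as.data.frame() if you need a data frame):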
dat <- do.call(rbind, numeric.list)
colnames(dat) <- split.list[[1]]
dat
     Column1 Column2 Column3 Column10
[1,]       1       2       3        4
[2,]       5       6       7        8
The row names are lost here. If you know what they are, you can add them with rownames(dat) <- names.
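If the row labels are not known in advance, they can be taken from the input itself, since they are the entries that start with Row. A minimal sketch, assuming that naming pattern holds. The names entries and row_labels are illustrative, and dat is rebuilt here only so the snippet runs on its own:

```r
# Illustrative stand-in for x[[1]]: the file contents, one entry per line
entries <- c("Column1", "Column2", "Column3", "Column10",
             "Row1", "1", "2", "3", "4",
             "Row2", "5", "6", "7", "8")

# Same values and column names as the matrix built in the steps above
dat <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("Column1", "Column2", "Column3", "Column10")))

# Assumption: every row label is "Row" followed by digits
row_labels <- grep("^Row[0-9]+$", entries, value = TRUE)

# Guard against a mismatch between labels found and rows built
stopifnot(length(row_labels) == nrow(dat))

rownames(dat) <- row_labels
dat
```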
Answer 2 (score: 1)
X <- read.table(text=
"Column1
Column2
Column3
Column10
Row1
1
2
3
4
Row2
5
6
7
8
Row99
1
2
3
4
Row100
5
6
7
8", stringsAsFactors=FALSE)
# Some string that does not appear naturally in your data
dummyCollapse <- "\n" # eg: "zQz5Nsddfdfjjj"
## Make sure to properly escape the escape char.
dummyFind <- "\\n"
flat <- paste(unlist(X), collapse=dummyCollapse)
splat <- strsplit(flat, paste0("Row\\d+", dummyFind))
## strsplit returns a list. You likely want just the first element
splat <- splat[[1]]
## weed out colnames
cnms <- splat[[1]] # now, the first element is the column names from the weird data structure
# split them, also on dummyFind
cnms <- strsplit(cnms, dummyFind)
# again, only want the first element
cnms <- cnms[[1]]
## Weed out the rows
rows <- tail(splat, -1)
# split on dummy find
rows <- strsplit(rows, dummyFind)
## NOTE: This time, do NOT take just the first element from strsplit. You want them all
## Combine the data into a matrix, one matrix row per RowN block
MyData <- do.call(rbind, rows)
## Coerce to data.frame, if you'd like
MyData <- as.data.frame(MyData)
## Add in Column names
colnames(MyData) <- cnms
> MyData
  Column1 Column2 Column3 Column10
1       1       2       3        4
2       5       6       7        8
3       1       2       3        4
4       5       6       7        8
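Note that strsplit() returns character vectors, so the columns of MyData hold strings (or factors, depending on the R version) rather than numbers. A minimal sketch of converting them, using a small hypothetical data frame chr_df in place of MyData:

```r
# Hypothetical stand-in for MyData: every column is stored as text
chr_df <- data.frame(Column1 = c("1", "5"),
                     Column2 = c("2", "6"),
                     stringsAsFactors = FALSE)

# Go through as.character() first so factor columns convert by label, not by level code
num_df <- as.data.frame(lapply(chr_df, function(col) as.numeric(as.character(col))))

str(num_df)
```

Any entry that is not a valid number becomes NA with a warning, which is a quick way to spot stray labels left in the data.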