Question

I have a large data file consisting of a single line of text. The format resembles

Cat    14  Dog    15  Horse  16

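I ultimately want to turn it into a data.frame (so in the example above I would have two variables, Animal and Number). The number of characters in each "line" is fixed.

Any suggestions?

Edit: Thanks for all the suggestions. They solved the problem exactly as I asked it. Unfortunately, only after running them did I learn that I have missing data. However, the number of characters is still fixed. The example then becomes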

Cat    14         15  Horse  16

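Each "line" consists of 11 characters (including spaces): the animal is the first 7 and the number is the next 4.

This revision has been posted as a new question: Importing one long line of data with spaces into R.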
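
For the fixed-width layout described in the edit, one possible approach is to cut the line into 11-character records and slice each record by position. This is only a sketch under stated assumptions: the sample string is rebuilt with sprintf() so that the padding is exact, and the names txt, recs and dat are made up for illustration.

```r
# Assumption: 11-character records, 7 characters of animal then 4 of number.
# The sample is built with sprintf() so every field is padded to its exact width.
txt <- paste(sprintf("%-7s%-4s", c("Cat", "", "Horse"), c("14", "15", "16")),
             collapse = "")

# Start position of each 11-character record in the long line
starts <- seq(1, nchar(txt), by = 11)
recs   <- substring(txt, starts, starts + 10)

# Slice each record by position and strip the padding
dat <- data.frame(
  Animal = trimws(substr(recs, 1, 7)),
  Number = as.numeric(trimws(substr(recs, 8, 11))),
  stringsAsFactors = FALSE
)

# A blank animal field is treated as missing
dat$Animal[dat$Animal == ""] <- NA
```

Because the fields are taken by position rather than by splitting on whitespace, a blank animal field does not shift the later values.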

Answer 1

This solution takes full advantage of the what argument of scan(), and seems simpler (to me) than any of the others.
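
A call along these lines would fit that description. It is a sketch and not necessarily the code originally posted with this answer; the names txt, dat, Animal and Number are assumptions.

```r
# Assumption: the single line from the question, fields separated by whitespace
txt <- "Cat    14  Dog    15  Horse  16"

# 'what' is a template list: "" means read a character field, 0 a numeric one.
# scan() cycles through the template, so it returns one vector per field.
dat <- as.data.frame(
  scan(text = txt, what = list(Animal = "", Number = 0), quiet = TRUE),
  stringsAsFactors = FALSE
)
```

The names given in the what list become the column names, so no renaming is needed afterwards.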

Answer 2

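Here is one solution using a variety of tools/hacks, specifically:

strsplit to split on whitespace characters (\\s)
unlist to coerce the list returned by strsplit to a vector
matrix to turn the vector into the appropriate shape
data.frame to allow for columns of different modes
as.character and as.numeric to convert the Count column from a factor

Here is everything put together: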

txt <- "Cat 14 Dog 15 Horse 16"

out <- data.frame(matrix(unlist(strsplit(txt, "\\s")), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("Animal", "Count"))))
out$Count <- as.numeric(as.character(out$Count))
str(out)

'data.frame':   3 obs. of  2 variables:
 $ Animal: Factor w/ 3 levels "Cat","Dog","Horse": 1 2 3
 $ Count : num  14 15 16
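
Note that the sample line in the question pads its fields with several spaces, and splitting on a single \\s would then produce empty strings. A possible variation, sketched here with the same hypothetical column names, splits on runs of whitespace instead:

```r
# Assumption: the padded line exactly as shown in the question
txt <- "Cat    14  Dog    15  Horse  16"

# "\\s+" matches one or more whitespace characters, so padding yields no empty fields
parts <- strsplit(trimws(txt), "\\s+")[[1]]

# Fill the matrix row by row: column 1 is the animal, column 2 the count
m <- matrix(parts, ncol = 2, byrow = TRUE)

out <- data.frame(Animal = m[, 1],
                  Count  = as.numeric(m[, 2]),
                  stringsAsFactors = FALSE)
```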

Answer 3

Method 1: (extract from the long vector with seq())

> inp <- scan(textConnection("Cat 14 Dog 15 Horse 16"), what="character")
Read 6 items
> data.frame(animal = inp[seq(1,length(inp), by=2)], 
             numbers =as.numeric(inp[seq(2,length(inp), by=2)]))
  animal numbers
1    Cat      14
2    Dog      15
3  Horse      16

Method 2: (using the "what" argument to scan to greater effect)

> # the types of the values in 'what' (not their text) set the column types
> inp <- data.frame(scan(textConnection("Cat 14 Dog 15 Horse 16"), 
                     what=list("", 0)))
Read 3 records
> names(inp) <- c("animals", "numbers")
> inp
  animals numbers
1     Cat      14
2     Dog      15
3   Horse      16

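Here is a refinement of Method 2. I was worried about possibly very long column names in the scan() result, so I re-read the help page and added names to the values of the what argument:

> inp <- data.frame(scan(textConnection("Cat 14 Dog 15 Horse 16"), 
                        what=list( animals="", 
                                   numbers=0)))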
Read 3 records
> inp
  animals numbers
1     Cat      14
2     Dog      15
3   Horse      16

Answer 4

One way:

# read the line
r <- read.csv("exa.Rda",sep=" ", head=F)
# every odd number index is an animal
animals <- r[,(1:ncol(r)-1)%%2==0]
# every even number index is a number
numbers <- r[,(1:ncol(r))%%2==0]
# flipping the animal row into a column
animals <- t(animals)
# flipping the number row into a column
numbers <- t(numbers)
# putting the data together
mydata <- data.frame(animals, numbers)

Answer 5

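Here is another approach

# assumption: x holds the single line of text (here the example from the question)
x <- "Cat 14 Dog 15 Horse 16"
string <- readLines(textConnection(x))
# put a newline after each number so every animal/number pair gets its own line
string <- gsub("(\\d+)", "\\1\n", string, perl = TRUE)
dat    <- read.table(text = string, sep = "")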

Answer 6

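Assuming that whitespace is the separator, you can use the following mechanism:

Use scan to read the file
Convert the result to a matrix, then to a data.frame

The code: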

x <- scan(file=textConnection("
Cat 14 Dog 15 Horse 16
"), what="character")

xx <- as.data.frame(matrix(x, ncol=2, byrow=TRUE))
names(xx) <- c("Animal", "Number")
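# go through as.character so a factor column gives 14, 15, 16 rather than its level codes
xx$Number <- as.numeric(as.character(xx$Number))

The results:

xx

  Animal Number
1    Cat     14
2    Dog     15
3  Horse     16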

Importing one long line of data into R
