Question

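I have an input file with 30 rows and a single column. I need to split that column into two columns so that both columns in the output have the same number of digits. For example, suppose the file has the following format:

1111111111
1010101010
1110011010
1011111111

Then the output should be: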

11111  11111
10101  01010
11100  11010
10111  11111

Answer 1

Here is an approach using read.fwf (read fixed-width format):

## make a fake file called 'x'
x <- tempfile() 

cat("1111111111
1010101010
1110011010
1011111111", sep = "\n", file = x)

# read just the first line to find out how many characters
# there are in each line. You can use this to determine your widths
Width <- nchar(readLines(x, n = 1)) 

## Use read.fwf
read.fwf(file = x, widths = rep(Width/2, 2), 
         colClasses = "character")
#      V1    V2
# 1 11111 11111
# 2 10101 01010
# 3 11100 11010
# 4 10111 11111
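The call above uses rep(Width/2, 2), which assumes every line has an even number of characters. A minimal sketch for the odd case, assuming the same one-value-per-line layout: compute the widths with integer division so they still add up to the full line length. The 9-character sample lines are made up for illustration.

```r
# Hypothetical sample file with 9-character lines
x <- tempfile()
writeLines(c("123456789", "987654321"), x)

Width <- nchar(readLines(x, n = 1))
# Integer widths that cover every character: 4 + 5 = 9
w <- c(Width %/% 2, Width - Width %/% 2)

res <- read.fwf(file = x, widths = w, colClasses = "character")
res
```

Here the first column gets the shorter half ("1234") and the second column gets the rest ("56789").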

You can also use substr:

A <- readLines(x)
cbind(V1 = substr(A, 1, 5), V2 = substr(A, 6, 10))
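substr() recycles its start and stop arguments along the input vector, so the split point does not have to be a single hard-coded number. A small sketch, using a hypothetical helper split_half() and an in-memory vector instead of the file, that splits each line at its own midpoint:

```r
# Hypothetical helper (not from the original answer)
split_half <- function(s) {
  half <- nchar(s) %/% 2                     # midpoint of each string
  cbind(V1 = substr(s, 1, half),             # start/stop are recycled per element
        V2 = substr(s, half + 1, nchar(s)))
}

lines <- c("1111111111", "1010101010", "110011")  # last line is shorter on purpose
res <- split_half(lines)
res
```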

Or, without hard-coding the values for substr:

apply(matrix(c(1, Width/2, Width/2+1, Width), ncol = 2), 
      2, function(y) substr(readLines(x), y[1], y[2]))
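In the apply() version, each column of the matrix is one (start, stop) pair. The sketch below makes that layout explicit and swaps apply() for vapply(), which fixes the type and length of each result; it reads from an in-memory vector rather than the file x, as an assumption for self-containment.

```r
lines <- c("1111111111", "1010101010", "1110011010", "1011111111")
Width <- nchar(lines[1])

# Column 1 is (1, 5), column 2 is (6, 10)
bounds <- matrix(c(1, Width / 2, Width / 2 + 1, Width), ncol = 2)

# One substr() call per (start, stop) pair; each returns one character per line
res <- vapply(seq_len(ncol(bounds)),
              function(j) substr(lines, bounds[1, j], bounds[2, j]),
              FUN.VALUE = character(length(lines)))
res
```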

Answer 2

If your data is in a data.frame, this is easy. The tidyr package provides the convenient separate function:

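# colClasses = "character" keeps the digits as text (leading zeros survive)
df <- read.table(textConnection("1111111111
1010101010
1110011010
1011111111"), colClasses = "character")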

library(tidyr)
library(stringr)

separate(df,V1,into = c("one","two"), sep = 5)

    one   two
1 11111 11111
2 10101 01010
3 11100 11010
4 10111 11111
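By default read.table() converts a column of digits to numbers, so a value such as 0101010101 would be read as 101010101 and lose its leading zero before it is ever split. A base-R sketch (no tidyr) that reads the column as character and splits it at the midpoint; the second sample line is made up to show the leading zero:

```r
# Read as text so "0101010101" keeps its leading zero
df <- read.table(text = c("1111111111", "0101010101"), colClasses = "character")

half <- nchar(df$V1) %/% 2
df$one <- substr(df$V1, 1, half)
df$two <- substr(df$V1, half + 1, nchar(df$V1))
df[c("one", "two")]
```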

If your values are not always 10 characters long, you can wrap separate in a small function and pass in the actual length:

separator <- function(l = 5) separate(df,V1,into = c("one","two"), sep = l)

nstr <- unique(sapply(df$V1,str_length))

stopifnot(length(nstr) == 1) 
separator(nstr %/% 2)

    one   two
1 11111 11111
2 10101 01010
3 11100 11010
4 10111 11111
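The separator() function above reads the global df. A sketch of a variant that takes the data frame and column name as arguments, written in base R; split_column() and its into argument are hypothetical names chosen to mirror separate(), and like separate() it drops the original column.

```r
# Hypothetical helper (not part of tidyr)
split_column <- function(data, col, into = c("one", "two")) {
  s <- as.character(data[[col]])
  n <- unique(nchar(s))
  stopifnot(length(n) == 1)        # every value must have the same length
  half <- n %/% 2
  data[[into[1]]] <- substr(s, 1, half)
  data[[into[2]]] <- substr(s, half + 1, n)
  data[setdiff(names(data), col)]  # drop the original column
}

df <- data.frame(V1 = c("1111111111", "1010101010"), stringsAsFactors = FALSE)
out <- split_column(df, "V1")
out
```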

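Note the use of %/% (integer division). It ensures that the value of sep is always an integer, but for an odd number of characters it means the two columns in the result will have unequal widths.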
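A worked example of that odd-length case, using a made-up 11-character value: 11 %/% 2 is 5, so the first part gets 5 characters and the second part gets the remaining 6.

```r
s <- "12345678901"   # hypothetical 11-character value
n <- nchar(s)
n %/% 2              # 5
parts <- c(substr(s, 1, n %/% 2), substr(s, n %/% 2 + 1, n))
parts
```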

As Ananda pointed out in the comments, this is faster than his (awesome) read.fwf approach:

library(microbenchmark)
microbenchmark(read.fwf(file = x, widths = rep(Width/2, 2), 
                        colClasses = "character"),
               separator())

Unit: microseconds
                                                                   expr     min       lq   median      uq      max neval
 read.fwf(file = x, widths = rep(Width/2, 2), colClasses = "character") 833.863 872.6145 891.4575 915.930 1273.357   100
                                                            separator() 134.959 150.9120 167.2690 185.702 2748.900   100

How to split a column into two columns in R
