Question

I want to read a .txt file with the following structure:

ID      Chr   Allele:Effect ...
Q1        1     1:-0.133302   2: 0.007090
Q2        1     1:-0.050089   2: 0.021212
Q3        1     1: 0.045517   2:-0.038001

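The problems are:

the field separator is a variable number of spaces,
I need to get rid of the leading digits (the 1: and 2: allele prefixes) in the second and third columns.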

In the end, the result should look like this:

qtl_id    chr   eff_1         eff_2
Q1        1     -0.133302     0.007090
Q2        1     -0.050089     0.021212
Q3        1      0.045517     -0.038001

Edit

head(read.table(file = fpath, sep = "", header = TRUE)) yields:

ID         Chr Allele.Effect         ...
Q1  1 1:-0.133302            2:    0.007090
Q2  1 1:-0.050089            2:    0.021212
Q3  1          1:      0.045517 2:-0.038001
Q4  1          1:      0.018582 2:-0.041846
Q5  1 1:-0.146560            2:    0.005473
Q6  1 1:-0.048240            2:    0.069418

Answer 1

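Use read.table with sep = "". It is designed to handle exactly this case: any run of whitespace counts as one delimiter. In the example below I saved the file as "qt.csv" and read it this way. It works fine.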

> read.table("qt.csv", sep = "", header = TRUE)
  qtl_id chr     eff_1     eff_2
1     Q1   1 -0.133302  0.007090
2     Q2   1 -0.050089  0.021212
3     Q3   1  0.045517 -0.038001
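As a minimal, self-contained sketch of the same idea: the inline text below stands in for the file (the name txt and the uneven spacing are illustrative assumptions), and read.table splits it on runs of whitespace of any length.

```r
# Illustrative input: columns separated by differing numbers of spaces
txt <- "qtl_id    chr   eff_1         eff_2
Q1        1     -0.133302     0.007090
Q2        1     -0.050089     0.021212
Q3        1      0.045517     -0.038001"

# sep = "" (the default) treats any run of spaces or tabs as one delimiter
clean <- read.table(text = txt, sep = "", header = TRUE)
str(clean)
```

Passing text = instead of a file name is only a convenience for the example; with a real file the call stays the same apart from the first argument.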

Update

To answer the new request (as I understand it), I now read a different file and strip out those pesky characters.

f <- readLines("allele2.csv")
f <- gsub("(\\d): +", "\\1:", f)   # remove the spaces after the colon, keeping the colon itself

df <- read.table(textConnection(f), sep = "", header = TRUE)

# strip the leading "<digit>:" prefix from the two effect columns
df[[3]] <- as.numeric(gsub("^\\d:", "", df[[3]]))
df[[4]] <- as.numeric(gsub("^\\d:", "", df[[4]]))
df

yields

  ID Chr Allele.Effect       ...
1 Q1   1     -0.133302  0.007090
2 Q2   1     -0.050089  0.021212
3 Q3   1      0.045517 -0.038001
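An alternative sketch, assuming every allele prefix is one or more digits followed by a colon and optional spaces: strip the prefixes from the raw lines in a single pass, then parse. The inline raw vector stands in for readLines() on the real file, and the column names are taken from the desired output in the question.

```r
# Stand-in for readLines("yourfile.txt"); the file name is hypothetical
raw <- c("ID      Chr   Allele:Effect ...",
         "Q1        1     1:-0.133302   2: 0.007090",
         "Q2        1     1:-0.050089   2: 0.021212",
         "Q3        1     1: 0.045517   2:-0.038001")

# Drop the header line, then remove each "<digits>:" prefix and any spaces after it
body <- gsub("[0-9]+: *", "", raw[-1])

# Parse the cleaned lines; sep = "" splits on runs of whitespace
qtl <- read.table(text = body, sep = "",
                  col.names = c("qtl_id", "chr", "eff_1", "eff_2"))
qtl
```

Because the prefixes are removed before parsing, the effect columns are read as numeric directly and no per-column conversion is needed.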

Answer 2

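Assuming the data has the regular structure you have shown, reading it as fixed-width columns should work. Here, I suggest you first prepare the file by saving it without the header, and then replace the textConnection() argument to read.fwf() below with your file name.

Note that by setting neffects to the number of Allele:Effect columns, this will work with however many you have in your file (2 in the example).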

myfile <- 
"Q1        1     1:-0.133302   2: 0.007090
Q2        1     1:-0.050089   2: 0.021212
Q3        1     1: 0.045517   2:-0.038001"

# makes the solution general enough for any number of effects columns
neffects <- 2  # in example, two

# set up column names
cnames <- c("ID", "Chr", as.vector(outer(c("junk", "Effect"), 1:neffects, paste, sep = ".")))

# read the data as a fixed format
mydata <- read.fwf(textConnection(myfile), c(2, 11, rep(c(5, 9), neffects)), 
                   col.names = cnames,
                   colClasses = c("character", "integer", rep(c("character", "numeric"), neffects)),
                   stringsAsFactors = FALSE)
# get rid of unwanted columns
mydata <- mydata[, -grep("^junk", colnames(mydata))]

mydata
##   ID Chr  Effect.1  Effect.2
## 1 Q1   1 -0.133302  0.007090
## 2 Q2   1 -0.050089  0.021212
## 3 Q3   1  0.045517 -0.038001
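To illustrate how the column names and widths scale with neffects, here is a small sketch for a hypothetical file with three Allele:Effect columns (the value 3 is an assumption for the example; the widths are the ones used above).

```r
neffects <- 3  # hypothetical: a file with three Allele:Effect columns

# outer() pairs each label with each index; as.vector() flattens column by column,
# so the junk/Effect names alternate in file order
cnames <- c("ID", "Chr",
            as.vector(outer(c("junk", "Effect"), 1:neffects, paste, sep = ".")))

# one 5-character prefix field and one 9-character value field per effect
widths <- c(2, 11, rep(c(5, 9), neffects))

cnames
# "ID" "Chr" "junk.1" "Effect.1" "junk.2" "Effect.2" "junk.3" "Effect.3"
length(widths) == length(cnames)
```

The names and widths must have the same length, since read.fwf() assigns one name per fixed-width field.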

Reading in a table from a text file with a variable number of spaces as delimiter
