Question

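I have a .txt file with 44 columns. The first two and the last two columns have no header. So when I use read.table, it fails with an error saying that line 2 did not have 44 elements. To get around this I used fill = TRUE. That removes the error, but the first two columns end up shifted into the columns headed A and R. Please give me some help on how to read this kind of .txt file in R.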

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts


            A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
    1 M    -1  -2  -2  -3  -2  -1  -2  -3  -2   1   2  -1   6   0  -3  -2  -1  -2  -1   1    0   0   0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0  0.45 0.03
    2 K    -1   2   0  -1  -3   1   1  -2  -1  -3  -3   5  -1  -3  -1   0  -1  -3  -2  -2    0   0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0   0  0.57 0.02
    3 K    -1   2   0  -1  -3   1   1  -2  -1  -3  -3   5  -1  -3  -1   0  -1  -3  -2  -2    0   0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0   0  0.57 0.02
    4 R    -1   4   3   0  -2   0   0  -1   0  -3  -3   1  -2  -3  -2   2   0  -3  -2  -2    0  42  31   0   0   0   0   0   0   0   0   0   0   0   0  27   0   0   0   0  0.41 0.01
    5 I    -1  -2  -2  -2  -1  -2  -2  -3  -3   3   2  -2   1   0  -2  -1   2  -2  -1   1    0   0   0   0   0   0   0   0   0  42  27   0   0   0   0   0  31   0   0   0  0.23 0.01
    6 L    -2  -3  -3  -3  -2  -3  -3  -4  -2   2   3  -3   1   3  -3  -2  -1  -1   1   1    0   0   0   0   0   0   0   0   0  27  42   0   0  31   0   0   0   0   0   0  0.35 0.01
    7 S     0  -1   0  -1  -1  -1  -1  -1  -1   1  -1  -1   0  -2  -1   3   1  -3  -2   0    0   0   0   0   0   0   0   0   0  31   0   0   0   0   0  69   0   0   0   0  0.23 0.01
    8 A     4  -1  -2  -2   0  -1  -1   0  -2  -1  -2  -1  -1  -2  -1   1   0  -3  -2   0  100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  0.44 0.02
    9 V    -1  -3  -3  -3  -1  -2  -3  -3  -3   3   1  -2   1  -1  -3  -2   0  -3  -1   4    0   0   0   0   0   0   0   0   0  31   0   0   0   0   0   0   0   0   0  69  0.37 0.01

Answer 1

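It basically comes down to fiddling with the vectors used to name the columns and rows. If you don't want to change the file before reading it, I can't think of another way.

You did not specify whether the first two columns of the text file are separate variables or whether they serve as row names. So I came up with at least three different solutions. The first three lines of each solution are the same, though: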

# Read the data without the header.
data <- read.table("data.txt", header = FALSE, skip = 4)

# Read only the header line. Pay attention to the brackets at the end!
# The header has fewer fields than the data rows, so fill = TRUE pads the rest with NA.
col.names <- read.table("data.txt", header = FALSE, skip = 3, fill = TRUE, stringsAsFactors = FALSE)[1, ]
col.names <- as.character(col.names) # let's strip off the names for aesthetic reasons
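To see why read.table complains in the first place, it can help to count the fields on each line. The sketch below uses a made-up miniature of the layout (three residue labels instead of twenty, written to a temporary file), so the file contents and the names `toy`, `path` and `n.fields` are assumptions for illustration only:

```r
# Toy miniature of the file layout (assumption: 3 residue labels instead of 20).
toy <- c(
  "Last position-specific scoring matrix computed",
  "",
  "",
  "        A   R   N   A   R   N",
  "    1 M    -1  -2  -2    0   0 100  0.45 0.03",
  "    2 K    -1   2   0    0 100   0  0.57 0.02"
)
path <- tempfile(fileext = ".txt")
writeLines(toy, path)

# Count the whitespace-separated fields on the header line and on each data row.
n.fields <- count.fields(path, skip = 3)
print(n.fields)
```

The header line comes out shorter than the data rows, which is why the header and the data are read separately above.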

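From here, pick the result you want.

a) The first two columns as separate variables:

# Elements 41 and 42 of col.names are the string "NA", because the header
# line has no labels for the last two columns.
a.col.names <- c("V1", "V2", col.names[1:42])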

a.data <- data
names(a.data) <- a.col.names
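
Note that the residue labels appear twice in the header (once for the scores, once for the percentages), so the names assigned above are not unique. If you prefer unambiguous names, one option is to build them yourself. This is only a sketch: the prefixes `score_` and `pct_` and the names `pos`, `residue`, `information` and `weight` are my own picks based on the title line of the file, not anything the file defines, and `demo` is a zero-filled stand-in for the real data.

```r
# The 20 residue labels, in the order they appear in the header line.
labels <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
            "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Hypothetical names: 2 leading columns, 20 scores, 20 percentages,
# and 2 trailing columns.
new.names <- c("pos", "residue",
               paste0("score_", labels),
               paste0("pct_", labels),
               "information", "weight")

# Stand-in for the 44-column data frame read from the file.
demo <- as.data.frame(matrix(0, nrow = 2, ncol = 44))
names(demo) <- new.names

# Columns can now be addressed without ambiguity.
demo$pct_M
```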

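b) The first two columns as row names:

# Drop the first two columns from the data itself.
b.data <- data[3:NCOL(data)]

colnames(b.data) <- col.names[1:42]

# Combine position and residue into one label per row, e.g. "1M", "2K".
b.row.names <- paste(data[[1]], data[[2]], sep = "")
rownames(b.data) <- b.row.names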

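c) The first column as row names, the second column as a separate variable:

c.data <- data[2:NCOL(data)]
c.col.names <- c("V1", col.names[1:42])
names(c.data) <- c.col.names
# Use the first column (the position) as row names.
rownames(c.data) <- data[[1]]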
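
If you read several files like this, the steps for variant a) could be wrapped in a small function. The following is a rough sketch under a few assumptions, not a tested general solution: `read_pssm` is a hypothetical helper name, the names `pos`, `residue`, `extra1`, `extra2` are invented here, and the toy file only imitates the layout with three residue labels instead of twenty.

```r
# Hypothetical helper; `skip` is the number of lines before the header line.
read_pssm <- function(path, skip = 3) {
  # The header line carries only the residue labels, so read it on its own.
  header <- scan(path, what = character(), skip = skip, nlines = 1,
                 quiet = TRUE)
  # Everything after the header line has the same number of fields per row.
  data <- read.table(path, header = FALSE, skip = skip + 1,
                     stringsAsFactors = FALSE)
  # Trailing columns that have no label in the header line.
  n.extra <- ncol(data) - 2 - length(header)
  stopifnot(n.extra >= 0)
  names(data) <- c("pos", "residue",
                   make.unique(header, sep = "."),
                   paste0("extra", seq_len(n.extra)))
  data
}

# Toy file imitating the layout (assumption: 3 residue labels instead of 20).
toy <- c(
  "Last position-specific scoring matrix computed",
  "",
  "",
  "        A   R   N   A   R   N",
  "    1 M    -1  -2  -2    0   0 100  0.45 0.03",
  "    2 K    -1   2   0    0 100   0  0.57 0.02"
)
path <- tempfile(fileext = ".txt")
writeLines(toy, path)

pssm <- read_pssm(path)
str(pssm)
```

Here make.unique() appends a suffix to the second occurrence of each label, so the percentage columns become A.1, R.1 and so on instead of clashing with the score columns.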
