Question

Lake Elsinore 9.7 F W 60.2 131 1 1 0 2310.1
Lake Elsinore 10.4 F W 53.9 67 0 0 0 1815.9
Lake Elsinore 10.1 M W 54.3 96 1 1 1 1872.9
Lake Elsinore 9.6 M W 55.1 72 1 . 1 1980.4

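So there are ten variables here, V1-V10. How do I read this into R? As you can see, the first variable actually contains a space, so I can't simply read the file as "space-separated". Could someone point me to a way to import this data easily? Thank you very much!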
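To see the problem concretely, here is a minimal sketch that reads the sample lines with R's default whitespace splitting. It assumes the fields are separated by single spaces (the original spacing may have been altered when the question was copied), and it passes the lines through `text =` so no file is needed:

```r
# Sample lines from the question, assuming single-space separators
lines <- c(
  "Lake Elsinore 9.7 F W 60.2 131 1 1 0 2310.1",
  "Lake Elsinore 10.4 F W 53.9 67 0 0 0 1815.9",
  "Lake Elsinore 10.1 M W 54.3 96 1 1 1 1872.9",
  "Lake Elsinore 9.6 M W 55.1 72 1 . 1 1980.4"
)

# Default whitespace splitting breaks "Lake Elsinore" into two fields
naive <- read.table(text = lines, na.strings = ".")
ncol(naive)  # 11 columns instead of the intended 10
```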

Answer 1

Here are two ways:

1) It can be done with read.pattern from the gsubfn package. The matches to the parenthesized portions of the pattern are read in as separate fields:

library(gsubfn)

pattern <- "^(.*) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+)" 
read.pattern("myfile.dat", pattern, na.strings = ".")

giving:

                 V1   V2 V3 V4   V5  V6 V7 V8 V9    V10
1     Lake Elsinore  9.7  F  W 60.2 131  1  1  0 2310.1
2     Lake Elsinore 10.4  F  W 53.9  67  0  0  0 1815.9
3     Lake Elsinore 10.1  M  W 54.3  96  1  1  1 1872.9
4     Lake Elsinore  9.6  M  W 55.1  72  1 NA  1 1980.4
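If gsubfn is not available, the same idea (each capture group becomes one field) can be sketched in base R with `regexec()` and `regmatches()`. This is an assumed base-R equivalent, not a description of how `read.pattern` works internally; the sample lines are inlined (with single spaces assumed) so the sketch runs on its own:

```r
lines <- c(
  "Lake Elsinore 9.7 F W 60.2 131 1 1 0 2310.1",
  "Lake Elsinore 10.4 F W 53.9 67 0 0 0 1815.9",
  "Lake Elsinore 10.1 M W 54.3 96 1 1 1 1872.9",
  "Lake Elsinore 9.6 M W 55.1 72 1 . 1 1980.4"
)

# Same pattern as in the answer: a greedy first group, then nine space-free fields
pattern <- "^(.*) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+)"

# regexec() records where each group matched and regmatches() extracts the text.
# Element 1 of each match is the whole line, so drop it and keep the 10 groups.
fields <- lapply(regmatches(lines, regexec(pattern, lines)), `[`, -1)

# Stack the rows into a character matrix (columns become V1..V10),
# then let R guess each column's type, treating "." as missing
DF <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, na.strings = ".", as.is = TRUE)
DF
```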

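2) Read the lines in as-is, replace the first space on each line with some other character (here we use an underscore), re-read the result with read.table, and then turn the underscore back into a space: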

L <- readLines("myfile.dat")
L <- sub(" ", "_", L)
DF <- read.table(text = L, na.strings = ".")
DF[[1]] <- sub("_", " ", DF[[1]])

This gives the same answer.
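The underscore trick relies on the name containing exactly one space. If some names had no spaces or several (a hypothetical case; every name in the sample is "Lake Elsinore"), one base-R option is to split each line into tokens and glue back everything except the last nine. A sketch under that assumption, again with the sample lines inlined; it also assumes every line has at least ten tokens:

```r
lines <- c(
  "Lake Elsinore 9.7 F W 60.2 131 1 1 0 2310.1",
  "Lake Elsinore 10.4 F W 53.9 67 0 0 0 1815.9",
  "Lake Elsinore 10.1 M W 54.3 96 1 1 1 1872.9",
  "Lake Elsinore 9.6 M W 55.1 72 1 . 1 1980.4"
)

# Split each line into tokens on runs of spaces
tokens <- strsplit(trimws(lines), " +")

# Everything before the last nine tokens is the name; the rest are data fields
rows <- lapply(tokens, function(tk) {
  n <- length(tk)
  c(paste(tk[seq_len(n - 9)], collapse = " "), tk[(n - 8):n])
})

# Build a data frame (columns V1..V10) and convert types, treating "." as missing
DF <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, na.strings = ".", as.is = TRUE)
DF
```

Counting from the right works because only the first field can contain spaces; the nine trailing fields never do.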

Answer 2

It's a bit clunky, but I usually just read it in raw and parse the data from there. You could do something like this:

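# First, read in all columns space separated ("." marks a missing value)
df <- read.table(FILE, header = FALSE, sep = " ", na.strings = ".")

# Create a new column (V12) that's a concatenation of V1 and V2;
# within() returns a modified copy, so assign it back to df
df <- within(df, V12 <- paste(V1, V2, sep = " "))

# Then drop the two split name columns, putting the combined name first
df <- df[, c(12, 3:11)]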

Keep in mind that reading it in raw gives you 11 columns, which is why I created a 12th one.

Importing messy data into R
