Question

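I have a text-file dataset in AMPL format that I want to tabulate as m rows by n columns. The data is split into blocks, however: the first m rows hold the first 19 columns, then the same m rows are repeated with the next 19 columns, and so on. I tried importing it into R with fread, but I could only read the first block, i.e. m rows and the first 19 columns. I have included a sample of the data.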

Example dataset

Answer 1

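Some of this can be automated, but some manual checking is still needed. Your data follows a pattern: the 2 lines above the start of each block are malformed (including a header made of numbers), and this holds for every block. The first column is also not part of the dataset, so there is no need to keep it. The simplest approach is therefore to read each block without a header, drop the first column, bind the blocks together, and then add numeric column names. This can be done as follows.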

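# Block 1: skip the 2 malformed header lines, then read the 100 data rows
df1 <- read.table(file = "dataset.txt", skip = 2, nrows = 100, header = FALSE)
# remove the first col (row labels, not part of the data)
df1$V1 <- NULL

# Block 2: each block spans 102 lines (2 header + 100 data), so skip 2 + 102 = 104
df2 <- read.table(file = "dataset.txt", skip = 104, nrows = 100, header = FALSE)
# remove the first col
df2$V1 <- NULL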

df3 <- read.table(file = "dataset.txt", skip = 206, nrows = 100, header = FALSE)
# remove the first col
df3$V1 <- NULL

df4 <- read.table(file = "dataset.txt", skip = 308, nrows = 100, header = FALSE)
# remove the first col
df4$V1 <- NULL

df5 <- read.table(file = "dataset.txt", skip = 410, nrows = 100, header = FALSE)
# remove the first col
df5$V1 <- NULL

df6 <- read.table(file = "dataset.txt", skip = 512, nrows = 100, header = FALSE)
# remove the first col
df6$V1 <- NULL

# -------------------------------------------------------------------------

# bind the blocks side by side, then number the columns 1..n
mydf <- cbind(df1, df2, df3, df4, df5, df6)
colnames(mydf) <- seq_len(ncol(mydf))

You may notice a repeating pattern here, so you could build a function or a loop, but I won't go into that for now.
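For readers who want the loop version, here is a minimal sketch of how the repeated calls could be folded into one function. It assumes every block has the same layout as above (a fixed number of header lines followed by the same number of data rows, with the first column holding row labels). The name `read_blocks` and its parameters are hypothetical, and the small file built below contains made-up values purely to make the example runnable.

```r
# Hypothetical helper: read a file whose columns are split across stacked blocks.
# Assumes each block is n_header junk lines followed by n_rows data rows.
read_blocks <- function(path, n_blocks, n_rows, n_header = 2) {
  block_len <- n_header + n_rows            # lines occupied by one block
  blocks <- lapply(seq_len(n_blocks), function(k) {
    block <- read.table(path,
                        skip = n_header + (k - 1) * block_len,
                        nrows = n_rows, header = FALSE)
    block[, -1, drop = FALSE]               # drop the row-label column
  })
  out <- do.call(cbind, blocks)             # bind the blocks side by side
  colnames(out) <- seq_len(ncol(out))       # numeric column names 1..n
  out
}

# Tiny synthetic file (made-up values): 3 rows, 5 columns split as 2 + 2 + 1.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "header line", ": 1 2 :=",
  "1 11 12", "2 21 22", "3 31 32",
  "header line", ": 3 4 :=",
  "1 13 14", "2 23 24", "3 33 34",
  "header line", ": 5 :=",
  "1 15", "2 25", "3 35"
), tmp)

small <- read_blocks(tmp, n_blocks = 3, n_rows = 3)
print(small)
```

If the real file matches the layout used in the answer, the equivalent call would presumably be `read_blocks("dataset.txt", n_blocks = 6, n_rows = 100)`, which reproduces the skip values 2, 104, 206, 308, 410 and 512.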

Hope that helps.

Import a dataset whose columns are split into blocks
