Question

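I have a very large text file that, because of its size, I cannot read in R with read.table(). I know that readLines() lets you specify how many lines to import, but I need to import one line at a time inside a for loop and either save it to a new file or store it in a vector, list, or whatever.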

In Python, this would be something like:

with open("myfile.txt", mode="r") as myfile:
    for line in myfile:
        # Drop surrounding whitespace, then split the line on tabs
        fields = line.strip().split("\t")
        print(fields)

Is this possible in R?

Answer 1

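Try scan(). With skip you can skip the lines you have already read, and with nlines you can specify how many lines to read. You can then loop through the file.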

> large <- 10000
> m <- matrix(sample(c(0,1),3*7,replace=TRUE), ncol=3)
> write.table(m, "test.txt")

> for(i in 0:large) {
+     l <- scan("test.txt", what = character(), skip = i, nlines = 1)
+     if(length(l) == 0) break
+     print (l)
+ }

Read 3 items
[1] "V1" "V2" "V3"
Read 4 items
[1] "1" "0" "1" "0"
Read 4 items
[1] "2" "0" "0" "0"
Read 4 items
[1] "3" "0" "0" "0"
Read 4 items
[1] "4" "0" "1" "1"
Read 4 items
[1] "5" "1" "1" "1"
Read 4 items
[1] "6" "1" "0" "1"
Read 4 items
[1] "7" "0" "0" "1"
Read 0 items

This code illustrates how to use scan() and how to tell when to stop reading.
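Note that each scan() call with skip = i starts again from the top of the file, which can become slow on a very large file. As a possible alternative, here is a minimal sketch using only base R that opens a connection once and calls readLines(con, n = 1) repeatedly, so each call continues where the previous one stopped. The sample file, the tab separator, and the names path and rows are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical sample file so the sketch is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t0\t1", "2\t0\t0"), path)

con <- file(path, open = "r")    # open once; the read position is kept
rows <- list()
repeat {
  line <- readLines(con, n = 1)
  if (length(line) == 0) break   # zero lines returned means end of file
  # Mirror the Python loop: trim whitespace, then split on tabs
  rows[[length(rows) + 1]] <- strsplit(trimws(line), "\t", fixed = TRUE)[[1]]
}
close(con)

print(rows[[2]])
```

For a genuinely large file you would probably write each processed line to an output connection inside the loop rather than collect everything in rows.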

Answer 2

Яaffael's answer is sufficient, but this is a typical use case for the iterators package.

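With this package you can iterate over a file line by line without actually loading all of the data into memory. As an example, I will work through the Airlines data with this approach. Download the 1988 file and follow the code below: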

> install.packages('iterators')
> library(iterators)
> con <- bzfile('1988.csv.bz2', 'r')

OK, now you have a connection to your file. Let's create an iterator:

> it <- ireadLines(con, n=1) ## read just one line from the connection (n=1)

Just to test:

> nextElem(it)

You will see something like:

[1] "1988,1,9,6,1348,1331,1458,1435,PI,942,NA,70,64,NA,23,17,SYR,BWI,273,NA,NA,0,NA,0,NA,NA,NA,NA,NA"

> nextElem(it)

You will see the next line, and so on.

If you want to read line by line until the end of the file, you can use, for example:

> tryCatch(expr=nextElem(it), error=function(e) return(FALSE))

When the end of the file is reached, this returns the logical value FALSE.
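To make the end-of-file handling concrete, here is a minimal sketch of a complete loop built on ireadLines() and nextElem(). It assumes the iterators package is installed and that the iterator signals exhaustion with an error whose message is "StopIteration"; the helper next_or_null, the sample file, and the tab separator are hypothetical and only there for illustration.

```r
library(iterators)  # assumes the iterators package is installed

# Hypothetical sample file so the sketch is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("x\ty", "1\t2", "3\t4"), path)

con <- file(path, open = "r")
it <- ireadLines(con, n = 1)

# Hypothetical helper: NULL at end of input, any other error is re-raised
next_or_null <- function(it) {
  tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
}

n_lines <- 0
while (!is.null(line <- next_or_null(it))) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]  # process one line here
  n_lines <- n_lines + 1
}
close(con)

print(n_lines)
```

Checking the error message, rather than mapping every error to FALSE, keeps a genuine read failure from being mistaken for the end of the file.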

Import only one line at a time from a large file in R
