Question

I have been struggling to split my test data.

> FDF <- read.csv.ffdf(file='C:\\Users\\William\\Desktop\\R Data\\TestData0812.txt', header = FALSE, colClasses=c('factor','factor','numeric','numeric','numeric','numeric'), sep=',')
> names(FDF)<- c('Date','Time','Open','High','Low','Close')
> 
> # ID
> FDF2 <-FDF[1:100,]
> FDF2 <- as.ffdf(FDF2)
> a <- nrow(FDF2)
> # Take section of import for testing
> FDF2[1:3,]
        Date  Time   Open   High    Low  Close
1 1987.08.28 12:00 1.6238 1.6240 1.6237 1.6239
2 1987.08.28 12:01 1.6239 1.6240 1.6235 1.6236
3 1987.08.28 12:02 1.6236 1.6239 1.6235 1.6238
> 
> ID <- data.frame(matrix(1:a, nrow = a, ncol=1 ))
> ID <- as.ffdf(ID)
> names(ID) <- c('ID')
> FDF3 <- cbind.ffdf2(ID, FDF2)
> # Create ID column and binds together
> FDF3[1:3,]
  ID       Date  Time   Open   High    Low  Close
1  1 1987.08.28 12:00 1.6238 1.6240 1.6237 1.6239
2  2 1987.08.28 12:01 1.6239 1.6240 1.6235 1.6236
3  3 1987.08.28 12:02 1.6236 1.6239 1.6235 1.6238

The file I will be using this on is an ffdf object, because it is 700 MB. How can I split the dataset?

My current code is:

T = ffdfdply(FDF3, split(FDF3$ID, rep(1:10,each=10)))

I have done a lot of research on this in forums and elsewhere. For simplicity, though, I have only included the example above.

When run, the code above gives the following error:

Error in ffdfdply(FDF3, split(FDF3$ID, rep(1:10, each = 10))) :
  split needs to be the same length as the number of rows in x

I can't understand why a split by rep(1:10, each = 10) does not work on a dataset for which dim(FDF3) returns [1] 100 7.
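A quick check in base R shows the two lengths in play. It uses a plain integer vector as a stand-in for FDF3$ID (an assumption; no ff objects are involved). split() returns a list with one element per group, so its length is 10, while the grouping vector itself has one entry per row. If the check behind the error compares the length of the split argument with the row count, as the message suggests, the list returned by split() would not pass it.

```r
# Plain-vector stand-in for FDF3$ID (assumption: 100 rows, as in dim(FDF3))
id <- 1:100

grp <- rep(1:10, each = 10)   # one group label per row
pieces <- split(id, grp)      # list with one element per group

length(grp)     # 100
length(pieces)  # 10
```

This is only an illustration of the lengths; it does not call ffdfdply.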

I would also like the split to run even when the groups do not all contain a full number of rows, for example: T = ffdfdply(FDF3, split(FDF3$ID, rep(1:10,each=3)))
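One possible way to get groups when the chunk size does not divide the row count evenly is to build one label per row. The base R sketch below assumes 100 rows and an illustrative chunk size of 30 (n and size are made-up names, not from the code above). It only builds the labels and does not call ffdfdply.

```r
n <- 100L     # assumed number of rows, as in dim(FDF3)
size <- 30L   # illustrative chunk size that does not divide n evenly

# Rows 1-30 get label 1, rows 31-60 label 2, and so on;
# the last group simply ends up shorter.
grp <- ceiling(seq_len(n) / size)

length(grp)   # 100
table(grp)    # group sizes: 30, 30, 30, 10
```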

I have been at this for at least 20 hours.

Answer 1

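I could not figure out the correct usage of ffdfdply, and I still don't know whether it is the right tool for this. However, I have built a workaround and hope someone finds it useful. I should add that it is really ugly, so I am open to suggestions on how to simplify it and would appreciate your comments.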

ffdfSplitSize <- 5
# Number of rows per chunk
ffdfrows <- nrow(FDF3)
ffdfStart <- 1
ffdfEnd <- ffdfSplitSize
# Creates constants and variables

splitNum <- ceiling(ffdfrows / ffdfSplitSize)
# Calculates the number of splits required (rounded up so a short last chunk is kept)
ffdf.names <- paste('FFDF', ffdfSplitSize, seq_len(splitNum), sep = '.')
# Creates names to be assigned to the resulting tables

for (i in seq_len(splitNum)) {
        assign(ffdf.names[i], as.ffdf(FDF3[ffdfStart:ffdfEnd, ]))
        # Start one row past the previous chunk so no row is copied twice
        ffdfStart <- ffdfEnd + 1
        ffdfEnd <- min(ffdfEnd + ffdfSplitSize, ffdfrows)
}
# Loops until every row has been assigned to a chunk
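The same bookkeeping can be sketched without a loop by computing all chunk boundaries up front. This is a base R illustration on a plain data.frame stand-in (df, size, starts, ends and chunks are hypothetical names), not a tested ff solution. With a real ffdf, each piece would presumably still need to be wrapped in as.ffdf() as in the loop.

```r
# Plain data.frame stand-in for FDF3 (assumption: 100 rows, one ID column)
df <- data.frame(ID = 1:100)
size <- 30L                # illustrative chunk size
n <- nrow(df)

starts <- seq(1L, n, by = size)        # 1, 31, 61, 91
ends <- pmin(starts + size - 1L, n)    # 30, 60, 90, 100

# One data.frame per (start, end) pair; the ranges do not overlap
chunks <- Map(function(s, e) df[s:e, , drop = FALSE], starts, ends)

sapply(chunks, nrow)   # 30 30 30 10
```

Keeping the pieces in a list avoids creating many separately named objects with assign(), at the cost of holding them all in one object.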
