I'm trying to run sentiment analysis on a data frame but I'm hitting memory issues, so I'd like to break the work into chunks. The data frame has roughly 100K rows, and I want to process 10K rows at a time. Any ideas on a simple way to do this programmatically? Here's what I have so far:
# grab the product review data
product_reviews <- dbGetQuery(conn, "select * from product_reviews")
for (i in 1:nrow(product_reviews)) {
  # run the sentiment algorithm on the data; the result set gets uploaded later
  emo <- sentiment(product_reviews$REVIEW_TITLE)
  sql <- "select element_id,
            avg(sentiment) as avg_sentiment,
            max(sentiment) as max_sentiment,
            min(sentiment) as min_sentiment
          from emo group by 1"
  emo_avg <- sqldf(sql)
  class_emo <- classify_emotion(product_reviews$REVIEW_TITLE, algorithm = "bayes", prior = 1.0)
  new <- cbind(product_reviews,
               emo_avg$avg_sentiment,
               emo_avg$max_sentiment,
               emo_avg$min_sentiment,
               emo_avg$emotion)
}
Basically, instead of the for statement looping over every row of product_reviews, how do I loop over rows 1 through 10000 first, then 10001 through 20000, and so on?
Thanks!
Answer 0 (score: 1)
If you're running into memory problems, I don't know that restructuring your loop will help with anything: the same amount of information presumably has to be read, processed, and stored no matter how you slice it.
That said, two approaches come to mind immediately:
for(i in 1:10000) {
  stuff
}
other stuff to deal with memory issues?
for(i in 10001:20000) {
  stuff again
}
other stuff to deal with memory issues?
...
ad nauseam, ad infinitum
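If you do go the manual route, `seq()` can at least generate the chunk boundaries for you instead of hard-coding each range. A small sketch, using the 100K-row / 10K-chunk sizes from the question (the `pmin` guard handles a final partial chunk, which doesn't arise here but would for, say, 95K rows):

```r
# start and end indices for 100000 rows in steps of 10000
starts <- seq(1, 100000, by = 10000)
ends   <- pmin(starts + 10000 - 1, 100000)

starts[1]; ends[1]    # 1, 10000
starts[10]; ends[10]  # 90001, 100000
```

Each `starts[k]:ends[k]` pair then defines one chunk's row range.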
Or, with slightly more programming:
for(j in 1:ceiling(nrow(product_reviews)/10000)) {
  for(i in (10000*(j-1)+1):(min(10000*j, nrow(product_reviews)))) {
    stuff
  }
  probably some other stuff to deal with memory issues?
}
So... this may win the battle, but perhaps not the war. Good luck!