Step 1.
I load data from Mongo into an RDD this way:
var buff = new ListBuffer[item]
This works every time.
Step 2.
I save this data to a persistence file (folder):
data1 = sc.makeRDD(buff).setName(name).persist(MEMORY_ONLY)
From then on, I can read the data from the file instead of from Mongo:
data1.saveAsObjectFile(fileName)
This works every time.
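For reference, the two steps above can be put together in one small program. This is only a sketch under stated assumptions: the `Item` case class, the generated rows (standing in for the Mongo read loop), the `local[2]` master and the temporary output path are all hypothetical and not taken from the question. Only `makeRDD`, `persist`, `setName`, `saveAsObjectFile` and `objectFile` come from the original snippets.

```scala
import java.nio.file.Files
import scala.collection.mutable.ListBuffer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel.MEMORY_ONLY

// Hypothetical record type standing in for the question's `item` class.
case class Item(id: Int, payload: String)

object ObjectFileRoundTrip {
  // Builds data1 from a driver-side buffer, saves it, and reads it back as data2.
  // `fileName` must not exist yet, because saveAsObjectFile refuses to overwrite.
  def roundTrip(sc: SparkContext, fileName: String): (RDD[Item], RDD[Item]) = {
    val buff = new ListBuffer[Item]
    // Assumption: this loop replaces the code that reads documents from Mongo.
    (1 to 1000).foreach(i => buff += Item(i, s"row-$i"))

    // Step 1: every element is created on the driver and shipped to the workers.
    val data1 = sc.makeRDD(buff.toList).setName("data1").persist(MEMORY_ONLY)

    // Step 2: write the RDD out as a folder of serialized partitions.
    data1.saveAsObjectFile(fileName)

    // Later runs: the partitions are read and deserialized by the tasks themselves.
    val data2 = sc.objectFile[Item](fileName).persist(MEMORY_ONLY).setName("data2")
    (data1, data2)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("round-trip")
    val sc = new SparkContext(conf)
    try {
      val out = Files.createTempDirectory("items").resolve("objfile").toString
      val (data1, data2) = roundTrip(sc, out)
      println(s"data1: ${data1.count()} items, data2: ${data2.count()} items")
    } finally sc.stop()
  }
}
```

Timing `count()` on each RDD in a real cluster, rather than in local mode, is probably the simplest way to observe the difference being asked about.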
Question.
Why is data2 = sc.objectFile(fileName).persist(MEMORY_ONLY).setName(name) so much more efficient than data1?
For example,
data2
There is enough memory.
Any ideas are welcome.
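Answer 0 (score: 1)
You did not provide a reproducible example or enough detail about your configuration to allow a definitive answer, but most likely it all comes down to a simple difference in execution: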
sc.makeRDD has to create the items in buff on the driver and transfer them to the corresponding workers.
sc.objectFile has to:
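Stages marked in italics are executed in parallel. Otherwise, the processing is performed at least partially on the driver and the cluster is not fully utilized.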
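As you can see, makeRDD / parallelize is significantly less efficient, and passing data through the driver can easily become a bottleneck. These methods are intended mainly for testing and for parallelizing small supporting collections or range-like objects, not for processing heavyweight data.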
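One way to keep bulk data off the driver, in the spirit of the remark about range-like objects, is to parallelize only a small list of ranges and let each task fetch its own slice. The sketch below is an illustration under assumptions, not a recommended implementation: `Record`, `fetchRange` and the chunk sizes are hypothetical, and `fetchRange` merely generates rows where a real job would query the data store.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type, for illustration only.
case class Record(id: Int, payload: String)

object LoadOnExecutors {
  // Hypothetical stand-in for a range query against the data store.
  // It runs inside the tasks, so the rows never pass through the driver.
  def fetchRange(from: Int, until: Int): Iterator[Record] =
    (from until until).iterator.map(i => Record(i, s"row-$i"))

  def load(sc: SparkContext, total: Int, chunk: Int): RDD[Record] = {
    // Only these small (from, until) pairs are created on the driver.
    val ranges = (0 until total by chunk).map(s => (s, math.min(s + chunk, total)))
    sc.parallelize(ranges, math.max(1, ranges.size)) // one partition per range
      .flatMap { case (from, until) => fetchRange(from, until) }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("load-on-executors")
    val sc = new SparkContext(conf)
    try {
      val records = load(sc, total = 1000, chunk = 100)
      println(s"${records.count()} records in ${records.getNumPartitions} partitions")
    } finally sc.stop()
  }
}
```

The driver here serializes only a handful of integer pairs, so its memory and network link stop being the limiting factor. A dedicated connector would normally handle the partitioning itself.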
There are at least two MongoDB connectors available: