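The application executes sysDF.write.partitionBy and successfully writes the first Parquet file. After that, the application hangs and all executors are dead until a timeout occurs. The action code is as follows: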
import sqlContext.implicits._
val systemRDD = basicLogRDD.map(basicLog => if (basicLog.isInstanceOf[SystemLog]) basicLog.asInstanceOf[SystemLog] else null).filter(_ != null)
val sysDF = systemRDD.toDF()
sysDF.write.partitionBy("appId").parquet(outputPath + "/system/date=" + dateY4M2D2)
val customRDD = basicLogRDD.map(basicLog => if (basicLog.isInstanceOf[CustomLog]) basicLog.asInstanceOf[CustomLog] else null).filter(_ != null)
val customDF = customRDD.toDF()
customDF.write.partitionBy("appId").parquet(outputPath + "/custom/date=" + dateY4M2D2)
val illegalRDD = basicLogRDD.map(basicLog => if (basicLog.isInstanceOf[IllegalLog]) basicLog.asInstanceOf[IllegalLog] else null).filter(_ != null)
val illegalDF = illegalRDD.toDF()
illegalDF.write.partitionBy("appId").parquet(outputPath + "/illegal/date=" + dateY4M2D2)
Answer 0 (score: 0)
First, the map can be combined with the filter, which should optimize the query slightly:
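val rdd = basicLogRDD.cache()
rdd.collect { case log: SystemLog => log }.toDF().write.partitionBy("appId").parquet(outputPath + "/system/date=" + dateY4M2D2)
rdd.collect { case log: CustomLog => log }.toDF().write.partitionBy("appId").parquet(outputPath + "/custom/date=" + dateY4M2D2)
rdd.collect { case log: IllegalLog => log }.toDF().write.partitionBy("appId").parquet(outputPath + "/illegal/date=" + dateY4M2D2)
It is best to cache basicLogRDD, because it is used multiple times. The .cache() operator will keep the RDD in memory.
Also, there is no need to bind the RDD and the DataFrame to intermediate vals: with import sqlContext.implicits._ in scope, toDF() is available directly on the RDD, so each result can be stored as Parquet in one chained expression. The toDF() call itself is still needed, because an RDD has no write method. Using collect with a type pattern instead of filter(_.isInstanceOf[...]) keeps the element type as the concrete log class, so the DataFrame schema comes from that class rather than from the base type.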
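The idea of merging the map and filter steps can be tried without Spark, since Scala's standard collections offer a collect method that takes a partial function, just as RDD does. The following is only a minimal sketch: the BasicLog hierarchy and its fields are hypothetical stand-ins, because the real class definitions are not shown in the question.

```scala
// Hypothetical stand-ins for the log classes; the real definitions are not shown in the question.
sealed trait BasicLog { def appId: String }
final case class SystemLog(appId: String, message: String) extends BasicLog
final case class CustomLog(appId: String, event: String) extends BasicLog
final case class IllegalLog(appId: String, raw: String) extends BasicLog

object CollectSketch {
  // Made-up sample data, for illustration only.
  val logs: Seq[BasicLog] = Seq(
    SystemLog("app1", "boot"),
    CustomLog("app1", "click"),
    IllegalLog("app2", "???"),
    SystemLog("app2", "shutdown")
  )

  // The question's approach: map to the subtype or null, then drop the nulls.
  val viaMapFilter: Seq[SystemLog] =
    logs
      .map(l => if (l.isInstanceOf[SystemLog]) l.asInstanceOf[SystemLog] else null)
      .filter(_ != null)

  // filter alone selects the right elements but keeps the static type BasicLog.
  val viaFilter: Seq[BasicLog] = logs.filter(_.isInstanceOf[SystemLog])

  // collect selects and narrows the type in a single step, without nulls or casts.
  val viaCollect: Seq[SystemLog] = logs.collect { case s: SystemLog => s }

  def main(args: Array[String]): Unit = {
    println(viaCollect)
  }
}
```

On an RDD the same call returns an RDD[SystemLog], which is the shape toDF() expects once import sqlContext.implicits._ is in scope.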