I have the following code:
val data = input.map{... }.persist(StorageLevel.MEMORY_ONLY_SER).repartition(2000)
I would like to know whether it makes any difference if I repartition first:
val data = input.map{... }.repartition(2000).persist(StorageLevel.MEMORY_ONLY_SER)
Does the order in which repartition and persist are called make a difference? Thanks!
Answer 0 (score: 12)
Yes, there is a difference.
In the first case, you get a persisted RDD after the map phase. This means that every time data is accessed, the repartition is triggered again.
In the second case, you cache after repartitioning. When data is accessed and has already been materialized, no additional work is needed.
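One quick way to see which RDD in the chain actually carries the storage level is to ask the returned RDD directly. This is only a minimal sketch, assuming a spark-shell session where sc is the SparkContext; mapped, persistThenRepartition and repartitionThenPersist are hypothetical names, and sc.parallelize(1 to 10, 8).map(identity) stands in for input.map{... }.

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical stand-in for `input.map{...}`; `sc` is the spark-shell SparkContext.
val mapped = sc.parallelize(1 to 10, 8).map(identity)

// Case 1: persist marks `mapped`; repartition then returns a new, unpersisted RDD.
val persistThenRepartition =
  mapped.persist(StorageLevel.MEMORY_ONLY_SER).repartition(2000)
val case1ResultIsPersisted =
  persistThenRepartition.getStorageLevel != StorageLevel.NONE
println(s"case 1 result persisted: $case1ResultIsPersisted")

// Case 2: persist marks the repartitioned RDD itself.
val repartitionThenPersist =
  sc.parallelize(1 to 10, 8).map(identity)
    .repartition(2000)
    .persist(StorageLevel.MEMORY_ONLY_SER)
val case2ResultIsPersisted =
  repartitionThenPersist.getStorageLevel != StorageLevel.NONE
println(s"case 2 result persisted: $case2ResultIsPersisted")

// Clear the persistence marks again; no action was run, so nothing was cached.
mapped.unpersist()
repartitionThenPersist.unpersist()
```

No action is run here, so nothing is cached; the sketch only shows where the persistence mark ends up in each ordering.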
To demonstrate this, we can run an experiment and then look at the storage info:
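import org.apache.spark.storage.StorageLevel

// Case 1: persist first, then repartition (the 8-partition mapped RDD is the one marked for caching)
val data1 = sc.parallelize(1 to 10, 8)
  .map(identity)
  .persist(StorageLevel.MEMORY_ONLY_SER)
  .repartition(2000)
data1.count()  // run an action so the persisted RDD is materialized

// Case 2: repartition first, then persist (the 2000-partition RDD is the one marked for caching)
val data2 = sc.parallelize(1 to 10, 8)
  .map(identity)
  .repartition(2000)
  .persist(StorageLevel.MEMORY_ONLY_SER)
data2.count()  // run an action so the persisted RDD is materialized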
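One possible way to look at the storage info from the same spark-shell session is sc.getRDDStorageInfo (a developer API on SparkContext). This is a sketch that assumes the two count() calls above have already run; cachedCounts is a hypothetical helper name, and the Storage tab of the Spark UI should show the same information.

```scala
// Assumes the spark-shell session above: `sc` is the SparkContext and
// data1.count() / data2.count() have already materialized the caches.
sc.getRDDStorageInfo.foreach { info =>
  println(
    s"RDD ${info.id} (${info.name}): " +
      s"${info.numCachedPartitions} of ${info.numPartitions} partitions cached, " +
      s"level = ${info.storageLevel.description}"
  )
}

// Hypothetical helper value: the cached-partition count of every persisted RDD.
val cachedCounts = sc.getRDDStorageInfo.map(_.numCachedPartitions).sorted
```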
As you can see, there are two persisted RDDs: one with 2000 partitions and the other with 8.