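I have two RDDs of type RDD[(String, String, DateTime, Int, Array[Byte])]; let's call them Rdd1 and Rdd2. I want to compare every tuple in Rdd1 against Rdd2, field by field, and store any tuple that has no match in a new RDD. I am using Spark with Scala.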
For example, suppose the RDDs contain:
data = ((student1,XII,2016-09-11T00:00:00.000Z,1,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120001),
(student1,XII,2016-09-11T00:00:00.000Z,2,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120002),
(student2,XII,2016-09-12T00:00:00.000Z,2,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120004),
(student3,XII,2016-09-13T00:00:00.000Z,4,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120005))
data2 = ((student1,XII,2016-09-11T00:00:00.000Z,1,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120001),
(student1,XII,2016-09-11T00:00:00.000Z,2,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120002),
(student2,XII,2016-09-12T00:00:00.000Z,2,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120004))
and take the partition key to be:
case class Student(name: String, `class`: String, dob: DateTime)
I want to check that every entry in rdd1 (with all of its field values) also exists in rdd2 and, if it does not, store it in a new RDD.
In the example above the output would be:
val resultRdd = ((student3,XII,2016-09-13T00:00:00.000Z,4,0x0a130a0942553030303730333510ba1118f92120000a130a0942553030303730333510ba1118f92120005))
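To make the requirement concrete, here is a minimal sketch of the comparison I have in mind. It is not a confirmed solution. It assumes DateTime is Joda-Time's org.joda.time.DateTime, runs Spark in local mode, and uses short placeholder byte arrays instead of the real payloads. The names RddDiffSketch, Row, comparable and missingFrom are made up for illustration. One thing it has to work around: Array[Byte] is compared by reference on the JVM, so whole tuples cannot be compared directly; the sketch wraps the bytes in a Seq first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.joda.time.DateTime // assumption: DateTime is Joda-Time's

object RddDiffSketch {
  type Row = (String, String, DateTime, Int, Array[Byte])

  // Array[Byte] has reference equality, so convert the payload to a Seq,
  // which compares and hashes by content, before using the tuple as a key.
  def comparable(r: Row): (String, String, DateTime, Int, Seq[Byte]) =
    (r._1, r._2, r._3, r._4, r._5.toSeq)

  // Rows of rdd1 for which no row in rdd2 matches on all five fields.
  def missingFrom(rdd1: RDD[Row], rdd2: RDD[Row]): RDD[Row] =
    rdd1.map(r => (comparable(r), r))
      .subtractByKey(rdd2.map(r => (comparable(r), ())))
      .values

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-diff-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Placeholder rows: the three-byte arrays stand in for the real payloads.
      def row(name: String, date: String, n: Int, last: Byte): Row =
        (name, "XII", DateTime.parse(date), n, Array[Byte](0x0a, 0x13, last))

      val rdd1 = sc.parallelize(Seq(
        row("student1", "2016-09-11T00:00:00.000Z", 1, 1),
        row("student1", "2016-09-11T00:00:00.000Z", 2, 2),
        row("student2", "2016-09-12T00:00:00.000Z", 2, 4),
        row("student3", "2016-09-13T00:00:00.000Z", 4, 5)))
      val rdd2 = sc.parallelize(Seq(
        row("student1", "2016-09-11T00:00:00.000Z", 1, 1),
        row("student1", "2016-09-11T00:00:00.000Z", 2, 2),
        row("student2", "2016-09-12T00:00:00.000Z", 2, 4)))

      // Print the non-payload fields of every row that is missing from rdd2.
      missingFrom(rdd1, rdd2).collect()
        .foreach(r => println((r._1, r._2, r._3, r._4)))
    } finally {
      sc.stop()
    }
  }
}
```

With this sample data only the student3 row should be left over. Note that Joda-Time's DateTime equality also takes the chronology and time zone into account, so two values for the same instant in different zones would count as a mismatch here; I am not sure whether that is the behaviour I want.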