Drop columns by data type in Scala Spark

Time: 2017-01-29 07:59:30

Tags: scala apache-spark

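df1.printSchema() prints the column names and data types of the DataFrame.

df1.drop($"colName") drops a column by name.

Is there a way to adapt this command to drop columns by data type instead?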

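2 answers:

Answer 0: (score: 6)

If you want to drop specific columns from a DataFrame based on their type, the snippet below should help. In this example, the DataFrame has two columns, one of type String and one of type Int. I drop the String fields from the schema based on their type (every String column is dropped).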

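import sqlContext.implicits._

// Build a two-column DataFrame: c1 is String, c2 is Int (zip keeps only the first 10 pairs).
val df = sc.parallelize(('a' to 'l').map(_.toString) zip (1 to 10)).toDF("c1","c2")

// Collect the names of all string-typed columns, then drop them one at a time.
val newDf = df.schema.fields
    .collect({case x if x.dataType.typeName == "string" => x.name})
    .foldLeft(df)({case(dframe,field) => dframe.drop(field)})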

The schema of newDf is org.apache.spark.sql.DataFrame = [c2: int]
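The same idea can be wrapped in a small reusable function. What follows is only a sketch under a few assumptions: it targets Spark 2.x or later (where `SparkSession` and the varargs `drop(colNames: String*)` are available), it compares `DataType` values directly instead of comparing type names, and `DropByType` and `dropColumnsOfType` are hypothetical names that are not part of the Spark API. It only looks at top-level columns, not at fields nested inside structs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, StringType}

object DropByType {
  // Hypothetical helper (not a Spark API): drops every top-level column
  // whose data type equals `unwanted`.
  def dropColumnsOfType(df: DataFrame, unwanted: DataType): DataFrame = {
    // Names of the columns whose type matches.
    val names = df.schema.fields.collect {
      case f if f.dataType == unwanted => f.name
    }
    // drop(colNames: String*) removes all of them in one call.
    df.drop(names: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-by-type")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("c1", "c2")
    val result = dropColumnsOfType(df, StringType)
    result.printSchema() // only the Int column c2 should remain

    spark.stop()
  }
}
```

Passing a different type, for example `IntegerType`, drops the Int columns instead. Comparing `DataType` values rather than the string from `typeName` avoids typos in the type name, at the cost of an extra import.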

Answer 1: (score: 2)

Here is a fancy way to do it in Scala:

mailMessage.Body = GetFormattedEntries(279); // this loads the HTML table from the DB - using ID 279 as my test

AlternateView avHtml = AlternateView.CreateAlternateViewFromString(mailMessage.Body, null, MediaTypeNames.Text.Html);

HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
htmlDocument.LoadHtml(mailMessage.Body);
var images = htmlDocument.DocumentNode.Descendants("img").ToList();


foreach (var image in images)
{
    var src = image.Attributes["src"].Value;
    var regex = new Regex(@"data:(?<mime>[\w/\-\.]+);(?<encoding>\w+),(?<data>.*)", RegexOptions.Compiled);
    var match = regex.Match(src);
    var mime = match.Groups["mime"].Value;
    var encoding = match.Groups["encoding"].Value;
    var data = match.Groups["data"].Value;

    byte[] bytes = Convert.FromBase64String(data);
    System.IO.MemoryStream embeddedMs = new System.IO.MemoryStream(bytes, 0, bytes.Length);
    LinkedResource pic1 = new LinkedResource(embeddedMs, new System.Net.Mime.ContentType(mime));
    pic1.TransferEncoding = TransferEncoding.Base64;
    pic1.ContentId = Guid.NewGuid().ToString();
    avHtml.LinkedResources.Add(pic1);
    var newNode = image.CloneNode(true);
    newNode.Attributes["src"].Value = string.Format("cid:{0}", pic1.ContentId);
    image.ParentNode.ReplaceChild(newNode, image);
}


mailMessage.IsBodyHtml = true;
mailMessage.Body = htmlDocument.DocumentNode.OuterHtml;

mailMessage.AlternateViews.Add(avHtml);