df1.printSchema()
prints the column names and their data types.
df1.drop($"colName")
drops a column by name.
Is there a way to make this command drop columns by data type instead?
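Answer 0: (score: 6)
If you want to drop specific columns from a DataFrame based on their type, the snippet below should help. In this example, I have a DataFrame with two columns, one of type String and one of type Int. I drop the String fields from the schema based on their type (every String field is dropped).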
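import sqlContext.implicits._

// Build a two-column DataFrame: c1 is a String column, c2 is an Int column.
val df = sc.parallelize(('a' to 'l').map(_.toString) zip (1 to 10)).toDF("c1","c2")

// Collect the names of all string-typed fields, then drop them one at a time.
val newDf = df.schema.fields
  .collect({case x if x.dataType.typeName == "string" => x.name})
  .foldLeft(df)({case(dframe,field) => dframe.drop(field)})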
The schema of newDf is org.apache.spark.sql.DataFrame = [c2: int]
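The same idea can be wrapped in a small reusable function. The following is a minimal sketch, assuming Spark 2.0 or later (it uses SparkSession instead of the sqlContext above); the object name DropByType and the helper dropColumnsOfType are hypothetical and not part of the Spark API. It compares each field's DataType directly instead of its typeName string, and passes all matching names to a single drop call.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, StringType}

object DropByType {
  // Hypothetical helper (not a Spark API): drops every top-level column
  // whose data type equals `unwanted`.
  def dropColumnsOfType(df: DataFrame, unwanted: DataType): DataFrame = {
    val toDrop = df.schema.fields.collect {
      case f if f.dataType == unwanted => f.name
    }
    // drop(colNames: String*) accepts several names at once.
    df.drop(toDrop: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-by-type")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("c1", "c2")
    val result = dropColumnsOfType(df, StringType)
    result.printSchema() // only the integer column c2 remains

    spark.stop()
  }
}
```

Only top-level columns are considered here; fields nested inside a struct column would need a separate recursive treatment.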
Answer 1: (score: 2)
Here is a fancy way in Scala:
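One possible shape for such an approach, offered as a sketch and not necessarily the code this answer originally had in mind: an implicit class that adds a dropTypes method to DataFrame and pattern-matches on StructField. The names DropByTypeSyntax, TypeDropOps and dropTypes are hypothetical, and the sketch assumes Spark 2.0 or later and column names that contain no backticks.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DoubleType, StringType, StructField}

object DropByTypeSyntax {
  // Hypothetical extension (not a Spark API): adds `dropTypes` to DataFrame.
  implicit class TypeDropOps(val df: DataFrame) extends AnyVal {
    def dropTypes(unwanted: DataType*): DataFrame = {
      // Keep only the fields whose type is not in the unwanted list.
      val kept = df.schema.fields.collect {
        case StructField(name, dataType, _, _) if !unwanted.contains(dataType) =>
          col(s"`$name`") // backticks protect names that contain dots
      }
      df.select(kept: _*)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-types-syntax")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1, 2.0), ("b", 2, 3.0)).toDF("c1", "c2", "c3")
    // Remove the String and Double columns in one call.
    println(df.dropTypes(StringType, DoubleType).columns.mkString(",")) // c2

    spark.stop()
  }
}
```

Selecting the columns to keep, instead of folding drop over the columns to remove, builds the result in a single projection and lets several types be removed at once.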