I'm using Spark 1.6 and trying to get and cast DataFrame row values.
Here is my problem: one of the fields in my DataFrame rows has this structure:
WrappedArray([List of String], [List of String])
I need the [List of String] values inside the WrappedArray, so I tried to cast it with this code:
val RDD = DF.map(
  f => {
    if (f.getAs("ListOfRficAction") != null) {
      var listActions = f.getAs("ColumnName").asInstanceOf[WrappedArray[List[List[Any]]]].map(m => m :+ f.getAs("AssetId").toString)
    }
  })
I get the following error:
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.collection.mutable.WrappedArray
How can I cast it?
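In Spark SQL, a struct value is returned as a Row (a GenericRowWithSchema when it comes from a DataFrame) and an array value as a Seq (concretely a WrappedArray in Spark 1.6); Spark never hands back a Scala List, so a cast whose element type is List cannot work. Because Row.toString prints its fields in square brackets, the elements shown as [...] in the printed value above are most likely Rows. A hypothetical debugging helper (the name ShapeOf is made up) that prints which of these classes a value is built from, assuming spark-sql 1.6 on the classpath:

```scala
import org.apache.spark.sql.Row

// Hypothetical debugging helper: shows which runtime classes a column value is built from.
object ShapeOf {
  def describe(value: Any): String = value match {
    case null => "null"
    // Struct values (StructType) come back as Row
    case r: Row => r.toSeq.map(describe).mkString("Row(", ", ", ")")
    // Array values (ArrayType) come back as a Seq, concretely a WrappedArray in Spark 1.6
    case s: Seq[_] => s.map(describe).mkString("Seq(", ", ", ")")
    // Anything else: report its class name, e.g. "String"
    case other => other.getClass.getSimpleName
  }

  def main(args: Array[String]): Unit = {
    // A value shaped like array<struct<string, array<string>>>
    val value = Seq(Row("a", Seq("b", "c")), Row("d", Seq.empty[String]))
    println(describe(value)) // prints Seq(Row(String, Seq(String, String)), Row(String, Seq()))
  }
}
```

On the real data, calling it on DF.first().getAs[Any]("ListOfRficAction") alongside DF.printSchema() should show which types to read before choosing a cast.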
Answer 0 (score: 1)
Try casting to WrappedArray[WrappedArray[String]]; see also:
How to cast a WrappedArray[WrappedArray[Float]] to Array[Array[Float]] in spark (scala)
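That cast fits a column that really is an array of arrays (array<array<string>> in Spark's schema notation). Spark 1.6 returns such a value as a WrappedArray of WrappedArrays, and since Spark's Scala type mapping lists ArrayType as scala.collection.Seq, casting to Seq is a slightly more portable choice. A minimal sketch, assuming Scala 2.10/2.11 as in the question; the object and method names are hypothetical:

```scala
import scala.collection.mutable.WrappedArray

// Hypothetical helper for a column of Spark type array<array<string>>.
object NestedArrayCast {
  // Type arguments are erased, so the cast itself only checks the outer class;
  // each inner element is checked when map uses it.
  def toNestedArray(value: Any): Array[Array[String]] =
    value.asInstanceOf[scala.collection.Seq[scala.collection.Seq[String]]]
      .map(_.toArray)
      .toArray

  def main(args: Array[String]): Unit = {
    // Build a value with the runtime shape Spark 1.6 uses for array<array<string>>
    val first: WrappedArray[String] = Array("a", "b")
    val second: WrappedArray[String] = Array("c")
    val column: WrappedArray[WrappedArray[String]] = Array(first, second)

    val nested = toNestedArray(column)
    println(nested.map(_.mkString(",")).mkString(" | ")) // prints a,b | c
  }
}
```

The same pattern carries over to the linked Float case by replacing String with Float.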
Answer 1 (score: 0)
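Thanks for the answer. I'm using a Maven project, not sbt. My code compiles without problems, and the Spark error points me to this line:
var listActions = f.getAs("ColumnName").asInstanceOf[WrappedArray[List[List[Any]]]].map(m => m :+ f.getAs("AssetId").toString)
Here is my whole code: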
import scala.collection.mutable.WrappedArray
import scala.util.Try

val ficRDDResult = ficDataFrameSelect.map(
  f => {
    if (f.getAs("ListOfRficAction") != null) {
      // the line the Spark error points to
      var listActions = f.getAs("ListOfRficAction").asInstanceOf[WrappedArray[List[List[Any]]]].map(m => m :+ f.getAs("AssetId").toString)
      var listAttachments = listActions
        .map(m => {
          m.map(x => {
            val a = Try(x.asInstanceOf[List[Any]])
            if (a.isSuccess)
              x
            else
              null
          }).filter(f => f != null).map(x => x.asInstanceOf[List[Any]])
        })
        .flatMap(f => f)
        .filter(f => f != null)
      (listActions, listAttachments)
    } else {
      (null, null)
    }
  }).filter(f => f._1 != null)
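The same logic can be written without any cast to List by reading ListOfRficAction as Seq[Row], which is how Spark represents an array of structs, and turning each struct into a Seq of its field values with Row.toSeq. A sketch under these assumptions: ListOfRficAction is an array<struct<...>> column, an "attachment" is any struct field that is itself an array, and the object name RficActions is hypothetical. It uses only Row methods available in Spark 1.6 (fieldIndex, isNullAt, getSeq, getAs, toSeq):

```scala
import org.apache.spark.sql.Row

// Hypothetical rewrite of the question's map/filter, assuming ListOfRficAction is array<struct<...>>.
object RficActions {
  // Returns None where ListOfRficAction is null, replacing the (null, null) + filter in the question.
  def extract(row: Row): Option[(Seq[Seq[Any]], Seq[Seq[Any]])] = {
    val actionsIdx = row.fieldIndex("ListOfRficAction")
    if (row.isNullAt(actionsIdx)) None
    else {
      val assetId = String.valueOf(row.getAs[Any]("AssetId"))
      // Each struct element arrives as a Row: skip null structs, take the field values, append AssetId.
      val listActions: Seq[Seq[Any]] =
        row.getSeq[Row](actionsIdx)
          .filter(_ != null)
          .map(action => action.toSeq :+ assetId)
      // Keep the fields that are arrays (Seq), replacing the Try(asInstanceOf[List[Any]]) filter.
      val listAttachments: Seq[Seq[Any]] =
        listActions.flatMap(_.collect { case s: Seq[_] => s: Seq[Any] })
      Some((listActions, listAttachments))
    }
  }
}
```

In Spark 1.6 it could replace the map/filter pair, for example `val ficRDDResult = ficDataFrameSelect.flatMap(row => RficActions.extract(row).toList)`, which yields an RDD of (actions, attachments) tuples. Pattern matching on Seq avoids relying on an exception to detect the element type.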
Answer 2 (score: -3)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.renault.qualite</groupId>
<version>2.0.4</version>
<name>qualite_spark</name>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<maven.compiler.plugin>3.5.1</maven.compiler.plugin>
<maven.surefire.plugin>2.19.1</maven.surefire.plugin>
<maven.assembly.plugin>3.0.0</maven.assembly.plugin>
<scala.maven.plugin>3.2.1</scala.maven.plugin>
<elasticsearch.version>2.4.2</elasticsearch.version>
<elasticsearch.spark.version>2.4.2</elasticsearch.spark.version>
<spark.version>1.6.2.2.5.3.0-37</spark.version>
<hbase.version>1.1.2.2.5.3.0-37</hbase.version>
<shc.version>1.1.2-1.6-s_2.10</shc.version>
<biojava3.version>3.0</biojava3.version>
<jsqlparser.version>0.9.5</jsqlparser.version>
<encoding>UTF-8</encoding>
</properties>
<dependencies>
<!-- ElasticSearch shaded -->
<dependency>
<groupId>com.renault.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${elasticsearch.version}</version>
<exclusions>
<exclusion>
<artifactId>netty</artifactId>
<groupId>io.netty</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-xml_2.11</artifactId>
<version>1.0.2</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-yarn</artifactId>
<version>${elasticsearch.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark_2.10</artifactId>
<version>${elasticsearch.spark.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- add the License jar as a dependency -->
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-license-plugin</artifactId>
<version>1.0.0</version>
<scope>runtime</scope>
</dependency>
<!-- Spark -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_2.10</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Hbase -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<exclusion>
<artifactId>netty</artifactId>
<groupId>io.netty</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>eu.unicredit</groupId>
<artifactId>hbase-rdd_2.10</artifactId>
<version>0.8.0</version>
</dependency>
<dependency>
<groupId>org.json4s</groupId>
<artifactId>json4s-jackson_2.10</artifactId>
<version>3.2.10</version>
</dependency>
<!-- zhzhan -->
<dependency>
<groupId>com.hortonworks</groupId>
<artifactId>shc-core</artifactId>
<version>${shc.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<exclusion>
<artifactId>netty</artifactId>
<groupId>io.netty</groupId>
</exclusion>
</exclusions>
</dependency>
<!-- Biojava3 -->
<dependency>
<groupId>org.biojava</groupId>
<artifactId>biojava3-core</artifactId>
<version>${biojava3.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- java sql parser -->
<dependency>
<groupId>com.github.jsqlparser</groupId>
<artifactId>jsqlparser</artifactId>
<version>${jsqlparser.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- dependency removed: not available on Maven Central <dependency>
<groupId>javax.jms</groupId> <artifactId>jms</artifactId> <version>1.1</version>
</dependency> -->
<dependency>
<groupId>javax.jms</groupId>
<artifactId>javax.jms-api</artifactId>
<version>2.0.1</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- sicg -->
<dependency>
<groupId>sicg</groupId>
<artifactId>sicg</artifactId>
<version>3.4.1</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Commons-Mail -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-email</artifactId>
<version>1.3.2</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Spark-CSV -->
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.4.0</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Spark-AVRO -->
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-avro_2.10</artifactId>
<version>2.0.1</version>
</dependency>
<!-- HBase write -->
<dependency>
<groupId>it.nerdammer.bigdata</groupId>
<artifactId>spark-hbase-connector_2.10</artifactId>
<version>1.0.3</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Imported into the local repository with the command: mvn install:install-file
-Dfile=connector.jar -DgroupId=com.ibm -Dversion=1 -DartifactId=connector
-Dpackaging=jar -DlocalRepositoryPath=D:\Dev\Apps\workspace\sparkODI\repo -->
<!-- Oracle JDBC -->
<dependency>
<groupId>oracle.jdbc</groupId>
<artifactId>ojdbc</artifactId>
<version>6</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- MySQL JDBC -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
</dependency>
<!-- MS SQL JDBC -->
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>sqljdbc4</artifactId>
<version>4.0</version>
</dependency>
<!-- hive-jdbc
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.1.1</version>
</dependency-->
<!-- JMS for IBM MQ -->
<dependency>
<groupId>com.ibm</groupId>
<artifactId>ibm-mq</artifactId>
<version>1</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.ibm</groupId>
<artifactId>ibm-mq-pcf</artifactId>
<version>1</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.ibm</groupId>
<artifactId>ibm-mqbind</artifactId>
<version>1</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.ibm</groupId>
<artifactId>ibm-mqjms</artifactId>
<version>1</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.ibm</groupId>
<artifactId>connector</artifactId>
<version>1</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!--
<dependency>
<groupId>com.esotericsoftware.kryo</groupId>
<artifactId>kryo</artifactId>
<version>2.21</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
-->
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.10</version>
</dependency>
<!-- scala -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.6</version>
</dependency>
<!-- Spark Xml -->
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.10</artifactId>
<version>0.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<!-- Apache HttpClient -->
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.2</version>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>5.5.0</version>
</dependency>
</dependencies>
</dependencyManagement>
<!-- list of other repositories -->
<repositories>
<!-- repository added for the ODI stream project -->
<repository>
<id>project.local</id>
<name>project</name>
<url>file:${project.basedir}/repo</url>
</repository>
<repository>
<id>hortonworks-public</id>
<snapshots>
<enabled>false</enabled>
</snapshots>
<url>http://repo.hortonworks.com/content/groups/public/</url>
</repository>
<repository>
<id>hortonworks-nexus</id>
<snapshots>
<enabled>false</enabled>
</snapshots>
<url>http://nexus-private.hortonworks.com:8081/nexus/content/repositories/IN-QA/</url>
</repository>
<repository>
<id>hortonworks</id>
<snapshots>
<enabled>false</enabled>
</snapshots>
<url>http://repo.hortonworks.com/content/repositories/releases/</url>
</repository>
<repository>
<id>grails</id>
<snapshots>
<enabled>false</enabled>
</snapshots>
<url>http://repo.grails.org/grails/repo/</url>
</repository>
<repository>
<id>central</id>
<url>http://central.maven.org/maven2/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
<id>repo1</id>
<url>http://repo1.maven.org/maven2</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
<id>elasticsearch-releases</id>
<url>https://maven.elasticsearch.org/releases</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
<repository>
<id>SparkPackagesRepo</id>
<name>SparkPackagesRepo</name>
<url>http://dl.bintray.com/spark-packages/maven</url>
</repository>
<repository>
<id>jsqlparser-snapshots</id>
<snapshots>
<enabled>true</enabled>
</snapshots>
<url>https://oss.sonatype.org/content/groups/public/</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>java.net</id>
<name>java.net</name>
<url>http://download.java.net/maven/2</url>
</pluginRepository>
</pluginRepositories>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<!-- see http://davidb.github.com/scala-maven-plugin -->
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala.maven.plugin}</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
<args>
<arg>-dependencyfile</arg>
<arg>${project.build.directory}/.scala_dependencies</arg>
</args>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>${maven.surefire.plugin}</version>
<configuration>
<useFile>false</useFile>
<disableXmlReport>true</disableXmlReport>
<!-- If you have a classpath issue like NoClassDefFoundError,... -->
<!-- useManifestOnlyJar>false</useManifestOnlyJar -->
<includes>
<include>**/*Test.*</include>
<include>**/*Suite.*</include>
</includes>
</configuration>
</plugin>
<!-- "package" command plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<!-- add Main-Class to manifest file -->
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>Main</mainClass>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.lucene.codecs.Codec</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.lucene.codecs.DocValuesFormat</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/services/org.apache.lucene.codecs.PostingsFormat</resource>
</transformer>
</transformers>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>jar-with-dependencies</shadedClassifierName>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<!-- Additional configuration. -->
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven.compiler.plugin}</version>
<configuration>
<source>${maven.compiler.source}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
</plugins>
</build>
<artifactId>dlq_spark</artifactId>
</project>