Which versions of the Avro and Parquet formats does Spark support?

Date: 2017-06-07 07:11:06

Tags: apache-spark avro parquet

Does Spark 2.0 support Avro and Parquet files? If so, which versions?

I have downloaded the jar below and ran into an error while loading it:

spark-avro_2.10-0.1.jar

2 Answers:

Answer 0 (score: 3)

You are simply using the wrong dependency. You should use the spark-avro dependency compiled against Scala 2.11. You can find it here.

As for Parquet, it is supported out of the box; there is no extra dependency to add to your application.

Answer 1 (score: 2)


Does Spark 2.0 support Avro and Parquet files?

Out of the box, Spark 2.x does not support the Avro format. You have to use an external package such as spark-avro.

Name: java.lang.IncompatibleClassChangeError
Message: org.apache.spark.sql.sources.TableScan

The cause of the java.lang.IncompatibleClassChangeError is that you used spark-avro_2.10-0.1.jar, which was compiled against Scala 2.10, while Spark 2.0 is built with Scala 2.11 by default. This version mismatch inevitably leads to the IncompatibleClassChangeError.

You should load the spark-avro package with the --packages command-line option (as described in the official spark-avro documentation under With spark-shell or spark-submit):

$ bin/spark-shell --packages com.databricks:spark-avro_2.11:3.2.0

Using --packages ensures that the library and its dependencies are added to the classpath. The --packages option can be used with bin/spark-submit as well.
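Once the package is on the classpath, Avro data can be read and written through the DataFrame reader/writer API. The following is a minimal sketch assuming spark-avro 3.2.0 is loaded as shown above; the file paths are hypothetical examples:

```scala
// Read an Avro file via the fully-qualified data source name.
// "/path/to/episodes.avro" is a placeholder path.
val df = spark.read
  .format("com.databricks.spark.avro")
  .load("/path/to/episodes.avro")

// spark-avro also provides a shorthand through an implicit import:
import com.databricks.spark.avro._
val df2 = spark.read.avro("/path/to/episodes.avro")
df2.write.avro("/path/to/output")
```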

Parquet is the default format when loading or saving datasets:

// loading Parquet datasets (paths below are placeholders)
val mydataset = spark.read.load("/path/to/data")

// saving in Parquet format
mydataset.write.save("/path/to/output")
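Because Parquet is the default data source (controlled by the spark.sql.sources.default setting), the shorthand calls above are equivalent to spelling the format out explicitly. A sketch, with hypothetical paths:

```scala
// These two reads are equivalent when the default source is parquet:
val a = spark.read.load("users.parquet")
val b = spark.read.format("parquet").load("users.parquet")

// Likewise for writes:
mydataset.write.save("/path/to/out")                   // writes Parquet
mydataset.write.format("parquet").save("/path/to/out") // same, explicit
```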

You may want to read about Parquet Files support in the official documentation:


Spark SQL provides support for both reading and writing Parquet files, automatically preserving the schema of the original data.

Parquet 1.8.2 is used (as you can see in Spark's pom.xml).