Error when using the DataFrame show method in PySpark

Date: 2019-07-11 07:11:43

Tags: python-3.x apache-spark pyspark google-bigquery

I am trying to read data from BigQuery using pandas and pyspark. I am able to fetch the data, but I somehow get the error below when converting it into a Spark DataFrame.

py4j.protocol.Py4JJavaError: An error occurred while calling o28.showString.
: java.lang.IllegalStateException: Could not find TLS ALPN provider; no working netty-tcnative, Conscrypt, or Jetty NPN/ALPN available
    at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts.defaultSslProvider(GrpcSslContexts.java:258)
    at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts.configure(GrpcSslContexts.java:171)
    at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts.forClient(GrpcSslContexts.java:120)
    at com.google.cloud.spark.bigquery.repackaged.io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder.buildTransportFactory(NettyChannelBuilder.java:401)
    at com.google.cloud.spark.bigquery.repackaged.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:444)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:223)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:169)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:156)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.ClientContext.create(ClientContext.java:157) 

Here are the environment details:

Python version : 3.7
Spark version : 2.4.3
Java version : 1.8

Here is the code:

import google.auth
import pyspark
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession, SQLContext
from google.cloud import bigquery


# Currently this only supports queries which have at least 10 MB of results
QUERY = """ SELECT * FROM test limit 1 """

#spark = SparkSession.builder.appName('Query Results').getOrCreate()
sc = pyspark.SparkContext()
bq = bigquery.Client()

print('Querying BigQuery')
project_id = ''
query_job = bq.query(QUERY,project=project_id)

# Wait for query execution
query_job.result()

df = SQLContext(sc).read.format('bigquery') \
    .option('dataset', query_job.destination.dataset_id) \
    .option('table', query_job.destination.table_id)\
    .option("type", "direct")\
    .load()

df.show()

I am looking for help to resolve this issue.

1 Answer:

Answer 0 (score: 0)

I managed to find a better solution by referring to this link; below is my working code:

Before running the code below, install the pandas_gbq package into your Python environment.
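For reference, the package is published on PyPI as pandas-gbq:

pip install pandas-gbq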

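A minimal sketch of that approach, assuming default Google application credentials (the project ID below is a placeholder; substitute your own):

import pandas_gbq
from pyspark.sql import SparkSession

QUERY = """ SELECT * FROM test limit 1 """
project_id = 'your-project-id'  # placeholder: substitute your GCP project ID

# Read the query result into a pandas DataFrame via pandas_gbq
pandas_df = pandas_gbq.read_gbq(QUERY, project_id=project_id)

# Convert to a Spark DataFrame; this bypasses the spark-bigquery
# connector and its netty-tcnative / TLS ALPN dependency entirely
spark = SparkSession.builder.appName('Query Results').getOrCreate()
df = spark.createDataFrame(pandas_df)
df.show()

Note that the whole result set is pulled through pandas on the driver first, so this only makes sense for query results that fit in driver memory.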

Hope this helps someone! Keep pysparking :)