pyspark中的parseException

时间:2017-11-15 19:29:34

标签: json pyspark apache-spark-sql spark-dataframe

我有一个编写的pyspark代码,它读取三个JSON文件并将JSON文件转换为DataFrames,DataFrames转换为执行SQL查询的表。

    import pyspark.sql
    from pyspark.sql import SparkSession
    from pyspark.sql import SQLContext
    from pyspark.sql import *
    from pyspark.sql import Row
    import json
    from pyspark.sql.types import StructType, StructField, StringType
    from pyspark.sql.types import *

    spark = SparkSession \
    .builder \
    .appName("project") \
    .getOrCreate()
    sc = spark.sparkContext
    sqlContext=SQLContext(sc)

reviewFile= sqlContext.read.json("review.json")
usersFile=sqlContext.read.json("user.json")
businessFile=sqlContext.read.json("business.json")
reviewFile.createOrReplaceTempView("review")
usersFile.createOrReplaceTempView("user")
businessFile.createOrReplaceTempView("business")
review_user = spark.sql("select r.review_id,r.user_id,r.business_id,r.stars,r.date,u.name,u.review_count,u.yelping_since from (review r join user u on r.user_id = u.user_id)")
review_user.createOrReplaceTempView("review_user")
review_user_business= spark.sql("select r.review_id,r.user_id,r.business_id,r.stars,r.date,r.name,r.review_count,r.yelping_since,b.address,b.categories,b.city,b.latitude,b.longitude,b.name,b.neighborhood,b.postal_code,b.review_count,b.stars,b.state from review_user r join business b on r.business_id= b.business_id")
review_user_business.createOrReplaceTempView("review_user_business")
#categories= spark.sql("select distinct(categories) from review_user_business")
categories= spark.sql("select distinct(r.categories) from review_user_business r where 'Food' in r.categories")

print categories.show(50)

你们可以在下面的链接中找到数据的描述。     https://www.yelp.com/dataset/documentation/json

我想要做的就是将含有食物的行作为其类别的一部分。 有人可以帮我吗?

1 个答案:

答案 0 :(得分:0)

在pyspark A in B中使用表达式A时,应该是列对象而不是常量值。

您要找的是array_contains

categories= spark.sql("select distinct(r.categories) from review_user_business r \
                      where array_contains(r.categories, 'Food')")