I can save data from Spark to MySQL, but not to both MongoDB and MySQL at the same time. Can anyone tell me how to do this? Below I've included my code to flatten the JSON, the function that saves to MongoDB, and my spark-submit commands.
I'm trying to save the raw Twitter data to MongoDB. Can someone guide me?
Code that flattens the JSON for MongoDB:
```python
import json

def convertMongo(rdd):
    try:
        spark = getSparkSessionInstance(rdd.context.getConf())
        # Flatten each JSON tweet and build a DataFrame from the RDD
        df_json = spark.createDataFrame(
            rdd.map(lambda x: _flatten_JSON(json.loads(x[1]))))
        return df_json
    except Exception as e:
        print(str(e))
```
Code that saves to MongoDB:
```python
def write_mongo(rdd):
    try:
        mongoDFRDD = convertMongo(rdd)
        if mongoDFRDD is not None:
            mongoDFRDD.write.format('com.mongodb.spark.sql.DefaultSource').mode('append').option("hos$
    except Exception as e:
        print(str(e))
```
Working spark-submit command:
```shell
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.1,mysql:mysql-connector-java:5.1.45 \
  --jars spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar \
  MachineLearningOnStreamDataInSpark2.py
```
spark-submit command including the MongoDB --conf options, which does not work:
```shell
spark-submit \
  --conf spark.logConf = true \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1:27017/Twitter.TwitterData?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1:27017/Twitter.TwitterData" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0 \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.1 \
  --jars spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar \
  MachineLearningOnStreamDataInSpark2.py
```
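One possible culprit (an assumption, not verified against this setup): when `--packages` is passed twice, spark-submit keeps only the last value, so the Kafka package list replaces the Mongo connector. A single comma-separated `--packages` list, with the stray spaces around `spark.logConf=true` also removed, might look like:

```shell
# Sketch: one combined --packages list instead of two separate flags
spark-submit \
  --conf spark.logConf=true \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1:27017/Twitter.TwitterData?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1:27017/Twitter.TwitterData" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0,org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.1 \
  --jars spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar \
  MachineLearningOnStreamDataInSpark2.py
```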