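I created a bag-of-words model using NLTK on a Spark DataFrame of consumer reviews. My final dataset has 3 columns: Sentiment, text, and bagofwords. The schema looks like this: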
StructType(List(StructField(Sentiment,StringType,true),StructField(text,StringType,true),StructField(bagofwords,ArrayType(StringType,true),true)))
Each record in the bagofwords column is a list of words with punctuation and stop words removed. I think this is what is causing the problem.
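For context, here is a minimal sketch of how one such record might be produced. It is not the actual pipeline: the `STOP_WORDS` set and the `to_bag_of_words` helper are hypothetical stand-ins for NLTK's tokenizer and stop-word list, and the result is sorted only to make the output deterministic. The point is that each record is a real list of strings, matching ArrayType(StringType) in the schema.

```python
import string

# Hypothetical, tiny stop-word set; the real pipeline used NLTK's list.
STOP_WORDS = {"i", "are", "very", "the", "a", "is"}

def to_bag_of_words(text):
    """Lowercase, strip punctuation, drop stop words, return unique words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    # A bag of unique words; sorted so the result is reproducible.
    return sorted(set(words))

bag = to_bag_of_words("I hate this place, they are very incompetent")
print(bag)        # ['hate', 'incompetent', 'place', 'they', 'this']
print(type(bag))  # <class 'list'>
```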
I want to score my deployed Spark ML model by passing a JSON payload like this:
scoring_payload = {"fields": ["text", "bagofwords"], "values": ["I hate this place, they are very incompetent", "['this', 'place', 'hate', 'they', 'incompetent']"]}
But I keep getting an error message like this:
Status code: 400, body: {
"trace": "ff8e614b33c635684e648e2c6705d9eb",
"errors": [{
"code": "invalid_payload",
"message": "Input Json parsing failed with error: java.lang.ClassCastException"
}]
}
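I am not familiar with Java or Scala yet, but from what I can infer so far, I think the problem is related to converting from an array/list to a string, or vice versa.
I tried adjusting the payload by dumping it to JSON, but that raises an error as well. I also followed the steps shown in the link below: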
final_dataset1 = spark.read.parquet('final_sparkml_dataset_pq')
final_dataset1.show()
+---------+--------------------+--------------------+
|Sentiment|                text|          bagofwords|
+---------+--------------------+--------------------+
| negative|You need to doubl...|[something, cold,...|
| negative|Now first off I a...|[out, actually, f...|
| negative|I should have bee...|[we, was, gel, my...|
| negative|We stayed at the ...|[out, ball, cater...|
| negative|I figured I would...|[, respond, compa...|
| negative|Asked for blonde,...|[absolutely, awfu...|
| negative|There are places ...|[grumble, envisio...|
| negative|This place is ter...|[was, for, plotti...|
| negative|I had went here a...|[popped, circumst...|
from watson_machine_learning_client import WatsonMachineLearningAPIClient
wml_credentials = {
"apikey": "***",
"iam_apikey_description": "Auto-generated for key ***",
"iam_apikey_name": "wdp-writer",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
"iam_serviceid_crn": "***",
"instance_id": "***",
"password": "***",
"url": "https://us-south.ml.cloud.ibm.com",
"username": "**"
}
client = WatsonMachineLearningAPIClient(wml_credentials)
created_deployment = client.deployments.create(published_model_uid, name="Sentiment Predictor SparkML")
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
scoring_payload = {"fields": ["text", "bagofwords"], "values": ["I hate this place, they are very incompetent", "['this', 'place', 'hate', 'they', 'incompetent']"]}
deploy_model_pred = client.deployments.score(scoring_endpoint, scoring_payload)
I keep getting a cast error:
Status code: 400, body: {
"trace": "ff8e614b33c635684e648e2c6705d9eb",
"errors": [{
"code": "invalid_payload",
"message": "Input Json parsing failed with error: java.lang.ClassCastException"
}]
}
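I expect output such as "rawprediction", "probability", "predictionlabel", and the other results, similar to what you normally get when running the transform method on test data.
Any ideas about what I am doing wrong? How can I make the payload valid in this case?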
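One thing that may be worth trying, sketched below: send bagofwords as an actual JSON array rather than a string that merely looks like a list, and wrap each record in its own list so that "values" is a list of rows. This assumes the scoring endpoint expects the fields/values row layout and that the parser casts bagofwords to the ArrayType(StringType) shown in the schema. The `build_payload` helper is hypothetical, and whether this resolves the ClassCastException has not been verified against the service.

```python
import json

def build_payload(text, bag_of_words):
    """Build a one-row scoring payload; bag_of_words stays a real list."""
    return {
        "fields": ["text", "bagofwords"],
        # One inner list per record, in the same order as "fields".
        "values": [[text, list(bag_of_words)]],
    }

# Payload from the question: the second value is a string, not a list.
original = {
    "fields": ["text", "bagofwords"],
    "values": ["I hate this place, they are very incompetent",
               "['this', 'place', 'hate', 'they', 'incompetent']"],
}

candidate = build_payload(
    "I hate this place, they are very incompetent",
    ["this", "place", "hate", "they", "incompetent"],
)

print(type(original["values"][1]).__name__)      # str
print(type(candidate["values"][0][1]).__name__)  # list

# The candidate serializes to JSON with bagofwords as an array.
encoded = json.dumps(candidate)
print(encoded)
```

The candidate could then be passed in place of scoring_payload, i.e. client.deployments.score(scoring_endpoint, candidate).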