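I am setting up a Jupyter Notebook project in IBM Watson Studio for a machine learning project, and I keep getting a "TypeError: ... is not JSON serializable" message when I try to add data from a PostgreSQL database table.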
Full error output:
TypeError Traceback (most recent call last)
<ipython-input-16-e72fac39b809> in <module>()
1 classes = natural_language_classifier.classify('998520s521-nlc-1398', data_df_1.to_json())
----> 2 print(json.dumps(classes, indent=2))
/opt/conda/envs/DSX-Python35/lib/python3.5/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
235 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
236 separators=separators, default=default, sort_keys=sort_keys,
--> 237 **kw).encode(obj)
238
239
/opt/conda/envs/DSX-Python35/lib/python3.5/json/encoder.py in encode(self, o)
198 chunks = self.iterencode(o, _one_shot=True)
199 if not isinstance(chunks, (list, tuple)):
--> 200 chunks = list(chunks)
201 return ''.join(chunks)
202
/opt/conda/envs/DSX-Python35/lib/python3.5/json/encoder.py in _iterencode(o, _current_indent_level)
434 raise ValueError("Circular reference detected")
435 markers[markerid] = o
--> 436 o = _default(o)
437 yield from _iterencode(o, _current_indent_level)
438 if markers is not None:
/opt/conda/envs/DSX-Python35/lib/python3.5/json/encoder.py in default(self, o)
177
178 """
--> 179 raise TypeError(repr(o) + " is not JSON serializable")
180
181 def encode(self, o):
TypeError: <watson_developer_cloud.watson_service.DetailedResponse object at 0x7f64ee350240> is not JSON serializable
This is the Python code in my notebook, which deploys the AI model to analyze this data:
import json
import pandas as pd
import psycopg2
from watson_developer_cloud import NaturalLanguageClassifierV1
# Connect to my database.
conn_string = 'host={} port={} dbname={} user={} password={}'.format('159.***.20.***', 5432, 'searchdb', 'lcq09', 'Mys3cr3tPass')
conn_cbedce9523454e8e9fd3fb55d4c1a52e = psycopg2.connect(conn_string)
data_df_1 = pd.read_sql('SELECT description from public."search_product"', con=conn_cbedce9523454e8e9fd3fb55d4c1a52e)
# Connect to the ML model.
natural_language_classifier = NaturalLanguageClassifierV1(
    iam_apikey='TB97dFv8Dgug6rfi945F3***************'
)
# Apply the ML model to the database data.
classes = natural_language_classifier.classify('9841d0z5a1-ncc-9076', data_df_1.to_json())
print(json.dumps(classes, indent=2))
I ran the following command to make sure the data is JSON and well-formed, as shown below:
print(data_df_1.to_json())
PS: the data below are random Lorem sentences, but after testing they will be product descriptions.
{"description":{"0":"Lorem ipsum sjvh hcx bftiyf, hufcil, igfgvjuoigv gvj ifcil ,ghn fgbcggtc yfctgg h vgchbvju.","1":"Lorem ajjgvc wiufcfboitf iujcvbnb hjnkjc ivjhn oikgjvn uhnhgv 09iuvhb oiuvh boiuhb mkjhv mkiuhygv m,khbgv mkjhgv mkjhgv.","2":"Lorem aiv ibveikb jvk igvcib ok blnb v hb b hb bnjb bhb bhn bn vf vbgfc vbgv nbhgv bb nb nbh nj mjhbv mkjhbv nmjhgbv nmkn","3":"Lorem jsvc smc cbd ciecdbbc d vd bcvdvbj obcvb vcibs j dvx","4":"Lorem jsvc smc cbd ciecdbbc d vd bcvdvbj obcvb vcibs j dvx","5":"Lorem jsvc smc cbd ciecdbbc d vd bcvdvbj obcvb vcibs j dvx"}}
I can also classify a single sentence using the code below, but I want to classify the whole description table of my database:
classes = natural_language_classifier.classify('998260x551-nlc-1018', 'How hot will it be today?')
print(json.dumps(classes.result, indent=2))
That is why I replaced the sentence with the data frame named data_df_1.
But, as mentioned above, I get a TypeError.
So how can I solve this error?
Answer 0 (score: 0)
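Your problem is that inside the data frame there is a watson_developer_cloud.watson_service.DetailedResponse object that Python's JSON serializer module does not know how to handle.
Looking at the API, it seems you can either call the detailed_response._to_dict instance method (annoying, since it is a private method) or call the detailed_response.get_result method to get a dictionary and pull the data out of the object.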
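A minimal sketch of that unwrapping step, under stated assumptions. In the traceback, the object handed to json.dumps is the value returned by classify, so the sketch serializes the payload instead of the wrapper. StubDetailedResponse and to_pretty_json are hypothetical names invented for this illustration; the stub only mimics the `result` attribute that the question's single-sentence example reads, plus a `get_result()` accessor, and it is not the SDK class.

```python
import json


class StubDetailedResponse:
    """Hypothetical stand-in for the SDK's response wrapper (not the real class)."""

    def __init__(self, result, headers=None, status_code=200):
        self.result = result
        self.headers = headers or {}
        self.status_code = status_code

    def get_result(self):
        # Return the plain payload (a dict) carried by the wrapper.
        return self.result


def to_pretty_json(response):
    """Serialize the payload of a response wrapper, not the wrapper itself."""
    return json.dumps(response.get_result(), indent=2)


# Made-up payload, for illustration only.
classes = StubDetailedResponse({"top_class": "temperature", "classes": []})

error_message = None
try:
    # Dumping the wrapper fails in the same way as in the traceback.
    json.dumps(classes, indent=2)
except TypeError as exc:
    error_message = str(exc)

pretty = to_pretty_json(classes)
print(pretty)
```

With the real service object, the equivalent change would presumably be to pass `classes.result` (as in the single-sentence example) or `classes.get_result()` to json.dumps.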
Ideally, you preprocess each row of the data frame that contains this object with one of the two methods above so that the object is serialized; then .to_json should not throw a TypeError for that column.
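One possible reading of the row-by-row preprocessing is sketched below, as an illustration rather than the answerer's exact method. It pulls the individual descriptions out of the column-oriented JSON that `data_df_1.to_json()` prints, classifies them one at a time, and keeps only plain dictionaries so that the collected results serialize cleanly. descriptions_from_json, classify_rows and fake_classify are hypothetical helpers; fake_classify stands in for a real call that has already been unwrapped to a dict, and the code assumes the default integer row labels shown in the question.

```python
import json


def descriptions_from_json(frame_json, column="description"):
    """Extract one column's values, in row order, from DataFrame.to_json() output."""
    column_map = json.loads(frame_json)[column]
    # Keys are row labels stored as strings ("0", "1", ...); sort them numerically.
    return [column_map[key] for key in sorted(column_map, key=int)]


def classify_rows(texts, classify_one):
    """Call classify_one on each text and collect the plain-dict results."""
    return [classify_one(text) for text in texts]


def fake_classify(text):
    # Hypothetical stand-in for the service call; returns a made-up payload.
    return {"text": text, "top_class": "demo", "classes": []}


frame_json = '{"description":{"0":"Lorem ipsum","1":"Lorem ajjgvc","2":"Lorem aiv"}}'
texts = descriptions_from_json(frame_json)
results = classify_rows(texts, fake_classify)
print(json.dumps(results, indent=2))
```

In the notebook, classify_one would presumably wrap the real call, for example a small function that returns `natural_language_classifier.classify(classifier_id, text).result`, so that each description is sent as its own text instead of the whole table being sent as one JSON string.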