I am trying to use the following code:
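import requests
from pyspark.sql import Row

# Define the function before it is passed to map()
def callAPI(row):
    # Query parameters built from one address row
    params = {
        'street_line_1': row.street_address,
        'city': row.city,
        'state_code': row.state,
        'postal_code': row.zip_code}
    response = requests.get('http://localhost:5000', params = params, verify = False).json()
    # The API returns a dictionary; turn it into a Row
    return Row(**response)

addresses = spark.sql('''SELECT
    street_address,
    city,
    state,
    zip_code
FROM table''')
results = addresses.rdd.map(callAPI).toDF()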
When I run it, I get this error:
raise ValueError("Some of types cannot be determined by the "
ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling
I also tried passing a schema with createDataFrame:
results = spark.createDataFrame(results, schema = schema)
But that gives me:
raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))
TypeError: IntegerType can not accept object '0000' in type <class 'str'>
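My goal is to iterate over the DataFrame, apply a function to each row, and get another DataFrame back. The API returns a dictionary. Where am I going wrong?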
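To make the second error concrete, here is a minimal sketch of one way the API response could be aligned with a declared schema before a DataFrame is built. It assumes the API returns some values as strings (such as '0000') while the schema expects another type. The field names zip4, confidence, and match_count and the helper coerce_response are hypothetical, since the real response keys are not shown above.

```python
# Hypothetical field spec: (name, converter). These names are assumptions
# for illustration only; the real API's keys are not shown in the question.
FIELDS = [
    ("zip4", str),         # kept as text so leading zeros like '0000' survive
    ("confidence", float),
    ("match_count", int),
]


def coerce_response(response, fields=FIELDS):
    """Return a tuple ordered like `fields`, converting each value.

    Keys that are missing from the response, or whose value is None,
    become None, so every tuple has the same length and order.
    """
    values = []
    for name, convert in fields:
        raw = response.get(name)
        values.append(None if raw is None else convert(raw))
    return tuple(values)


if __name__ == "__main__":
    sample = {"zip4": "0000", "confidence": "0.9", "match_count": "3"}
    print(coerce_response(sample))
```

With Spark, tuples like these could be passed to spark.createDataFrame together with a schema whose field order matches FIELDS and which declares the zero-padded field as StringType. That schema choice is an assumption about the intent, not something the error output confirms.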