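I'm trying to build XML to submit to Amazon's Mechanical Turk service, using the HTMLQuestion data structure and boto3's create_hit function. According to the documentation, the XML should be formatted like this.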
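I created a class, TurkTaskAssembler, with methods that generate the XML and pass it to the Mechanical Turk platform through the API. I use the boto3 library to handle communication with Amazon.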
The XML I generate seems to be malformed: when I try to pass it through the API, I get the following validation error:
>>> tta = TurkTaskAssembler("What color is the sky?")
>>> response = tta.create_hit_task()
ParamValidationError: Parameter validation failed: Invalid type for parameter Question, value: <Element HTMLQuestion at 0x1135f68c0>, type: <type 'lxml.etree._Element'>, valid types: <type 'basestring'>
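This first error comes from boto3's client-side parameter validation: Question must be a string (the message says basestring, so this was Python 2), and an lxml Element object is not one. The sketch below uses Python 3 and the standard library's xml.etree.ElementTree instead of lxml (a substitution so it runs without third-party packages) to show the difference between an element object and its serialized text; with lxml, tostring(..., encoding='unicode') plays the same role. serialize_question is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def serialize_question(element):
    # tostring() with encoding="unicode" returns str rather than bytes,
    # which matches the string type boto3 expects for Question.
    return ET.tostring(element, encoding="unicode")


root = ET.Element("HTMLQuestion")
# The element object itself is not a string, so boto3 would reject it.
print(isinstance(root, str))
question = serialize_question(root)
# The serialized form is plain text that can be passed as Question=.
print(isinstance(question, str), question)
```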
I then changed the create_question_xml method to convert the XML envelope to a string with the tostring method, but that produces a different error:
>>> tta = TurkTaskAssembler("What color is the sky?")
>>> tta.create_hit_task()
ClientError: An error occurred (ParameterValidationError) when calling the CreateHIT operation: There was an error parsing the XML question or answer data in your request. Please make sure the data is well-formed and validates against the appropriate schema. Details: cvc-elt.1.a: Cannot find the declaration of element 'HTMLQuestion'. (1508611228659 s)
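cvc-elt.1.a is a schema-validation error: the validator could not match the root element to a declaration in the HTMLQuestion schema, which commonly happens when the element is not in the schema's target namespace. In create_question_xml the namespace only appears in nsmap, while the tag itself is the bare string "HTMLQuestion" (the XHTML prefix built for qualifying tags is never used), so one likely fix is to create every element with a namespace-qualified tag; in lxml that is Element(XHTML + "HTMLQuestion", nsmap=NSMAP). The sketch below shows the same idea with the standard library's ElementTree as a stand-in for lxml (Python 3). Assumptions: build_html_question is a hypothetical helper; ElementTree cannot emit CDATA, so it XML-escapes the HTML instead, which a parser reads back as the same text; and register_namespace changes module-global state.

```python
import xml.etree.ElementTree as ET

# Namespace URL copied from xml_schema_url in the question's code.
HTML_QUESTION_NS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd"
)


def build_html_question(html, frame_height=450):
    # Map the namespace to the empty prefix so it is written as a
    # default xmlns="..." declaration on the root element.
    ET.register_namespace("", HTML_QUESTION_NS)
    ns = "{%s}" % HTML_QUESTION_NS
    # Qualify every tag with the namespace, not just the root.
    root = ET.Element(ns + "HTMLQuestion")
    content = ET.SubElement(root, ns + "HTMLContent")
    # Escaped (&lt; ...) rather than wrapped in CDATA; same parsed text.
    content.text = html
    height = ET.SubElement(root, ns + "FrameHeight")
    height.text = str(frame_height)
    # encoding="unicode" returns str, which is what boto3 expects.
    return ET.tostring(root, encoding="unicode")


question_xml = build_html_question("<p>What color is the sky?</p>")
print(question_xml)
```

Both answers' templates also include a FrameHeight element, which the question's envelope lacks, so the sketch adds one.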
I'm really not sure what I'm doing wrong, and I have very little experience with XML.
Here is all the relevant code:
import os
import boto3
from lxml.etree import Element, SubElement, CDATA, tostring
from .settings import mturk_access_key_id, mturk_access_secret_key

xml_schema_url = 'http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd'


class TurkTaskAssembler(object):
    def __init__(self, question):
        # Client for the MTurk requester sandbox endpoint
        self.client = boto3.client(
            service_name='mturk',
            region_name='us-east-1',
            endpoint_url='https://mturk-requester-sandbox.us-east-1.amazonaws.com',
            aws_access_key_id=mturk_access_key_id,
            aws_secret_access_key=mturk_access_secret_key
        )
        self.question = question

    def create_question_xml(self):
        # questionFile = open(os.path.join(__location__, "question.xml"), "r")
        # question = questionFile.read()
        # return question
        XHTML_NAMESPACE = xml_schema_url
        XHTML = "{%s}" % XHTML_NAMESPACE
        NSMAP = {
            None: XHTML_NAMESPACE,
            'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
        }
        # Root element of the question envelope
        envelope = Element("HTMLQuestion", nsmap=NSMAP)
        html = """
<!DOCTYPE html>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
<script type='text/javascript' src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js'></script>
</head>
<body>
<form name='mturk_form' method='post' id='mturk_form' action='https://www.mturk.com/mturk/externalSubmit'>
<input type='hidden' value='' name='assignmentId' id='assignmentId'/>
<h1>Answer this question</h1>
<p>{question}</p>
<p><textarea name='comment' cols='80' rows='3'></textarea></p>
<p><input type='submit' id='submitButton' value='Submit' /></p></form>
<script language='Javascript'>turkSetAssignmentID();</script>
</body>
</html>
""".format(question=self.question)
        # Embed the HTML page as a CDATA section inside <HTMLContent>
        html_content = SubElement(envelope, 'HTMLContent')
        html_content.text = CDATA(html)
        xml_meta = """<?xml version="1.1" encoding="utf-8"?>"""
        return xml_meta + tostring(envelope, encoding='utf-8')

    def create_hit_task(self):
        response = self.client.create_hit(
            MaxAssignments=1,
            AutoApprovalDelayInSeconds=10800,
            LifetimeInSeconds=10800,
            AssignmentDurationInSeconds=300,
            Reward='0.05',
            Title='a title',
            Keywords='some keywords',
            Description='a description',
            Question=self.create_question_xml(),
        )
        return response
Answer 0 (score: 2)
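Why not simply put the XML data in a separate XML file (as you did in the code you commented out)? That saves you from combining several modules and a lot of code.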
Using the template you described here, create question.xml:
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
<HTMLContent><![CDATA[
<!DOCTYPE html>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
<script type='text/javascript' src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js'></script>
</head>
<body>
<form name='mturk_form' method='post' id='mturk_form' action='https://www.mturk.com/mturk/externalSubmit'>
<input type='hidden' value='' name='assignmentId' id='assignmentId'/>
<h1>Answer this question</h1>
<p>{question}</p>
<p><textarea name='comment' cols='80' rows='3'></textarea></p>
<p><input type='submit' id='submitButton' value='Submit' /></p></form>
<script language='Javascript'>turkSetAssignmentID();</script>
</body>
</html>
]]>
</HTMLContent>
<FrameHeight>450</FrameHeight>
</HTMLQuestion>
Then, in the create_question_xml() method:
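def create_question_xml(self):
    # Read the template; the with block closes the file afterwards
    with open("question.xml", "r") as question_file:
        template = question_file.read()
    # Fill the {question} placeholder in the template
    xml = template.format(question=self.question)
    return xml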
That should be all you need.
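One caveat with str.format on a template file: the question text goes into the HTML unescaped, so characters such as <, & or the sequence ]]> would alter the markup or end the CDATA section early, and any literal { or } added to the template later (say, in inline JavaScript) would make format() fail. A sketch in Python 3 using html.escape and a plain str.replace instead of format (both a substitution, not part of this answer); TEMPLATE is a shortened inline stand-in for question.xml, and load_template and fill_question are hypothetical helpers.

```python
import html

# Shortened stand-in for the contents of question.xml.
TEMPLATE = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
<HTMLContent><![CDATA[
<!DOCTYPE html>
<html><body><p>{question}</p></body></html>
]]>
</HTMLContent>
<FrameHeight>450</FrameHeight>
</HTMLQuestion>"""


def load_template(path):
    # Read the template file; the with block closes it afterwards.
    with open(path, encoding="utf-8") as f:
        return f.read()


def fill_question(template, question):
    # Escape &, <, > and quotes so the question text cannot alter the
    # HTML markup or close the CDATA section early ("]]>" -> "]]&gt;").
    safe = html.escape(question)
    # replace() only touches the {question} placeholder, so any other
    # braces in the template are left intact.
    return template.replace("{question}", safe)


print(fill_question(TEMPLATE, "Is 2 < 3 & 3 > 2?"))
```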
Answer 1 (score: 0)
I think you're a little confused about the three question formats Amazon offers. From what I can see, you went with HTMLQuestion. (The other two are ExternalQuestion and QuestionForm.)
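To keep the question in HTMLQuestion format, just use the simple example provided in the documentation as a plain string; there is no need to assemble the XML wrapper element by element with lxml. Here is a fixed function: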
def create_question_html(self):
    # You can extract the template into a file, as @Mangohero1
    # suggested, which would simplify the code a bit.
    # Pass the return value as Question= in create_hit_task.
    return """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
<HTMLContent><![CDATA[
<!DOCTYPE html>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
<script type='text/javascript' src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js'></script>
</head>
<body>
<form name='mturk_form' method='post' id='mturk_form' action='https://www.mturk.com/mturk/externalSubmit'>
<input type='hidden' value='' name='assignmentId' id='assignmentId'/>
<h1>{question}</h1>
<p><textarea name='comment' cols='80' rows='3'></textarea></p>
<p><input type='submit' id='submitButton' value='Submit' /></p></form>
<script language='Javascript'>turkSetAssignmentID();</script>
</body>
</html>
]]>
</HTMLContent>
<FrameHeight>450</FrameHeight>
</HTMLQuestion>
""".format(question=self.question)