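I'm currently running a Flask-RESTful application and an Apache Tika server in separate Docker containers. The Flask server listens on port 5000 both in its container and on the host, and the Tika server listens on port 9998.

I want to take files that clients upload to the Flask server and pass them on to the Tika server so I can extract the documents' text. However, I can't get anything to work: every way I've tried of reading in the file has failed. Can anyone see what I'm doing wrong?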
The following works in Python when talking to the Tika server directly:

```python
requests.request('PUT', 'http://localhost:9998/rmeta/text', data=open('test_doc.docx', 'rb'), headers={}).text
```
However, routing the same file through the Flask server like this fails:

```python
requests.request('post', 'http://localhost:5000/index', files={'file': open('test_doc.docx', 'rb')}, headers={}).text
```
The Flask view:

```python
from tempfile import TemporaryFile

import werkzeug.datastructures
from flask.views import MethodView
from flask_restful import reqparse
from werkzeug.utils import secure_filename

class Index(MethodView):
    def post(self):
        # Load in file
        parse = reqparse.RequestParser()
        parse.add_argument('file', type=werkzeug.datastructures.FileStorage, location='files')
        args = parse.parse_args()
        uploadedFile = args['file']
        filename = secure_filename(uploadedFile.filename)
        # Create temporary file
        tmpfile = TemporaryFile()
        tmpfile.write(uploadedFile.stream.read())
        # Extract text
        data = tika.extract_text(tmpfile)
        tmpfile.close()
        return data
```
The Tika wrapper class:

```python
import json
import requests

class Tika:
    def __init__(self, endpoint):
        self.endpoint = endpoint

    def extract_text(self, filedata):
        response = requests.request('put', self.endpoint, data=filedata, headers={}).json()
        try:
            return response[0]["X-TIKA:content"]
        except:
            return "ERROR"
```
The resulting traceback:

```
Traceback (most recent call last):
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask_restful/__init__.py", line 273, in error_router
    return original_handler(e)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/_compat.py", line 32, in reraise
    raise value.with_traceback(tb)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask_restful/__init__.py", line 273, in error_router
    return original_handler(e)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/_compat.py", line 32, in reraise
    raise value.with_traceback(tb)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask_restful/__init__.py", line 480, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/views.py", line 84, in view
    return self.dispatch_request(*args, **kwargs)
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/flask/views.py", line 149, in dispatch_request
    return meth(*args, **kwargs)
  File "/app/__init__.py", line 109, in post
    data = get_clean_text(tika.extract_text(tmpfile))
  File "/app/tika/__init__.py", line 16, in extract_text
    response = requests.request('put', self.endpoint, data=filedata, headers={}).json()
  File "/opt/conda/envs/SDL/lib/python3.5/site-packages/requests/models.py", line 885, in json
    return complexjson.loads(self.text, **kwargs)
  File "/opt/conda/envs/SDL/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/envs/SDL/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/envs/SDL/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
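I must be doing something wrong in how I pass the file to the server and decode it, but for the life of me I can't figure out what. Any and all help would be greatly appreciated.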
Answer:
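The problem is that no data is being passed to `tika.extract_text()`. `tmpfile.write(uploadedFile.stream.read())` writes the uploaded data to the temporary file and leaves the file pointer at the end of the file. That file handle is then passed to `tika.extract_text(tmpfile)`, and because the pointer is at the end of the file, any read returns an empty byte string, so nothing at all is sent to your Tika server. Tika's reply to that empty upload is evidently not JSON, hence the `JSONDecodeError: Expecting value: line 1 column 1 (char 0)` raised by `.json()`.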
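The effect can be reproduced with nothing but the standard library. This sketch (the sample bytes are made up) writes to a `TemporaryFile` and reads it back without rewinding, then feeds empty text to `json.loads`, which fails with the same message as the last line of the traceback. It assumes Tika's reply to an empty upload is empty; any non-JSON reply starting with something other than a JSON value would give the same message.

```python
import json
from tempfile import TemporaryFile

def read_after_write(payload):
    """Write payload to a temp file and read it back WITHOUT rewinding."""
    with TemporaryFile() as tmpfile:
        tmpfile.write(payload)
        # The file pointer now sits at the end of the file,
        # so this read returns b"" instead of the payload.
        return tmpfile.read()

body = read_after_write(b"fake .docx bytes")
print(repr(body))  # b''

try:
    # Stand-in (an assumption) for an empty, non-JSON reply reaching response.json()
    json.loads(body.decode())
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```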
You can easily fix this by seeking back to the start of the temporary file before handing it to `tika.extract_text()`:
```python
# Create temporary file
tmpfile = TemporaryFile()
tmpfile.write(uploadedFile.stream.read())
tmpfile.seek(0)  # reposition file pointer to the start of the file
data = tika.extract_text(tmpfile)
```
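If you do want to keep the temporary file (for instance to avoid holding a large upload in memory), one option is to stream the upload into it with `shutil.copyfileobj` rather than calling `.read()` on the whole thing, and to rewind before returning it. The helper name `spool_upload` is hypothetical; in the view you would call it as `spool_upload(uploadedFile.stream)` and pass the result to `tika.extract_text()`.

```python
import io
import shutil
from tempfile import TemporaryFile

def spool_upload(stream, chunk_size=64 * 1024):
    """Copy a readable binary stream into a temporary file rewound to the start.

    Hypothetical helper: the caller is responsible for closing the returned file.
    """
    tmpfile = TemporaryFile()
    # Copy in chunks so the whole upload is never held in memory at once.
    shutil.copyfileobj(stream, tmpfile, chunk_size)
    # Rewind: without this, the next reader (requests) would see an empty file.
    tmpfile.seek(0)
    return tmpfile

# Usage with an in-memory stand-in for uploadedFile.stream:
with spool_upload(io.BytesIO(b"fake .docx bytes")) as tmp:
    print(tmp.read())  # b'fake .docx bytes'
```

Werkzeug's `FileStorage.save()` also accepts an open file object as its destination, so `uploadedFile.save(tmpfile)` followed by `tmpfile.seek(0)` is another way to do the same thing.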
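From the code you posted, it isn't clear to me why you need a temporary file at all. You can simply pass the uploaded data straight to the Tika server: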
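```python
from flask import Flask, request
from flask.views import MethodView
from flask.json import jsonify

# The question's Tika wrapper (app/tika/__init__.py); adjust the import to your layout.
from tika import Tika

app = Flask(__name__)
tika = Tika('http://localhost:9998/rmeta/text')  # endpoint used in the question

class Index(MethodView):
    def post(self):
        uploaded_file = request.files.get('file')
        if uploaded_file:
            # FileStorage proxies read() to its stream, so it can be sent as the request body
            data = tika.extract_text(uploaded_file)
        else:
            data = {'error': 'Missing upload file'}
        return jsonify(data)

app.add_url_rule('/', view_func=Index.as_view('/'))
app.run()
```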
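Separately, in the question's `Tika.extract_text`, `.json()` is called outside the `try` block, which is why the failure surfaced as the `JSONDecodeError` traceback rather than the `"ERROR"` fallback. Below is a sketch of a parsing helper that reports the empty or non-JSON case explicitly. It assumes, as the question's code does, that the `/rmeta/text` reply is a JSON list whose first element holds `X-TIKA:content`; the name `parse_rmeta_text` is hypothetical, and you would call it with `response.text`.

```python
import json

def parse_rmeta_text(body):
    """Return X-TIKA:content of the first document in a /rmeta/text reply.

    Hypothetical helper: raises ValueError with a readable message instead of a
    bare JSONDecodeError when the reply is empty, not JSON, or oddly shaped.
    """
    if not body.strip():
        raise ValueError("Tika returned an empty body; was the upload empty?")
    try:
        metadata = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("Tika reply is not JSON: {!r}".format(body[:200])) from exc
    try:
        return metadata[0]["X-TIKA:content"]
    except (IndexError, KeyError, TypeError) as exc:
        raise ValueError("Unexpected /rmeta/text structure") from exc

print(parse_rmeta_text('[{"X-TIKA:content": "Hello"}]'))  # Hello
```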