Question

I am trying to use textract to read the contents of .txt, .docx, .pdf and similar files. When I use the code below, it throws an error:

  server {
    listen       443 ssl;
    ssl_certificate      /etc/nginx/server.crt;
    ssl_certificate_key  /etc/nginx/server.key;
    ssl_protocols       TLSv1 TLSv1.1 TLSv1.2;
    keepalive_timeout   70;
    server_tokens off;
    fastcgi_param   HTTPS               on;
    fastcgi_param   HTTP_SCHEME         https;
    server_name _;
    root /run/www;
    index  index.php index.pl index.cgi index.html;

    ...

    rewrite ^/p/(.*)$ /production/$1 break;

    ....


    location ~ .*(\.pl|\.cgi)?$
    {
        proxy_set_header  Host             $host;
        set_real_ip_from  180.76.160.246;
        set_real_ip_from  127.0.0.1;
        real_ip_header X-Forwarded-For;
        real_ip_recursive on;
        gzip on;
        include        fastcgi_params;
        fastcgi_pass 127.0.0.1:9001; 
        fastcgi_read_timeout   60;

        expires 1m;  

    }

...
}

When I upload a .docx file, I get:

  File "/usr/lib/python2.7/genericpath.py", line 26, in exists
    os.stat(path)
  TypeError: coercing to Unicode: need string or buffer, instance found
  10.0.2.2 - [12/Apr/2018 09:04:58] "POST /upload HTTP/1.1" 500 -

How can I pass files with these different extensions to textract from Flask?

Answer 1

I don't think textract can handle file streams.

Try passing the exact file path together with its extension, for example:

textdata = textract.process(r"C:\some_path_to_file", extension=".pdf")

That works for me, so give it a try.

Answer 2

I ran into the same problem. The file has to be saved on the server first and then opened from that path. That did the trick!
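To make the "save first, then read" idea concrete, here is a minimal sketch. It assumes the upload behaves like the objects Flask puts in request.files (werkzeug's FileStorage, which has a `.filename` and a `.save(path)` method). The helper name `extract_text_from_upload` and its `extract` parameter are made up for illustration; by default it falls back to `textract.process`.

```python
import os
import tempfile


def extract_text_from_upload(upload, extract=None):
    """Save an uploaded file to a temporary path, then run a path-based extractor.

    `upload` is assumed to look like werkzeug's FileStorage: it exposes
    `.filename` and `.save(path)`. `extract` is a hypothetical hook so the
    helper can be exercised without textract installed.
    """
    if extract is None:
        import textract  # imported lazily; only needed for the default extractor
        extract = textract.process

    # Keep the original extension so textract can choose the right parser.
    extension = os.path.splitext(upload.filename or "")[1].lower()
    fd, path = tempfile.mkstemp(suffix=extension)
    os.close(fd)  # only the path is needed; upload.save() opens the file itself
    try:
        upload.save(path)      # write the uploaded stream to disk
        return extract(path)   # textract receives a real file path, not a stream
    finally:
        os.remove(path)        # clean up the temporary copy
```

Inside a Flask view this would be called roughly as `extract_text_from_upload(request.files["file"])`, where the form field name "file" is an assumption. The point is that textract gets a path string instead of the upload object, which is what the TypeError in the question is complaining about.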

How to read an input file with textract inside Flask
