Question

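I have Django running on a standard WSGI / Apache httpd setup.

I've noticed that the file output differs when I run the code from a shell rather than through the browser. I have isolated everything else, and the problem persists.

Here is the code: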

import subprocess

from django.http import HttpResponse


def test_antiword(filename):
    # Write antiword's output straight to a file for later inspection
    with open(filename, 'w') as writefile:
        subprocess.Popen(["antiword", '/tmp/test.doc'], stdout=writefile).wait()
    # Run it again, this time capturing the output in memory
    p = subprocess.Popen(["antiword", '/tmp/test.doc'], stdout=subprocess.PIPE)
    out, _ = p.communicate()
    # Python 2: out is a byte string, so ord() gives each byte's value
    ords = [ord(kk) for kk in out]
    return out, ords


def test_antiword_view(request):
    return HttpResponse(repr(test_antiword('/tmp/web.txt')))

When I open the URL in a browser, this is the output:

('\n"I said good day sir. Good day!" shouted Sh\xe9rlo\xe7k H\xf8lme\xa3.\n\n             "Why not Zoidberg?" queried Zoidberg.\n',
 [10, 34, 73, 32, 115, 97, 105, 100, 32, 103, 111, 111, 100, 32, 100, 97, 121, 32, 115, 105, 114, 46, 32, 71, 111, 111,
  100, 32, 100, 97, 121, 33, 34, 32, 115, 104, 111, 117, 116, 101, 100, 32, 83, 104, 233, 114, 108, 111, 231, 107, 32,
  72, 248, 108, 109, 101, 163, 46, 10, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 87, 104,
  121, 32, 110, 111, 116, 32, 90, 111, 105, 100, 98, 101, 114, 103, 63, 34, 32, 113, 117, 101, 114, 105, 101, 100, 32,
  90, 111, 105, 100, 98, 101, 114, 103, 46, 10])

When I call test_antiword('/tmp/shell.txt') from the shell, this is the output:

('\n\xe2\x80\x9cI said good day sir. Good day!\xe2\x80\x9d shouted Sh\xc3\xa9rlo\xc3\xa7k H\xc3\xb8lme\xc2\xa3.\n\n             \xe2\x80\x9cWhy not Zoidberg?\xe2\x80\x9d queried Zoidberg.\n',
 [10, 226, 128, 156, 73, 32, 115, 97, 105, 100, 32, 103, 111,
  111, 100, 32, 100, 97, 121, 32, 115, 105, 114, 46, 32, 71, 111, 111, 100, 32, 100, 97, 121, 33, 226, 128, 157, 32,
  115, 104, 111, 117, 116, 101, 100, 32, 83, 104, 195, 169, 114, 108, 111, 195, 167, 107, 32, 72, 195, 184, 108, 109, 101,
  194, 163, 46, 10, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 226, 128, 156, 87, 104, 121, 32,
  110, 111, 116, 32, 90, 111, 105, 100, 98, 101, 114, 103, 63, 226, 128, 157, 32, 113, 117, 101, 114, 105, 101, 100, 32,
  90, 111, 105, 100, 98, 101, 114, 103, 46, 10])

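As you can see, the outputs are quite different. For one, the shell output preserves the special characters from the original file (curly quotes and accented letters); they are lost in the web version.

As you can see in the code, I also write the document's output to a file. The resulting files are as follows: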

web.txt

"I said good day sir. Good day!" shouted Sh?rlo?k H?lme?.

             "Why not Zoidberg?" queried Zoidberg.

shell.txt

“I said good day sir. Good day!” shouted Shérloçk Hølme£.

             “Why not Zoidberg?” queried Zoidberg.

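In the web version the characters are not recognized, and file identifies the encoding as ISO-8859. In the shell version the characters display correctly, and file identifies the encoding as UTF-8.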
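The byte values in the two lists above are consistent with what file reports. As a minimal sketch (the variable names are only illustrative), encoding the accented name both ways reproduces the two byte sequences:

```python
# The accented name from the document, written with escapes so that
# the encoding of this source file does not matter.
text = u"Sh\xe9rlo\xe7k H\xf8lme\xa3"

# One byte per character in Latin-1 (ISO-8859-1), as in the web output
latin1_bytes = list(bytearray(text.encode("latin-1")))

# Two bytes per non-ASCII character in UTF-8, as in the shell output
utf8_bytes = list(bytearray(text.encode("utf-8")))

print(latin1_bytes)
print(utf8_bytes)
```

The first list matches the 233, 231, 248, 163 values in the web output, and the second matches the 195/194-prefixed pairs in the shell output.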

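I don't know why this happens. I have checked that both processes use the same version of antiword. I have also verified that both use the same Python module file for subprocess, and the Python versions match exactly in both cases.

Can anyone explain what might be going on?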

Answer 1

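The difference is probably caused by environment variables. According to the antiword documentation:

Antiword uses the environment variables LC_ALL, LC_CTYPE and LANG (in that order) to get the current locale and uses this information to select the default mapping file.

I suspect that when you run it from your shell, the shell is in a UTF-8 locale, but when you run it from Django it is in a different locale, in which it cannot convert the Unicode characters correctly. Try switching to a UTF-8 locale when running the subprocess, like this: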
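To compare the two environments, something like the following rough sketch could be run both in the shell and inside the Django view. It only mimics the lookup order quoted above; it is not Antiword's actual code, the helper name effective_locale is made up for illustration, and treating an empty value as unset is an assumption.

```python
import os


def effective_locale(env=None):
    """Return (variable, value) for the first non-empty locale variable.

    Checks LC_ALL, LC_CTYPE and LANG in that order; returns None when
    none of them is set.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    return None


# Print what the current process would pass on to a child process
print(effective_locale())
```

If the shell prints a UTF-8 locale and the web process prints something else (or None), that would support the explanation.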

import os
import subprocess

new_env = dict(os.environ)  # Copy current environment
new_env['LANG'] = 'en_US.UTF-8'
p = subprocess.Popen(..., env=new_env)
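Since LC_ALL and LC_CTYPE are consulted before LANG, setting LANG alone may have no effect if the web server's environment already defines one of the others. A slightly fuller sketch, assuming the en_US.UTF-8 locale is available on the system; the helper names utf8_env and run_antiword are hypothetical:

```python
import os
import subprocess


def utf8_env(base_env=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment that selects a UTF-8 locale."""
    env = dict(os.environ if base_env is None else base_env)
    # Remove the higher-priority variables so that LANG is the one used
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    env["LANG"] = locale_name
    return env


def run_antiword(doc_path):
    """Run antiword on doc_path under a UTF-8 locale and return its raw output."""
    p = subprocess.Popen(["antiword", doc_path],
                         stdout=subprocess.PIPE, env=utf8_env())
    out, _ = p.communicate()
    return out
```

Copying the environment rather than modifying os.environ keeps the change local to the child process, so the rest of the Django application is unaffected.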
