Reading a file with a Python script run from PHP

Time: 2015-05-24 02:40:42

Tags: php python file

OK, this is driving me crazy. I have a small file. Here is the Dropbox link: https://www.dropbox.com/s/74nde57f07jj0zj/transcript.txt?dl=0

If I read the file's contents with Python's f.read(), it works without any problem. But if I run the same Python program through PHP's shell_exec(), the read fails. This is the error I get.

Traceback (most recent call last): 
  File "/var/www/python_code.py", line 2, in <module>
    transcript = f.read() 
  File "/opt/anaconda/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 107: ordinal not in range(128)

I have checked all the permissions, and they are fine.

Can anyone explain this?

Here is my Python code.

f = open('./transcript/transcript.txt', 'r')
transcript = f.read()
print(transcript)

Here is my PHP code.

$output = shell_exec("/opt/anaconda/bin/python /var/www/python_code.py");

Thanks!

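Edit: I think the problem is in the file's contents. If I replace the contents with a simple 'I eat rice', I can read the file from PHP. But the current contents cannot be read. I still don't know why.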

1 answer:

Answer 0: (score: 1)

The problem appears to be that your file contains non-ASCII characters, but you're trying to read it as ASCII text.

Either it is text, but is in some encoding or other that you haven't told us (probably UTF-8, Latin-1, or cp1252, but there are countless other possibilities), or it's not text at all, but rather arbitrary binary data.


When you open a text file without specifying an encoding, Python has to guess. When you're running from inside the terminal or whatever IDE you use, presumably, it's guessing the same encoding that you used in creating the file, and you're getting lucky. But when you're running from PHP, Python doesn't have as much information, so it's just guessing ASCII, which means it fails to read the file because the file has bytes that aren't valid as ASCII.

If you want to understand how Python guesses, see the docs for open, but briefly: it calls locale.getpreferredencoding(), which, at least on non-Windows platforms, reads the encoding from the locale settings in the environment. On a typical Linux system that predates systemd but isn't too old, the user's shell will be set up for a UTF-8 locale, but services will be set up for the C locale. If all of that makes sense to you, you may see a way to work around your problem. If it all sounds like gobbledegook, just ignore it.
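To see what Python is guessing in each environment, a small diagnostic along these lines can be run once from the terminal and once through shell_exec(), and the two outputs compared. This is only a sketch: the function name describe_default_encodings is made up for illustration, and the values it reports depend entirely on the environment and the Python version.

```python
import locale
import sys


def describe_default_encodings():
    """Return the encodings Python falls back on when none is specified."""
    return {
        # Used by open() in text mode when no encoding argument is passed.
        "preferred": locale.getpreferredencoding(False),
        # Used by print() when writing to standard output (may be None
        # if stdout has been replaced by an object without an encoding).
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_default_encodings().items():
        print(name, "=", value)
```

If the terminal run reports a UTF-8 encoding while the PHP run reports an ASCII one, that would match the explanation above.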


If the file is meant to be text, then the right solution is to just pass the encoding to the open call. For example, if the file is UTF-8, do this:

f = open('./transcript/transcript.txt', 'r', encoding='utf-8')

Then Python doesn't have to guess.
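Here is a self-contained sketch of the difference, assuming the file really is UTF-8. The helper names read_text and demo and the sample string are hypothetical. The sample contains a non-breaking space, which UTF-8 encodes as the bytes 0xc2 0xa0, so it starts with the same 0xc2 byte that appears in your traceback.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole text file with an explicit encoding instead of a guess."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    """Write a small UTF-8 file, then read it back as ASCII and as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as f:
            # U+00A0 (non-breaking space) becomes the bytes 0xc2 0xa0.
            f.write("10\u00a0km".encode("utf-8"))
        try:
            read_text(path, encoding="ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        return ascii_failed, read_text(path)


if __name__ == "__main__":
    failed, text = demo()
    print("ASCII read failed:", failed)
    print("UTF-8 read returned:", repr(text))
```

Using a with block also closes the file automatically, which the original open call does not do.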


If, on the other hand, the file is arbitrary binary data, then don't open it in text mode:

f = open('./transcript/transcript.txt', 'rb')

In this case, of course, you'll get bytes instead of str every time you read from it, and print is just going to print something ugly like b'aq\x9bz' that makes no sense; you'll have to figure out what you actually want to do with the bytes instead of printing them as a bytes object.
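What to do with the bytes depends on what they represent, but two common options are a hex dump for inspection and a lossy decode for display. The sketch below uses the sample bytes from above. The helper name summarize is hypothetical, and decoding as UTF-8 with errors='replace' is only one possible choice.

```python
def summarize(data):
    """Return the length, a hex dump, and a lossy text view of some bytes."""
    return (
        len(data),
        data.hex(),
        # Invalid sequences become U+FFFD instead of raising an error.
        data.decode("utf-8", errors="replace"),
    )


if __name__ == "__main__":
    length, hex_dump, text = summarize(b"aq\x9bz")
    print(length, hex_dump, repr(text))
```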