Question

我正在测试下面的代码，我发现＆＃34; print＆＃34;之后的输出与文本文件不一致。我已将编码设置为＆＃34; UTF-8＆＃34;。这是一个错误吗？怎么解决？

import requests

url = "http://www.aastocks.com/tc/stocks/analysis/company-fundamental/financial-ratios?symbol=0001&period=4"
r = requests.get(url)
print r.content
f = open("test.txt","w")
f.write(r.content)

Answer 1

虽然我不知道你正在使用的python的确切版本，但我冒昧地猜测它不是因为使用print语句而不是3.x.

问题不在于您的打印声明本身，但，显示这么长的行（这个长175765）经常是一个重要问题。 Python（特别是在Windows上）在处理几KB（本例中为176KB）的行时开始变得情绪化。而不是尝试在一个语句中显示整个字符串，尝试将其分解为多个部分，然后显示。您将看到屏幕上显示的r.content与通过f.write存储的内容之间没有区别。

只是为了您的确认，您可以在代码后执行此操作：

fh = open("test.txt","r")
print fh.read()
fh.close()

您会注意到，这与前一个印刷语句所显示的内容之间没有区别。

我在python 3.4.x和linux上尝试过这个，但是你提到的行为并没有被这个python和platform的组合所观察到。

编辑1

这就是我的尝试：

import requests
url = "http://www.aastocks.com/tc/stocks/analysis/company-fundamental/financial-ratios?symbol=0001&period=4"
r = requests.get(url)
a = print(str(r.content))
f = open("test.txt","w")
f.write(str(r.content))
f.close()
f = open("test.txt","r")
print(f.read())
f.close()

这是输出： http://pastebin.com/R0j0mYe5

编辑2

我没有注意到标题被削减了。我在2.x中尝试过并看到了这种行为。 这确实是一个问题。显然，在扫描html和解码打印时会出现一些问题。：

这就是我所看到的：

print r.content[0:500]
print "*****"
print r.content[0:1000]

给予和喜欢这样：

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力

*****

</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="teonal.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力, 投資回報, 盈利能力, 營運能力, 投資收益, 綜合全年, 綜合中期" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <script type="text/javascript">

正如我们在仅打印前500行时所看到的那样，操作符合预期，但是当我们尝试更多时，会出现错误。

当它试图解码整个文档时，会发生一些奇怪的事情。

但是，在python 3.4.x中我看到了：

print(con[0:500]) #con = r.content
print(con[0:1000])

输出：

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe4\xb8\xbb\xe8\xa6\x81\xe8\xb2\xa1\xe7\xb6\x93\xe6\xaf\x94\xe7\x8e\x87, \xe6\xb5\x81\xe5\x8b\x95\xe6\xaf\x94\xe7\x8e\x87, \xe8\x82\xa1\xe6\x9d\xb1\xe6\xac\x8a\xe7\x9b\x8a\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe7\xb8\xbd\xe8\xb3\x87\xe7\x94\xa2\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe9\x82\x8a\xe9\x9a\x9b\xe5\x88\xa9\xe6\xbd\xa4\xe7\x8e\x87, \xe6\xb4\xbe\xe6\x81\xaf\xe6\xaf\x94\xe7\x8e\x87" /><meta name="description" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe8\xb2\xa1\xe5\x8b\x99\xe6\xaf\x94\xe7\x8e\x87, \xe8\xae\x8a\xe7\x8f\xbe\xe8\x83\xbd\xe5\x8a\x9b, \xe5\x84\x9f\xe5\x82\xb5\xe8\x83\xbd\xe5\x8a\x9b'

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe4\xb8\xbb\xe8\xa6\x81\xe8\xb2\xa1\xe7\xb6\x93\xe6\xaf\x94\xe7\x8e\x87, \xe6\xb5\x81\xe5\x8b\x95\xe6\xaf\x94\xe7\x8e\x87, \xe8\x82\xa1\xe6\x9d\xb1\xe6\xac\x8a\xe7\x9b\x8a\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe7\xb8\xbd\xe8\xb3\x87\xe7\x94\xa2\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe9\x82\x8a\xe9\x9a\x9b\xe5\x88\xa9\xe6\xbd\xa4\xe7\x8e\x87, \xe6\xb4\xbe\xe6\x81\xaf\xe6\xaf\x94\xe7\x8e\x87" /><meta name="description" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe8\xb2\xa1\xe5\x8b\x99\xe6\xaf\x94\xe7\x8e\x87, \xe8\xae\x8a\xe7\x8f\xbe\xe8\x83\xbd\xe5\x8a\x9b, \xe5\x84\x9f\xe5\x82\xb5\xe8\x83\xbd\xe5\x8a\x9b, \xe6\x8a\x95\xe8\xb3\x87\xe5\x9b\x9e\xe5\xa0\xb1, \xe7\x9b\x88\xe5\x88\xa9\xe8\x83\xbd\xe5\x8a\x9b, \xe7\x87\x9f\xe9\x81\x8b\xe8\x83\xbd\xe5\x8a\x9b, \xe6\x8a\x95\xe8\xb3\x87\xe6\x94\xb6\xe7\x9b\x8a, \xe7\xb6\x9c\xe5\x90\x88\xe5\x85\xa8\xe5\xb9\xb4, \xe7\xb6\x9c\xe5\x90\x88\xe4\xb8\xad\xe6\x9c\x9f" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <script type="text/javascript">\rvar _gaq = _gaq || [];\r_gaq.push([\'_setAccount\', \'UA-20790503-3\']);\r_gaq.push([\'_setDomainName\', \'www.aastocks.com\']);\r_gaq.push([\'_trackPageview\']);\r_gaq.push([\'_trackPageLoadTime\']);\rfunction OA_show(name) {\r} \r</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="te'

但如果我尝试解码utf-8，输出在3.x（如2.x）中类似：

print(con[0:500].decode('utf-8'))
print(con[0:1000].decode('utf-8'))

OP：

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力

</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="te

Answer 2

运行控制台缓冲区可以容纳多少行的内部限制。它限制在大约15K行。

要提高此限制，您必须更改idea.properties文件并添加密钥idea.cycle.buffer.size并相应地进行调整。

请参阅详细解决方案的this bug report。

为什么Pycharm打印比写入文件少？

2 个答案: