Question

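Following the diagram provided in Gödel, Escher, Bach, I have built a Python script that randomly creates sentences using data from the Princeton English Wordnet. Calling python GEB.py produces a series of nonsensical English sentences, such as: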

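The cost of restoring anesthesia. The bryophyte fingernail. The fortieth peach. Starry hide. The repackaged request made the flour that translated the a_d_d gown punch through the apple tree. A small-leaved freighter beside the tuna.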

and saves them to gibberish.txt. This script works fine.

Another script (translator.py) takes gibberish.txt and, through the py-googletrans Python module, tries to translate those random sentences into Portuguese:

from googletrans import Translator
import json

tradutor = Translator()

with open('data.json') as dataFile:
    data = json.load(dataFile)


def buscaLocal(keyword):
    if keyword in data:
        print(keyword + data[keyword])
    else:
        buscaAPI(keyword)


def buscaAPI(keyword):
    result = tradutor.translate(keyword, dest="pt")
    data.update({keyword: result.text})

    with open('data.json', 'w') as fp:
        json.dump(data, fp)

    print(keyword + result.text)


keyword = open('/home/user/gibberish.txt', 'r').readline()
buscaLocal(keyword)

Currently, the second script outputs only the translation of the first sentence in gibberish.txt. Something like this:

The cost of restoring anesthesia. aumento de custosinestético.

I tried using readlines() instead of readline(), but I get the following error:

Traceback (most recent call last):
  File "main.py", line 28, in <module>
    buscaLocal(keyword)
  File "main.py", line 11, in buscaLocal
    if keyword in data:
TypeError: unhashable type: 'list'

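I have read similar questions about this error here, but it is not clear to me what I should use to read the whole list of sentences contained in gibberish.txt (each new sentence starts on a new line).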

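How can I read the entire list of sentences contained in gibberish.txt? How should I adapt the code in translator.py to achieve that? Sorry if the question is a bit confusing; I can edit it if necessary. I am a Python beginner and would be thankful if someone could help me out.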

Answer 1

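If you use readline(), remember that it returns only a single line, so you have to use a loop to go through all the lines in the text file. readlines() does read the whole file at once, but it returns the lines as a list. The list data type is unhashable and cannot be used as a key in a dict, which is why the line if keyword in data: raises this error: keyword here is a list of all the lines. A simple for loop will solve the problem.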

with open('/home/user/gibberish.txt', 'r') as f:
    text_lines = f.readlines()
for line in text_lines:
    buscaLocal(line)

This loop iterates over all the lines in the list, and because each key is now a string, accessing the dict no longer raises an error.
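
The hashability point can be checked in isolation. The snippet below is a minimal sketch that uses a made-up one-entry dict in place of the contents of data.json; only the membership test matters here.

```python
# Hypothetical stand-in for the dict loaded from data.json.
data = {"hello": "olá"}

# A string is hashable, so the membership test works.
string_found = "hello" in data
print(string_found)

# A list is not hashable, so the same test raises TypeError.
lines = ["hello\n", "world\n"]
error_message = None
try:
    lines in data
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Looping hands the dict one string at a time, which is fine.
found = [line.strip() in data for line in lines]
print(found)
```

Note that strip() is used here because each line returned by readlines() still ends with its newline character.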

Answer 2

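Let's start with what you are doing to the file object. You open a file, get a single line from it, and then never close it. A better way is to process the whole file and then close it. This is generally done with a with block, which closes the file even if an error occurs: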

with open('gibberish.txt') as f:
    # do stuff to f
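
To see the closing behaviour in action, here is a small sketch that assumes a throwaway temporary file in place of gibberish.txt and a simulated failure inside the block:

```python
import os
import tempfile

# Hypothetical throwaway file standing in for gibberish.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first sentence.\nsecond sentence.\n")
    path = tmp.name

try:
    with open(path) as f:
        first = f.readline()
        # Simulate something going wrong while processing the file.
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

# The with block has already closed the file, despite the exception.
closed_after_error = f.closed
print(closed_after_error)

os.remove(path)
```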

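Aside from the practical benefit, this makes the interface much clearer, since f is no longer a throwaway object. You have three easy options for processing the entire file: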

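Use readline in a loop, since it reads only one line at a time. You will have to strip off the newline characters manually and terminate the loop when '' appears: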
```
while True:
    line = f.readline()
    if not line: break
    keyword = line.rstrip()
    buscaLocal(keyword)
```
This loop can take many forms; the one shown here is just one of them.
Use readlines to read all the lines in the file into a list of strings at once:
```
for line in f.readlines():
    keyword = line.rstrip()
    buscaLocal(keyword)
```
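This is much cleaner than the previous option, since you don't need to check for loop termination manually, but it has the disadvantage of loading the entire file at once, which the readline loop does not.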

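This brings us to the third option.
Python files are iterable objects. You can have the cleanliness of the readlines approach with the memory savings of readline:
```
for line in f:
    buscaLocal(line.rstrip())
```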
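This approach can be simulated with readline and the more arcane two-argument form of iter, which calls f.readline repeatedly until it returns the sentinel '':
```
for line in iter(f.readline, ''):
    buscaLocal(line.rstrip())
```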
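
As a quick sanity check, the sketch below runs the readline loop, readlines, direct iteration, and the sentinel form iter(f.readline, '') over the same file and compares the results. The temporary file and the helper function names are assumptions made for illustration; they are not part of the original scripts.

```python
import os
import tempfile

# Hypothetical sample file standing in for gibberish.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first sentence.\nsecond sentence.\nthird sentence.\n")
    path = tmp.name


def via_readline_loop(path):
    """Read one line at a time and stop on the empty string."""
    result = []
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            result.append(line.rstrip())
    return result


def via_readlines(path):
    """Load every line into a list first, then strip each one."""
    with open(path) as f:
        return [line.rstrip() for line in f.readlines()]


def via_iteration(path):
    """Iterate over the file object itself, line by line."""
    with open(path) as f:
        return [line.rstrip() for line in f]


def via_iter_sentinel(path):
    """Call f.readline until it returns the sentinel ''."""
    with open(path) as f:
        return [line.rstrip() for line in iter(f.readline, "")]


readers = (via_readline_loop, via_readlines, via_iteration, via_iter_sentinel)
results = [reader(path) for reader in readers]
os.remove(path)

all_equal = all(result == results[0] for result in results)
print(results[0])
print(all_equal)
```

All four produce the same list of stripped lines; they differ only in how much of the file is held in memory at once and in how explicit the loop termination is.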

As a side note, I would make a few modifications to your functions:

def buscaLocal(keyword):
    if keyword not in data:
        buscaAPI(keyword)
    print(keyword + data[keyword])

def buscaAPI(keyword):
    # Make your function do one thing. In this case, do a lookup.
    # Printing is not the task for this function.
    result = tradutor.translate(keyword, dest="pt")
    # No need to do a complicated update with a whole new
    # dict object when you can do a simple assignment.
    data[keyword] = result.text

...

# Avoid rewriting the file every time you get a new word.
# Do it once at the very end.
with open('data.json', 'w') as fp:
    json.dump(data, fp)
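
Putting the pieces together, one possible shape for the whole script is sketched below. To keep it runnable offline, the call to tradutor.translate is replaced by a hypothetical fake_translate function, and the paths point into a temporary directory; translate_file and its parameters are likewise illustrative names, not part of the original code.

```python
import json
import os
import shutil
import tempfile


def fake_translate(text):
    # Stand-in for tradutor.translate(text, dest="pt").text so the sketch
    # runs without network access; this is NOT the googletrans API.
    return "[pt] " + text


def translate_file(sentences_path, cache_path, translate=fake_translate):
    # Load the cache if it exists, otherwise start with an empty one.
    if os.path.exists(cache_path):
        with open(cache_path) as fp:
            data = json.load(fp)
    else:
        data = {}

    api_calls = 0
    with open(sentences_path) as f:
        for line in f:
            keyword = line.rstrip()
            if not keyword:
                continue  # skip blank lines
            if keyword not in data:
                data[keyword] = translate(keyword)
                api_calls += 1
            print(keyword, data[keyword])

    # Write the cache once, after all lines have been processed.
    with open(cache_path, "w") as fp:
        json.dump(data, fp)
    return data, api_calls


# Hypothetical working files in a temporary directory.
workdir = tempfile.mkdtemp()
sentences_path = os.path.join(workdir, "gibberish.txt")
cache_path = os.path.join(workdir, "data.json")
with open(sentences_path, "w") as f:
    f.write("first sentence.\nsecond sentence.\nfirst sentence.\n")

data, first_run_calls = translate_file(sentences_path, cache_path)
_, second_run_calls = translate_file(sentences_path, cache_path)
shutil.rmtree(workdir)
```

On the first run only the two distinct sentences reach the translation function; on the second run every sentence is served from the cache file.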

Python: how to use readline() and readlines() correctly
