Question

import json

with open('data.json') as f:
    data = json.load(f)

# Keep only the entries whose text starts with "N", sort them, show the first five.
lis = [row['text'] for row in data['City']['values'] if row['text'].startswith("N")]
lis = sorted(lis)
print(lis[:5])

And the JSON data looks like this:

{"City": {"values": [{"text": "Abee|Alberta|Canada", "state": "AB", "id": 21774}, {"text": "Acadia Valley|Alberta|Canada", "state": "AB", "id": 21775}, {"text": "Acme|Alberta|Canada", "state": "AB", "id": 21776}, {"text": "Airdrie|Alberta|Canada", "state": "AB", "id": 21777}, {"text": "Alderson|Alberta|Canada", "state": "AB", "id": 21778}, {"text": "Alix|Alberta|Canada", "state": "AB", "id": 21779}, {"text": "Alliance|Alberta|Canada", "state": "AB", "id": 21780}, {"text": "Andrew|Alberta|Canada", "state": "AB", "id": 21781}, {"text": "Ardmore|Alberta|Canada", "state": "AB", "id": 21782}, {"text": "Ardrossan|Alberta|Canada", "state": "AB", "id": 21783}, {"text": "Ashmont|Alberta|Canada", "state": "AB", "id": 21784}, {"text": "Athabasca|Alberta|Canada", "state": "AB", "id": 21785}, {"text": "Atikameg|Alberta|Canada", "state": "AB", "id": 21786}, {"text": "Atmore|Alberta|Canada", "state": "AB", "id": 21787}, {"text": "Avenir|Alberta|Canada", "state": "AB", "id": 21788}, {"text": "Balzac|Alberta|Canada", "state": "AB", "id": 21789}, {"text": "Banff|Alberta|Canada", "state": "AB", "id": 21790}, {"text": "Barons|Alberta|Canada", "state": "AB", "id": 21791}, {"text": "Barrhead|Alberta|Canada", "state": "AB", "id": 21792}, {"text": "Bashaw|Alberta|Canada", "state": "AB", "id": 21793}, {"text": "Bassano|Alberta|Canada", "state": "AB", "id": 21794}, {"text": "Beaumont|Alberta|Canada", "state": "AB", "id": 21795}, {"text": "Beaverlodge|Alberta|Canada", "state": "AB", "id": 21796}, {"text": "Beiseker|Alberta|Canada", "state": "AB", "id": 21797}, {"text": "Bellevue|Alberta|Canada", "state": "AB", "id": 21798}, {"text": "Bellis|Alberta|Canada", "state": "AB", "id": 21799}, {"text": "Benalto|Alberta|Canada", "state": "AB", "id": 21800}, {"text": "Bentley|Alberta|Canada", "state": "AB", "id": 21801}, {"text": "Bergen|Alberta|Canada", "state": "AB", "id": 21802}, {"text": "Berwyn|Alberta|Canada", "state": "AB", "id": 21803}, {"text": "Big Valley|Alberta|Canada", "state": "AB", 
"id": 21804}, {"text": "Bilby|Alberta|Canada", "state": "AB", "id": 21805}, {"text": "Bittern Lake|Alberta|Canada", "state": "AB", "id": 21806}, {"text": "Black Diamond|Alberta|Canada", "state": "AB", "id": 21807}, {"text": "Blackfalds|Alberta|Canada", "state": "AB", "id": 21808}, {"text": "Blackie|Alberta|Canada", "state": "AB", "id": 21809}, {"text": "Blairmore|Alberta|Canada", "state": "AB", "id": 21810}, {"text": "Blue Ridge|Alberta|Canada", "state": "AB", "id": 21811}, {"text": "Bluesky|Alberta|Canada", "state": "AB", "id": 21812}, {"text": "Bluffton|Alberta|Canada", "state": "AB", "id": 21813}, {"text": "Bon Accord|Alberta|Canada", "state": "AB", "id": 21814}, {"text": "Bonnyville|Alberta|Canada", "state": "AB", "id": 21815}, {"text": "Bowden|Alberta|Canada", "state": "AB", "id": 21816}, {"text": "Bow Island|Alberta|Canada", "state": "AB", "id": 21817}, {"text": "Boyle|Alberta|Canada", "state": "AB", "id": 21818}, {"text": "Brampton|Alberta|Canada", "state": "AB", "id": 21819}]}}

Any help is greatly appreciated!

Answer 1

This is effectively a query: filter on 'N%', sort, limit.

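The question I would really ask myself is: how is this going to be run, and what work can I do ahead of time so that the time-sensitive part does as little as possible?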

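In your case that is fairly obvious: is the data set going to change? If it does not change on every run, you should pre-parse it into memory (or at least store it as something other than JSON). Once you take that approach, there are plenty of options (for example, using sqlite with an in-memory database).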
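As a rough illustration of the sqlite option, here is a minimal sketch that loads the rows once into an in-memory database and then answers the filter/sort/limit query from there. The table name `city`, the helper names `build_index` and `first_five`, and the sample rows are all made up for this example; only the row shape (`text`, `state`, `id`) comes from the question.

```python
import sqlite3

# Made-up sample rows in the same shape as data['City']['values'].
rows = [
    {"text": "Nobleford|Alberta|Canada", "state": "AB", "id": 1},
    {"text": "Acme|Alberta|Canada", "state": "AB", "id": 2},
    {"text": "Nanton|Alberta|Canada", "state": "AB", "id": 3},
    {"text": "Nisku|Alberta|Canada", "state": "AB", "id": 4},
    {"text": "Nampa|Alberta|Canada", "state": "AB", "id": 5},
    {"text": "New Norway|Alberta|Canada", "state": "AB", "id": 6},
    {"text": "Namao|Alberta|Canada", "state": "AB", "id": 7},
]


def build_index(rows):
    """One-off setup: copy the rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE city (id INTEGER PRIMARY KEY, state TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO city (id, state, text) VALUES (:id, :state, :text)", rows
    )
    # An index on text may let SQLite read rows in sorted order.
    conn.execute("CREATE INDEX city_text ON city (text)")
    conn.commit()
    return conn


def first_five(conn, prefix, n=5):
    """Filter on the prefix, sort, limit: the time-sensitive part."""
    # substr(...) = prefix is case-sensitive, like str.startswith;
    # LIKE would ignore ASCII case by default.
    cursor = conn.execute(
        "SELECT text FROM city "
        "WHERE substr(text, 1, length(:p)) = :p "
        "ORDER BY text LIMIT :n",
        {"p": prefix, "n": n},
    )
    return [text for (text,) in cursor]


conn = build_index(rows)
print(first_five(conn, "N"))
```

Whether this beats a plain Python list for a data set this small is something you would have to measure. The point is only that the JSON decoding happens once, outside the part you are timing.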

To compare other approaches, let's start with the file contents already loaded (so we are not profiling disk I/O).

with open('data.json') as f:
    data = f.read()

Now, your approach (we'll drop the print, since there is not much point profiling it in a comparison):

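import json

def original(data):
    # Decode the whole document, then filter, sort and slice.
    # The filter uses "A" here because the sample data shown has no "N" entries.
    data = json.loads(data)
    lis = [row['text'] for row in data['City']['values'] if row['text'].startswith("A")]
    lis = sorted(lis)
    return lis[:5]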

Another approach, where we use a regular expression directly on the text:

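import re

def with_regex(data):
    # Match '"text": "A..."' in the raw JSON text; x[9:-1] strips the
    # leading '"text": "' (9 characters) and the closing quote.
    filtered = [x[9:-1] for x in re.findall('"text": "A[^"]+"', data)]
    return sorted(filtered)[:5]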

Now for the comparison:

%timeit original(data)
10000 loops, best of 3: 57.4 µs per loop

%timeit with_regex(data)
100000 loops, best of 3: 11.1 µs per loop

So in this case the regex does the job roughly 5 times faster, but it depends on the data being formatted exactly as expected.
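To make the formatting caveat concrete, here is a small sketch showing the same pattern failing once the JSON is serialized without a space after the colon. The function `with_tolerant_regex` is an illustrative variant, not part of the original comparison, and the three sample rows are taken from the question's data.

```python
import json
import re

# Three rows copied from the question's sample data.
sample = {"City": {"values": [
    {"text": "Acme|Alberta|Canada", "state": "AB", "id": 21776},
    {"text": "Abee|Alberta|Canada", "state": "AB", "id": 21774},
    {"text": "Banff|Alberta|Canada", "state": "AB", "id": 21790},
]}}


def with_regex(data):
    # Same pattern as above: assumes exactly one space after the colon.
    filtered = [x[9:-1] for x in re.findall('"text": "A[^"]+"', data)]
    return sorted(filtered)[:5]


def with_tolerant_regex(data):
    # Illustrative variant: allows any whitespace around the colon and
    # captures the value directly instead of slicing by position.
    return sorted(re.findall(r'"text"\s*:\s*"(A[^"]*)"', data))[:5]


default_text = json.dumps(sample)                         # "text": "Acme..."
compact_text = json.dumps(sample, separators=(",", ":"))  # "text":"Acme..."

print(with_regex(default_text))           # both "A" entries, sorted
print(with_regex(compact_text))           # empty list: the pattern no longer matches
print(with_tolerant_regex(compact_text))  # both "A" entries again
```

Neither pattern decodes JSON escapes, so a value containing an escaped quote or a \uXXXX sequence would still come out wrong. That is the trade-off for skipping the decoder.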

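If you profile it, you will see that your version spends nearly all of its time in the JSON decoder. The best thing to do is make that go away (I would do the decoding once).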
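One way to "do it once" without any database is to decode and sort at load time, then use a binary search for each lookup. This is only a sketch under that assumption: the helper names `build_sorted_texts` and `first_n_with_prefix` are hypothetical, and the sample rows are a handful copied from the question's data.

```python
import bisect
import json


def build_sorted_texts(raw_json):
    """One-off setup: decode the JSON once and keep the texts sorted."""
    data = json.loads(raw_json)
    return sorted(row["text"] for row in data["City"]["values"])


def first_n_with_prefix(sorted_texts, prefix, n=5):
    """Return up to n entries starting with prefix from a sorted list."""
    # In a sorted list, every string with the given prefix sits in one
    # contiguous run that begins at bisect_left(prefix).
    start = bisect.bisect_left(sorted_texts, prefix)
    result = []
    for text in sorted_texts[start:start + n]:
        if not text.startswith(prefix):
            break
        result.append(text)
    return result


# A few rows copied from the question's sample data.
raw = json.dumps({"City": {"values": [
    {"text": "Banff|Alberta|Canada", "state": "AB", "id": 21790},
    {"text": "Andrew|Alberta|Canada", "state": "AB", "id": 21781},
    {"text": "Abee|Alberta|Canada", "state": "AB", "id": 21774},
    {"text": "Ardmore|Alberta|Canada", "state": "AB", "id": 21782},
    {"text": "Alix|Alberta|Canada", "state": "AB", "id": 21779},
    {"text": "Acme|Alberta|Canada", "state": "AB", "id": 21776},
    {"text": "Airdrie|Alberta|Canada", "state": "AB", "id": 21777},
]}})

texts = build_sorted_texts(raw)        # pay for json.loads and sorted() once
print(first_n_with_prefix(texts, "A"))  # repeated lookups skip the decoder
```

If the program really is run once per lookup, the sorted list would have to be persisted between runs in some cheaper-to-load form, which brings you back to the question of how this is going to be run.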

Python beginner: how can I reduce the execution time of this small program?
