Question

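I have a very large txt file that I need to read and process, but since I'm new to Python I don't know what format the file is in or how to read it. Here is a sample: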

[
    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},
    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1},
    {"content": "333333", "n": "ITEM 3", "a": "ANOTHER", "t": 1}
]

So I need to loop over each item in the list '[]' (I think I managed that part), and then over each field of each item, such as "content", "n", "a" and "t".

I tried reading the file and looping over it like this:

for item in thecontent:
    data = json.load(item)

pprint(data)

I think each 'item' in the loop above is being treated as a string, not as JSON.
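That suspicion is on the right track: if thecontent holds the file's text as a str, iterating over it yields single characters, and json.load expects a file object (it calls .read() on its argument), not a string. A minimal sketch of the usual approach, assuming thecontent holds the whole file as one string (for example from f.read()): parse it once with json.loads, then loop over the resulting list and over the fields of each dictionary.

```python
import json

# Assumption: `thecontent` is the whole file as one str (e.g. the result of f.read()).
thecontent = """[
    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},
    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1},
    {"content": "333333", "n": "ITEM 3", "a": "ANOTHER", "t": 1}
]"""

data = json.loads(thecontent)  # parse the text once: the result is a list of dicts

for item in data:              # loop over the list '[]'
    for key in ("content", "n", "a", "t"):
        print(key, "=", item[key])  # each field of the current item
```

When reading straight from an open file, json.load(f) replaces the f.read() plus json.loads pair.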

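EDIT 2: I think I need to use the ujson data type, because the example I found in its documentation looks the same as the one above. If you want a better idea, see the documentation page: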

>>> import ujson
>>> ujson.dumps([{"key": "value"}, 81, True])
'[{"key":"value"},81,true]'
>>> ujson.loads("""[{"key": "value"}, 81, true]""")
[{u'key': u'value'}, 81, True]
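ujson is a third-party, speed-oriented JSON library with a similar dumps/loads interface. The sample file is ordinary JSON, so the standard library json module can parse it as well. A minimal sketch reproducing the round trip from the ujson documentation with the standard library only (separators=(",", ":") is used here just to get the same compact output):

```python
import json

# Same round trip as the ujson example, using only the standard library.
text = json.dumps([{"key": "value"}, 81, True], separators=(",", ":"))  # compact output
parsed = json.loads("""[{"key": "value"}, 81, true]""")                 # JSON true -> True

print(text)    # [{"key":"value"},81,true]
print(parsed)  # [{'key': 'value'}, 81, True]
```

The u'' prefixes in the ujson example come from Python 2; Python 3 prints plain strings.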

Thanks, everyone!

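EDIT 3: I kept searching for answers to the problem I ran into and found that it wasn't about 'how to read' a list or tuple at all, since I had already managed that from the file.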

The real problem was how to convert bytes to a string when fetching content from the web. It was solved in this topic, more specifically in this reply.

The code I wrote to fetch web content and convert it to JSON is:

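import json
import requests

def get_json_by_url(url):
    r = requests.get(url)
    r.raise_for_status()                          # raise an exception on 4xx/5xx responses
    return json.loads(r.content.decode('utf-8'))  # bytes -> str -> Python objects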
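The key step in get_json_by_url is that r.content is bytes, which must be decoded to str before (or while) parsing. A minimal offline sketch of that step, using a hypothetical byte string as a stand-in for a response body:

```python
import json

# Hypothetical stand-in for r.content: the raw bytes of an HTTP response body.
raw = b'[{"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1}]'

text = raw.decode("utf-8")  # bytes -> str
data = json.loads(text)     # str -> list of dicts

print(type(raw).__name__, type(text).__name__, data[0]["n"])  # bytes str ITEM 1

# Since Python 3.6, json.loads also accepts bytes directly (UTF-8, UTF-16 or UTF-32).
same = json.loads(raw)
```

requests also provides r.text (the body already decoded to str, using the encoding requests infers) and r.json() (the body parsed as JSON), either of which can replace the manual decode.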

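So for anyone looking into this problem, this may be a solution. I have changed the title from 'How to read a list of tuples (or JSON) in Python' to 'How to convert content fetched from the web from bytes to str / JSON', which was the problem I actually had.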

Sorry for not explaining the problem well; as someone new to Python, it sometimes takes me a long time just to diagnose what the problem is.

Thanks, all!

Answer 1

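Both of these solutions work for me, and both assume the file is in the format shown in the example above. Which one fits depends on what you want to do with the data once it is loaded from the file (which you didn't specify).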

First, the simple/quick version, which ends up with all the data in one list (a list of dictionaries):

import json

with open("myFile.txt", "r") as f:
    data = json.load(f)  # load the entire file content into one list of many dictionaries

# process data here as desired, possibly in a loop
print(data)

Output:

[{'content': '111111', 'n': 'ITEM 1', 'a': 'ANOTHER', 't': 1}, {'content': '222222', 'n': 'ITEM 2', 'a': 'ANOTHER', 't': 1}, {'content': '333333', 'n': 'ITEM 3', 'a': 'ANOTHER', 't': 1}]

For very large files, or if you don't want all the data in one list:

import json

with open("myFile.txt", "r") as f:
    for line in f:                       # for each line in the file
        line = line.strip(", ][\n")      # strip leading/trailing commas, spaces, square brackets and newlines
        if line:                         # anything left should look like '{"key": value, ...}'
            try:
                data = json.loads(line)  # load the line into a single dictionary
            except ValueError:           # json.JSONDecodeError is a subclass of ValueError
                print("invalid json:  " + line)
                continue
            # process a single item (dictionary) of data here in whatever way you like
            print(data)

Output:

{'content': '111111', 'n': 'ITEM 1', 'a': 'ANOTHER', 't': 1}
{'content': '222222', 'n': 'ITEM 2', 'a': 'ANOTHER', 't': 1}
{'content': '333333', 'n': 'ITEM 3', 'a': 'ANOTHER', 't': 1}

In most cases the first option should be fine, even for fairly large files.
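The line-by-line version only works when every object sits on its own line. If that layout can't be relied on, one standard-library alternative (a sketch of my own, not part of the answer above) is to read the file in chunks and pull one array element at a time out of the buffer with json.JSONDecoder.raw_decode(s, idx), which parses a single JSON value starting at index idx and returns it together with the index where it ended. The name iter_json_array and the chunk_size parameter are hypothetical, and the function assumes the file holds a single top-level array, as in the sample. For heavy use, a dedicated streaming parser such as the third-party ijson library may be a better fit.

```python
import io
import json
from typing import Iterator, TextIO

_WS = " \t\r\n"  # JSON whitespace characters


def iter_json_array(f: TextIO, chunk_size: int = 65536) -> Iterator[object]:
    """Yield the elements of one top-level JSON array, reading `f` in chunks."""
    decoder = json.JSONDecoder()
    buf, pos, eof = "", 0, False

    def more() -> bool:
        # Append the next chunk, dropping text that has already been consumed.
        nonlocal buf, pos, eof
        if eof:
            return False
        chunk = f.read(chunk_size)
        if not chunk:
            eof = True
            return False
        buf, pos = buf[pos:] + chunk, 0
        return True

    def next_char() -> str:
        # Skip whitespace; return the next significant character ("" at end of input).
        nonlocal pos
        while True:
            while pos < len(buf) and buf[pos] in _WS:
                pos += 1
            if pos < len(buf):
                return buf[pos]
            if not more():
                return ""

    if next_char() != "[":
        raise ValueError("expected a top-level JSON array")
    pos += 1
    if next_char() == "]":
        return
    while True:
        next_char()  # move to the start of the next element
        while True:
            try:
                value, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if more():  # the element was probably cut off at a chunk boundary
                    continue
                raise
            i = end
            while i < len(buf) and buf[i] in _WS:
                i += 1
            # A number such as 1.5 may have been cut to "1." or "1": only accept the
            # element once it is followed by ',' or ']' (or the input has ended).
            if (i == len(buf) or buf[i] not in ",]") and more():
                continue
            break
        pos = end
        yield value
        sep = next_char()
        if sep == "]":
            return
        if sep != ",":
            raise ValueError("expected ',' or ']' after an array element")
        pos += 1


if __name__ == "__main__":
    demo = io.StringIO('[{"n": "ITEM 1", "t": 1}, {"n": "ITEM 2", "t": 1}]')
    for item in iter_json_array(demo, chunk_size=8):
        print(item)
```

With a real file, `with open("myFile.txt") as f: for item in iter_json_array(f): ...` processes one dictionary at a time; memory use is bounded by the chunk size plus the largest single element rather than by the whole array.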
