Question

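More specifically, I used Google Takeout to download all of my messages from a Google Hangouts group chat, but most of the data is useless to me. The only thing I care about is the actual messages, not even the timestamps. Each message sits on its own line in the .json file and looks like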

"text" : "[actual message in here, including the brackets]"

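So how do I extract every message, preferably in chronological order and with each one separated from the others? (They are already in order: the newest message is at the top of the .json file and the oldest is at the bottom.) Perhaps someone could download their own Google Takeout file for Hangouts to try this out. Any help would be appreciated. Python is probably best suited to this task, but any programming language that gets the job done is fine.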

Answer 1

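One way to do this in Python is to load the json file into a Python data structure (a dictionary or a list) and then print out the values you need.

You did not specify the exact structure of the json, so if it is an array of objects that each have a 'text' key, the following will do the job (adapt it to the actual json structure):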

import json

# Open the json file and parse it into Python objects.
with open('hangout_data') as hangout_file:
    hangout_data = json.load(hangout_file)

# Assumes the top level is an array of objects, each with a 'text' key.
for item in hangout_data:
    print(item['text'][1:-1])  # [1:-1] strips the surrounding brackets.

Hope this helps. You are very welcome to post the exact structure, and I will provide a more specific answer.
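Because the exact structure of the Takeout file is not shown in the question, one hedged alternative is to walk the parsed json recursively and collect every string stored under a "text" key, however deeply it is nested. This is only a sketch: the function name collect_texts and the sample data are hypothetical, and it assumes the file is valid json with the newest message first, as the question describes.

```python
import json

def collect_texts(node, key="text"):
    """Recursively gather every string stored under `key` in parsed json."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                found.extend(collect_texts(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_texts(item, key))
    return found

# Hypothetical sample standing in for the real Takeout file (newest first).
sample = json.loads(
    '{"events": [{"text": "[newest]"}, {"meta": {"text": "[oldest]"}}]}'
)
messages = collect_texts(sample)

# Reverse the list so the oldest message is printed first.
for message in reversed(messages):
    print(message)
```

For a real file, the sample would be replaced by json.load on the opened Takeout file. Any other field that happens to be named "text" would also be collected, so the result is worth checking against the actual structure.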

Answer 2

If you would rather treat the file as plain text:

with open('filepath', 'r') as file:
    for line in file:
        stripped_line = line.strip()  # removes leading white space and the trailing '\n' (and other white space)
        if stripped_line.startswith('"text" :'):
            # Split only on the first ':' so colons inside the message are kept.
            message = stripped_line.split(':', 1)[1].strip()
            print(message)

That said, it is probably better to work through Python's native json functions.

Here is an input file:

"text" : "[actual message in here, including the brackets]"
"text" : "[actual message in here, including the brackets]"
"text" : "[actual message in here, including : the brackets and some ':' ]"
"texat" : "[This isn't a legal message]"
   "text" : "[actual message in here, including the brackets.  Note leading white space ]"

And the output:

"[actual message in here, including the brackets]"
"[actual message in here, including the brackets]"
"[actual message in here, including : the brackets and some ':' ]"
"[actual message in here, including the brackets.  Note leading white space ]"
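The question also asks for the messages in chronological order, while the file lists the newest first. With the plain-text approach, one option is to collect the matching lines in a list and reverse it before printing. The following is a minimal sketch, assuming one "text" entry per line; the name extract_messages, the regular expression, and the sample lines are illustrative and not taken from the Takeout format.

```python
import re

# Matches a line such as:   "text" : "[some message]"
# and captures everything between the outer double quotes.
TEXT_LINE = re.compile(r'^\s*"text"\s*:\s*"(.*)"\s*,?\s*$')

def extract_messages(lines):
    """Return the messages oldest-first, assuming the input is newest-first."""
    messages = []
    for line in lines:
        match = TEXT_LINE.match(line)
        if match:
            messages.append(match.group(1))
    messages.reverse()
    return messages

# Hypothetical input lines, newest message first.
sample_lines = [
    '"text" : "[newest message]"\n',
    '"texat" : "[not a message]"\n',
    '   "text" : "[oldest message, with : colon]"\n',
]

for message in extract_messages(sample_lines):
    print(message)
```

Passing an open file object instead of sample_lines works the same way, since iterating over a file yields its lines. Note that this sketch does not decode json escape sequences such as \" inside a message; parsing with the json module handles those.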
