Basically, my script reads and parses a JSON file.
The JSON file:
{
"messages":
[
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
I have a Python script that strips out the "agentText" values, and a for loop that prints each one on its own line:
import json
with open('20190626-101200-text-messages.json') as f:
    data = json.load(f)
for message in data['messages']:
    splittext = message['agentText'].strip().replace('\n', ' ').replace('\r', ' ')
    if len(splittext) > 0:
        print(splittext)
This gives me:
That customer was great
That customer was stupid I hope they don't phone back
Line number 3
I need to join these separate lines together so that the result reads:
That customer was great That customer was stupid I hope they don't phone back Line number 3
That way I can apply stop-word removal / nltk to it. How can I do this?
Answer 0 (score: 2)
You can concatenate all the lines into a single string variable:
res = ""
for message in data['messages']:
    splittext = message['agentText'].strip().replace('\n', ' ').replace('\r', ' ')
    if len(splittext) > 0:
        res += splittext + " "
Or collect the pieces in a list and use the string method str.join:
res = []
for message in data['messages']:
    splittext = message['agentText'].strip().replace('\n', ' ').replace('\r', ' ')
    if len(splittext) > 0:
        res.append(splittext)
print(" ".join(res))
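The list-and-join approach can be wrapped in a small function so the combined string is returned instead of printed. This is a minimal sketch: the function name `join_agent_text` and the inline `sample` list are hypothetical, and the sample stands in for `data['messages']` from the question. Unlike the `res += splittext + " "` variant, joining a list leaves no trailing space.

```python
def join_agent_text(messages):
    """Join the non-empty agentText values into one space-separated string."""
    parts = []
    for message in messages:
        # Same cleanup as in the question: trim, then flatten line breaks.
        text = message["agentText"].strip().replace("\n", " ").replace("\r", " ")
        if text:  # skip messages whose text is empty after stripping
            parts.append(text)
    return " ".join(parts)


# Hypothetical stand-in for data['messages'], reduced to the relevant key.
sample = [
    {"agentText": "That customer was great"},
    {"agentText": "That customer was stupid\nI hope they don't phone back"},
    {"agentText": "Line number 3"},
    {"agentText": ""},
]

combined = join_agent_text(sample)
print(combined)
```

The returned string can then be passed to whatever tokenizer or stop-word filter comes next.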
Answer 1 (score: 1)
Use a comprehension with str.join and str.splitlines.
For example:
data = {
"messages":
[
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
{"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
print(" ".join(j for msg in data["messages"] for j in msg["agentText"].splitlines()))
Output:
That customer was great That customer was stupid I hope they don't phone back Line number 3
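One thing to watch with both answers: a Windows-style "\r\n" line break turns into two spaces when "\n" and "\r" are replaced separately, and str.splitlines does not trim spaces around each line. A possible variant, sketched here under the assumption that any run of whitespace should collapse to a single space, is to call str.split() with no arguments. The name `flatten_agent_text` and the `sample` data are hypothetical.

```python
def flatten_agent_text(messages):
    """Collapse every run of whitespace in the agentText values to one space."""
    words = []
    for msg in messages:
        # split() with no arguments splits on any whitespace (spaces, \n, \r, \t)
        # and drops empty pieces, so empty messages contribute nothing.
        words.extend(msg.get("agentText", "").split())
    return " ".join(words)


# Hypothetical sample with stray spaces and a \r\n line break.
sample = [
    {"agentText": "That customer was great"},
    {"agentText": "  That customer was stupid\r\nI hope they don't phone back  "},
    {"agentText": "Line number 3"},
    {"agentText": ""},
]

flattened = flatten_agent_text(sample)
print(flattened)
```

Because the result is already split into words internally, the same loop is a natural place to filter stop words before joining, if that is the next step.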