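I'm a Python beginner. I have a file that contains a single line of data. My requirement is to extract the "n" characters that follow the first occurrence of certain words. These words are not adjacent to each other in the line.
Data file: {"id":"1234566jnejnwfw","displayId":"1234566jne","author":{"name":"abcd@xyz.com","datetime":15636378484,"displayId":"23423426jne","datetime":4353453453}
I want to get the value that comes after the first occurrence of "displayId" and before "author", i.e. 1234566jne. The same goes for "datetime".
I tried slicing the line at the index of the word and writing the remainder to another file, so I could clean it up further to get the exact value.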
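tmpFile = "tmpFile.txt"
displayId = "displayId"  # the word to search for
tmpFileOpen = open(tmpFile, "w+")
with open("data file") as openfile:
    for line in openfile:
        # Write everything after the first occurrence of the word
        tmpFileOpen.write(line[line.index(displayId) + len(displayId):])
tmpFileOpen.close()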
However, I'm sure this is not a good approach to build on.
Can anyone help me?
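Answer 0: (score: 1)
This answer works for any displayId whose format is similar to the ones in your question. I decided not to load the file as JSON for this answer, because that isn't needed to complete the task.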
import re
tmpFile = "tmpFile.txt"
# Read the whole input file
with open('data_file.txt', 'r') as infile:
    lines = infile.read()
# Use a regex to find each displayId element
# example: "displayId":"1234566jne
# \W matches a non-word character, such as " and :
# \d matches a digit
# {6,8} matches between 6 and 8 of those digits
# [a-z] matches a lowercase ASCII letter
# {3} matches exactly 3 of those letters
# The parentheses capture only the value, so no stripping is needed
id_pattern = re.compile(r'\WdisplayId\W{3}(\d{6,8}[a-z]{3})')
id_results = id_pattern.findall(lines)
# Loop through the results and write each ID
# to the temp file on a separate line
with open(tmpFile, "w+") as tmpFileOpen:
    for display_id in id_results:
        tmpFileOpen.write('{}\n'.format(display_id))
# output in tmpFile.txt
# 1234566jne
# 23423426jne
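The question asks only for the first occurrence, and for "datetime" as well as "displayId". As a minimal sketch of that variation, `re.search` stops at the first match, and the pattern can accept either a quoted string or a bare number. The helper name `first_value` is hypothetical and not part of the original answer, and the sample line is copied from the question.

```python
import re

def first_value(text, key):
    """Return the value after the first occurrence of "key": in text, or None."""
    # Match "key": followed by either a quoted string or a bare number.
    pattern = re.compile(
        r'"' + re.escape(key) + r'"\s*:\s*(?:"([^"]*)"|(-?\d+(?:\.\d+)?))'
    )
    match = pattern.search(text)  # search() returns only the first match
    if match is None:
        return None
    # Exactly one of the two capture groups is set.
    return match.group(1) if match.group(1) is not None else match.group(2)

# Sample line copied from the question
line = ('{"id":"1234566jnejnwfw","displayId":"1234566jne",'
        '"author":{"name":"abcd@xyz.com","datetime":15636378484,'
        '"displayId":"23423426jne","datetime":4353453453}')

print(first_value(line, "displayId"))  # 1234566jne
print(first_value(line, "datetime"))   # 15636378484
```

Both values come back as strings; this sketch does not handle escaped quotes inside a value.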
This answer does load the file as JSON, but it will fail if the structure of the JSON changes.
import json
tmpFile = 'tmpFile.txt'
# Load the JSON file
with open('data_file.txt') as infile:
    jdata = json.load(infile)
# Find the first ID (top-level key)
first_id = jdata['displayId']
# Find the second ID (nested under "author")
second_id = jdata['author']['displayId']
# Write both IDs to the temp file on separate lines
with open(tmpFile, 'w+') as tmpFileOpen:
    tmpFileOpen.write('{}\n'.format(first_id))
    tmpFileOpen.write('{}\n'.format(second_id))
# output in tmpFile.txt
# 1234566jne
# 23423426jne
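Two caveats apply to the JSON approach with the sample from the question. The line as posted appears to be missing a closing brace, so it would need to be valid JSON first. Also, "datetime" appears twice inside "author", and Python's `json` module keeps only the last value for a repeated key. The sketch below is one possible way to find the first occurrence of a key in document order without hard-coding the nesting. It assumes the closing brace is added; the `Pairs` class and `find_first` helper are hypothetical names, not part of the original answer.

```python
import json

class Pairs(list):
    """Marks a decoded JSON object, kept as an ordered list of (key, value) pairs."""

def find_first(node, key):
    """Depth-first search, in document order, for the first value stored under key.

    Returns None when the key is absent (a JSON null value looks the same).
    """
    if isinstance(node, Pairs):
        for name, value in node:
            if name == key:
                return value
            found = find_first(value, key)
            if found is not None:
                return found
    elif isinstance(node, list):  # a JSON array
        for item in node:
            found = find_first(item, key)
            if found is not None:
                return found
    return None

# Sample from the question, with one closing brace added (assumption)
sample = ('{"id":"1234566jnejnwfw","displayId":"1234566jne",'
          '"author":{"name":"abcd@xyz.com","datetime":15636378484,'
          '"displayId":"23423426jne","datetime":4353453453}}')

# object_pairs_hook keeps every pair, including repeated keys
data = json.loads(sample, object_pairs_hook=Pairs)

print(find_first(data, "displayId"))  # 1234566jne
print(find_first(data, "datetime"))   # 15636378484
```

With a plain `json.loads(sample)`, `jdata['author']['datetime']` would be 4353453453, the second of the two values.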
Answer 1: (score: 0)
If I understood your question correctly, you can do the following:
import json
tmpFile = "tmpFile.txt"
with open("data.txt") as openfile, open(tmpFile, "w+") as tmpFileOpen:
    for line in openfile:
        # Load the JSON into a dict so it is easy to manipulate
        data = json.loads(line)
        # Write only the first 3 characters of the
        # `displayId` field to the temp file
        tmpFileOpen.write(data['displayId'][:3])
This works because the data in the file is JSON, but it will stop working if the format changes.
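If the line cannot be parsed as JSON, a plain string search is one fallback that stays close to the original attempt in the question. This is only a sketch: the function name `chars_after` is hypothetical, the markers include the surrounding quotes and colon so the slice starts at the value, and the caller has to know how many characters to take.

```python
def chars_after(text, marker, n):
    """Return the n characters that follow the first occurrence of marker, or None."""
    start = text.find(marker)  # find() returns -1 when the marker is absent
    if start == -1:
        return None
    begin = start + len(marker)
    return text[begin:begin + n]

# Sample line copied from the question
line = ('{"id":"1234566jnejnwfw","displayId":"1234566jne",'
        '"author":{"name":"abcd@xyz.com","datetime":15636378484,'
        '"displayId":"23423426jne","datetime":4353453453}')

print(chars_after(line, '"displayId":"', 10))  # 1234566jne
print(chars_after(line, '"datetime":', 11))    # 15636378484
```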