In Python, matching the first occurrence of a word AFTER the occurrence of another word

Date: 2016-10-20 18:58:28

Tags: python regex

I am trying to replace the first occurrence of a word that comes after another word in a JSON text string. I have been struggling to do this with regular expressions, but a solution using plain Python string functions is fine with me too.

What I want is to find the first occurrence of "LEVEL1": (with the quotes), then the first occurrence of "session_transition": after it, then whatever quoted string follows "session_transition":, and replace that string with another one. Here is the string I am working with:

"BASELINE": {
    "audio_volume": 150,
    "cry_threshold": 70,
    "cry_transition": "LEVEL1",
    "expected_volume": 63,
    "led_color": "BLUE",
    "led_blink_speed": "NONE",
    "motor_amplitude": 0.97,
    "motor_frequency": 0.5,
    "power_transition": "SUSPENDED",
    "seconds_to_ignore_cry": 10.0,
    "seconds_in_state": -1.0,
    "session_transition": "ONLINE",
    "track": "RoR",
    "timer_transition": null,
    "active_session" : 1
},
"LEVEL1": {
    "audio_volume": 300,
    "cry_threshold": 75,
    "expected_volume": 63,
    "cry_transition": "LEVEL2",
    "led_color": "PURPLE",
    "led_blink_speed": "NONE",
    "motor_amplitude": 0.76,
    "motor_frequency": 1.20,
    "power_transition": "SUSPENDED",
    "seconds_to_ignore_cry": 10.0,
    "seconds_in_state": 480.0,
    "session_transition": "ONLINE",
    "track": "RoR",
    "timer_transition": "BASELINE",
    "active_session" : 1
}

}

For example, I want to change the "session_transition" value inside the "LEVEL1" block from "ONLINE" to "OFFLINE", so that the block looks like this:

"LEVEL1": {
    "audio_volume": 300,
    "cry_threshold": 75,
    "expected_volume": 63,
    "cry_transition": "LEVEL2",
    "led_color": "PURPLE",
    "led_blink_speed": "NONE",
    "motor_amplitude": 0.76,
    "motor_frequency": 1.20,
    "power_transition": "SUSPENDED",
    "seconds_to_ignore_cry": 10.0,
    "seconds_in_state": 480.0,
    "session_transition": "OFFLINE",
    "track": "RoR",
    "timer_transition": "BASELINE",
    "active_session" : 1
}

So far I have r"(?<=\"LEVEL1\"\:).*" to match the first occurrence, but I don't know how to proceed from there.
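Note that the lookbehind pattern above only reaches the end of the "LEVEL1": line, because . does not match newlines by default. A minimal sketch that extends the same regex idea with re.sub: capture everything from the "LEVEL1": key through the opening quote of the first "session_transition" value, let a lazy .*? cross lines via re.DOTALL, and swap only the value. The helper name replace_after is hypothetical; it assumes the value is a plain quoted string with no escaped quotes, and if the LEVEL1 block had no session_transition key, the lazy match would run on into the next block.

```python
import re


def replace_after(text, anchor_key, target_key, new_value):
    """Hypothetical helper: replace the quoted value of the first `target_key`
    that appears after the key `anchor_key` (both matched as "key":)."""
    pattern = (
        r'("' + re.escape(anchor_key) + r'"\s*:'         # group 1 starts at the anchor key, e.g. "LEVEL1":
        + r'.*?'                                         # shortest stretch of text; DOTALL lets it cross lines
        + r'"' + re.escape(target_key) + r'"\s*:\s*")'  # target key up to the value's opening quote
        + r'[^"]*'                                       # the old value (assumed to contain no quotes)
        + r'(")'                                         # group 2: the value's closing quote
    )
    # A function replacement inserts new_value literally (no backslash processing)
    return re.sub(pattern, lambda m: m.group(1) + new_value + m.group(2),
                  text, count=1, flags=re.DOTALL)


# Abbreviated copy of the question's data
text = '''"BASELINE": {
    "cry_transition": "LEVEL1",
    "session_transition": "ONLINE",
    "track": "RoR"
},
"LEVEL1": {
    "cry_transition": "LEVEL2",
    "session_transition": "ONLINE",
    "track": "RoR"
}'''

result = replace_after(text, "LEVEL1", "session_transition", "OFFLINE")
print(result)
```

The colon after "LEVEL1" in the pattern is what keeps the "cry_transition": "LEVEL1" value in the BASELINE block from being treated as the anchor.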

3 answers:

Answer 0 (score: 0)

I think you can do this fairly easily with str.index():

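first_index = some_string.index('"LEVEL1":')  # the key, not the "LEVEL1" value inside BASELINE
second_index = some_string.index('"ONLINE"', first_index)  # searching from first_index gives an absolute offset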

After that, I leave it up to you to replace the string. You should be able to do it with some_string[second_index:].split('"') and then use slicing and join to put it back together.
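The split-and-join step left to the reader above could be sketched as follows. This is one possible completion, not the answerer's code: it searches for the "session_transition": key rather than for "ONLINE", so it works whatever the current value is, and it assumes the value contains no embedded quotes.

```python
some_string = '''"BASELINE": {
    "cry_transition": "LEVEL1",
    "session_transition": "ONLINE",
    "track": "RoR"
},
"LEVEL1": {
    "cry_transition": "LEVEL2",
    "session_transition": "ONLINE",
    "track": "RoR"
}'''

# Absolute offset of the LEVEL1 key (the colon skips the "LEVEL1" value in BASELINE)
first_index = some_string.index('"LEVEL1":')
# First "session_transition" key at or after that offset
key_index = some_string.index('"session_transition":', first_index)

head, tail = some_string[:key_index], some_string[key_index:]
# Split the tail on quotes at most 4 times:
# ['', 'session_transition', ': ', 'ONLINE', ',\n    "track": "RoR"\n}']
parts = tail.split('"', 4)
parts[3] = 'OFFLINE'  # the quoted value that follows the key
new_string = head + '"'.join(parts)
print(new_string)
```

Joining with '"' puts back exactly the quotes that split removed, so everything except the value is unchanged.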

Answer 1 (score: 0)

I would suggest using the json module built into Python. It converts the JSON into a Python dict, so you avoid a complex regex, and the code is easier to read as well. Documentation: Python 3.4, Python 2.7

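In Python 2 or 3 (the result below is what Python 3 prints; Python 2 shows the strings with a u prefix, e.g. u'foo'):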

import json
jsonDict = json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')
print(jsonDict)

Result:

['foo', {'bar': ['baz', None, 1.0, 2]}]

You can then manipulate the result as you would any Python dictionary, which is more intuitive.

Once you are done, you can convert it back to a JSON string with

jsonStr = json.dumps(jsonDict)
print(jsonStr)

Result:

["foo", {"bar": ["baz", null, 1.0, 2]}]
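Applied to the question's data, a minimal sketch of this approach might look like the following. It assumes the complete configuration file is a valid JSON object whose top-level keys include "BASELINE" and "LEVEL1" (the excerpt in the question is only a fragment). Be aware that json.dumps regenerates the whole text: whitespace is normalized and a number such as 1.20 comes back as 1.2. Key order is kept on Python 3.7 and later, where dicts preserve insertion order; on older versions you can pass object_pairs_hook=collections.OrderedDict to json.loads.

```python
import json

# Abbreviated stand-in for the real configuration file (assumed to be valid JSON)
config_text = '''{
    "BASELINE": {"motor_frequency": 0.5, "session_transition": "ONLINE", "timer_transition": null},
    "LEVEL1": {"motor_frequency": 1.20, "session_transition": "ONLINE", "timer_transition": "BASELINE"}
}'''

config = json.loads(config_text)                    # str -> nested dicts
config["LEVEL1"]["session_transition"] = "OFFLINE"  # edit by key, no text searching
new_text = json.dumps(config, indent=4)             # dicts -> str, 4-space indentation
print(new_text)
```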

Answer 2 (score: 0)

OK, so I was able to solve this as follows:

# configuration_in_text_format holds the whole configuration as one string
# 1) Find the unique string "LEVEL1": and save its index
after_index = configuration_in_text_format.index('"LEVEL1":')
# 2) Starting from that index, find the "session_transition": key and save its index
#    (index() raises ValueError if the key is missing, whereas find() would return -1)
after_index = configuration_in_text_format.index('"session_transition":', after_index)
# 3) Take the bottom part of the text, starting with the "session_transition": line
new_config_in_text_format = configuration_in_text_format[after_index:]
# 4) Remove that first line (everything up to and including its newline)
new_config_in_text_format = new_config_in_text_format[new_config_in_text_format.find('\n') + 1:]
# 5) Build the replacement line
new_line_str = '"session_transition": "OFFLINE",\n'
# 6) Put the new line on top of the remaining text, effectively replacing the old line
new_config_in_text_format = new_line_str + new_config_in_text_format
# 7) Prepend the top part of the original text (which still holds the line's indentation)
new_config_in_text_format = configuration_in_text_format[:after_index] + new_config_in_text_format
print(new_config_in_text_format)

Correct output:

"BASELINE": {
    "audio_volume": 150,
    "cry_threshold": 70,
    "cry_transition": "LEVEL1",
    "expected_volume": 63,
    "led_color": "BLUE",
    "led_blink_speed": "NONE",
    "motor_amplitude": 0.97,
    "motor_frequency": 0.5,
    "power_transition": "SUSPENDED",
    "seconds_to_ignore_cry": 10.0,
    "seconds_in_state": -1.0,
    "session_transition": "ONLINE",
    "track": "RoR",
    "timer_transition": null,
    "active_session" : 1
},
"LEVEL1": {
    "audio_volume": 300,
    "cry_threshold": 75,
    "expected_volume": 63,
    "cry_transition": "LEVEL2",
    "led_color": "PURPLE",
    "led_blink_speed": "NONE",
    "motor_amplitude": 0.76,
    "motor_frequency": 1.20,
    "power_transition": "SUSPENDED",
    "seconds_to_ignore_cry": 10.0,
    "seconds_in_state": 480.0,
    "session_transition": "OFFLINE",
    "track": "RoR",
    "timer_transition": "BASELINE",
    "active_session" : 1
},