Replacing different substrings that have no clear pattern in Python

Date: 2016-10-04 12:17:52

Tags: python string replace

I need to replace part of a query (a string), and the substring to replace is not always the same.

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
from table 
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
group by YEAR(utimestamp), MONTH(utimestamp), id """

I want to replace the date-related part that follows the group by.

That part can be any of the following strings:

'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp),'
'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp),'
'YEAR(utimestamp), MONTH(utimestamp),'
'YEAR(utimestamp),'

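My idea is to search for "(utimestamp)," and then search leftwards to the first space to pick up the function name (YEAR, DAY, WEEK or MONTH). After removing these pieces I want to insert another substring, but how can I insert it when all that is left at the spot where it should go is whitespace?

I thought about recording the index each time I remove a string and, once there is nothing left to remove, inserting the new substring at that index, but I think that overcomplicates things.

Is there an easier, cleaner way to do this? Am I missing something?

Example:

Input strings that need replacing: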

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
        from table 
        where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
        group by YEAR(utimestamp), MONTH(utimestamp), id """

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
        from table 
        where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
        group by YEAR(utimestamp), id """

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
        from table 
        where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
        group by YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp), id """

Desired result:

query_replaced = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
    from table 
    where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
    group by MY_COOL_STRING, id """

It should work for all of these cases (and for the other cases listed earlier).

Following @Efferalgan's answer, I came up with this:

query_1 = query.split("group by")[0]
utimestamp_list = query.split("(utimestamp)")
l = len(utimestamp_list)
query_2 = utimestamp_list[l-1]
query_3 = query_1 + " group by MY_COOL_STRING" + query_2

3 answers:

Answer 0: (score: 0)

You can do this with re.sub() from the regular expression module:

>>> import re
>>> replace_with = 'HELLO'
>>> new_string = re.sub(r'group by\s\w+\(utimestamp\)', "group by " + replace_with, query)

# Value of new_string (only the first function after "group by" is replaced):
#  SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
# from table 
# where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
# group by HELLO, MONTH(utimestamp), id

Here replace_with is the text you want to substitute for whatever matches the pattern '\w+\(utimestamp\)', and query is the string from your question.

In the pattern, \w+ matches one or more word characters (letters, digits or underscore), and \(utimestamp\) matches the literal text (utimestamp).
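To see what each piece of the pattern matches, here is a small sketch using re.findall. The shortened query in sample and the variable names are made up for illustration only.

```python
import re

# Hypothetical, shortened query used only to illustrate the pattern.
sample = "SELECT DATE(utimestamp) from t group by YEAR(utimestamp), MONTH(utimestamp), id"

# \w+ matches the function name, \(utimestamp\) matches the literal "(utimestamp)".
all_calls = re.findall(r'\w+\(utimestamp\)', sample)
print(all_calls)
# ['DATE(utimestamp)', 'YEAR(utimestamp)', 'MONTH(utimestamp)']

# Prefixing "group by\s" limits the match to the first function after "group by".
first_call = re.findall(r'group by\s\w+\(utimestamp\)', sample)
print(first_call)
# ['group by YEAR(utimestamp)']
```

Note that the unanchored pattern also matches DATE(utimestamp) in the SELECT list, which is why the "group by" prefix matters.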

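Edit

As mentioned in the comments, to replace all of the ...(utimestamp) items that follow group by in query, the regular expression should look like this: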

re.sub(r'group by\s\w+\(utimestamp\)(,\s*\w+\(utimestamp\))*', "group by " + replace_with, query)

# Returned Value:  
# SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
# where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
# group by HELLO, id
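As a rough check, the same idea can be wrapped in a helper and run against the four variants listed in the question. This is only a sketch: the function name replace_group_by_dates, the one-line template query and the use of a non-capturing group are my own choices, not part of the answer.

```python
import re

# "group by" followed by one or more comma-separated FUNC(utimestamp) items.
GROUP_BY_DATES = re.compile(
    r'group by\s+\w+\(utimestamp\)(?:\s*,\s*\w+\(utimestamp\))*'
)

def replace_group_by_dates(query, replacement):
    """Replace the whole date part after "group by" with `replacement`."""
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally instead of being treated as escapes.
    return GROUP_BY_DATES.sub(lambda m: 'group by ' + replacement, query, count=1)

# Hypothetical one-line template; {} is filled with each variant below.
template = "SELECT DATE(utimestamp) as utimestamp from table group by {} id"
variants = [
    'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp),',
    'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp),',
    'YEAR(utimestamp), MONTH(utimestamp),',
    'YEAR(utimestamp),',
]

results = [replace_group_by_dates(template.format(v), 'MY_COOL_STRING') for v in variants]
for r in results:
    print(r)
# Each line printed:
# SELECT DATE(utimestamp) as utimestamp from table group by MY_COOL_STRING, id
```

The trailing ", id" survives because the repeated group only consumes a comma when it is followed by another FUNC(utimestamp) item.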

Answer 1: (score: 0)

Given your requirements, I would go with

query = query.split("group by")[0] + " group by MY_COOL_STRING" + query.split("(utimestamp)")[-1]

It concatenates the part before group by, then MY_COOL_STRING, then whatever comes after the last (utimestamp).
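Broken into steps, the one-liner looks roughly like this. The names head, tail and result are introduced here for illustration, and the query is the one from the question without trailing spaces.

```python
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """

# Everything before the first "group by".
head = query.split("group by")[0]

# Everything after the last "(utimestamp)" in the whole string.
tail = query.split("(utimestamp)")[-1]
print(repr(tail))
# ', id '

result = head + "group by MY_COOL_STRING" + tail
print(result)
```

This relies on the last (utimestamp) being inside the group by clause. If the clause contained no such call, tail would start after DATE(utimestamp) in the SELECT list and the result would be mangled.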

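Answer 2: (score: 0)

If I'm not mistaken, you don't want to get rid of the (utimestamp) part, only the YEAR, MONTH and so on. Maybe I got that wrong, but in that case this solution is easy to adapt: just adjust the rep dictionary to suit your needs.

Either way, I would use regular expressions. This should take care of what you want in a single pass and in a (fairly) simple way (I think).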

import re

rep = {
    'YEAR': 'y',
    'MONTH': 'm',
    'WEEK': 'w',
    'DAY': 'd',
}

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """

rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], query)

print("Processed query: {}\n".format(replaced))

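That is just the basic example. Here is a more complete version with comments explaining what the code does, including a test at the end against all the possible patterns you mentioned: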

import re

# Several possible patterns like you mentioned.
# Only used for testing further down.
patterns = [
    'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp)',
    'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp)',
    'YEAR(utimestamp), MONTH(utimestamp)',
    'YEAR(utimestamp)'
]

# These are the several patterns to be matched and their replacements.
# The keys are the patterns to match and the values are what you want
# to replace them with.
rep = {
    'YEAR': 'y',
    'MONTH': 'm',
    'WEEK': 'w',
    'DAY': 'd',
}

# The query string template, where we'll replace {} with each of the patterns.
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by {}, id """

# A dictionary with escaped patterns (the keys) suitable for use in regex.
rep = dict((re.escape(k), v) for k, v in rep.items())

# We join each possible pattern (the keys in the rep dict) with | so that the
# regex engine considers them all when matching, i.e., "hey, regex engine,
# please match YEAR or MONTH or WEEK or DAY". This builds the matching pattern
# we'll use and we also pre-compile the regex to make it faster.
pattern = re.compile("|".join(rep.keys()))

# This is the tricky part: we're using pattern.sub() to replace our pattern from
# above with what we want (the values in the rep dict). We're telling the regex
# engine to call a function for each occurrence of the pattern in order to get
# the value we're replacing it with. In our case, we want to get the value from
# the rep dict, using the key which is the found match. m is the match object,
# m.group(0) is the whole matched text, re.escape() escapes it (the dict keys
# were escaped above) and we finally use this as the key to fetch the value
# from the rep dict.
q = query.format(patterns[0])
print("Query: {}\n".format(q))
replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], q)
print("Processed query: {}\n".format(replaced))

# Now to test it with the examples you gave let's iterate over the patterns
# list, form a new query string using each of them and run the regex against
# each one.
print("###########################")
print("Test each pattern:\n")
print("---------------------------")
for p in patterns:
    q = query.format(p)
    print("Pattern: {}".format(p))
    print("Original query: {}\n".format(q))

    replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], q)
    print("Processed query: {}\n".format(replaced))
    print("---------------------------\n")

You can read more about how re.sub() works in the Python documentation.
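One possible refinement, assuming you only want to touch names that are directly followed by (utimestamp): a plain YEAR|MONTH|WEEK|DAY alternation also matches inside longer names such as WEEKDAY, or a column alias called DAY. The sketch below adds a word boundary and a lookahead to avoid that. The function name abbreviate and the sample query are hypothetical.

```python
import re

rep = {'YEAR': 'y', 'MONTH': 'm', 'WEEK': 'w', 'DAY': 'd'}

# \b stops "DAY" from matching inside "WEEKDAY"; the lookahead (?=...) requires
# the name to be followed directly by "(utimestamp)" without consuming it.
pattern = re.compile(
    r'\b(' + '|'.join(map(re.escape, rep)) + r')(?=\(utimestamp\))'
)

def abbreviate(query):
    """Replace YEAR/MONTH/WEEK/DAY only where they wrap utimestamp."""
    return pattern.sub(lambda m: rep[m.group(1)], query)

# Hypothetical query with two lookalikes that must stay untouched.
sample = ("select WEEKDAY(utimestamp) as DAY from t "
          "group by YEAR(utimestamp), DAY(utimestamp), id")
abbreviated = abbreviate(sample)
print(abbreviated)
# select WEEKDAY(utimestamp) as DAY from t group by y(utimestamp), d(utimestamp), id
```

Because the dictionary is looked up with the captured name itself, the keys do not need to be escaped before the lookup.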