Question

I need to replace part of a query (a string), and the substring to be replaced is not always the same.

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
from table 
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
group by YEAR(utimestamp), MONTH(utimestamp), id """

I want to replace the date-related part that comes after the group by.

That part can be any of the following strings:

'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp),'
'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp),'
'YEAR(utimestamp), MONTH(utimestamp),'
'YEAR(utimestamp),'

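My idea was to search for "(utimestamp)," and, from there, search leftward for the first space (to the left of YEAR, DAY, WEEK or MONTH) and delete what lies in between. After deleting, I want to insert another substring, but how can I insert it when all I have left is blank space where the new substring should go?

I thought about recording the index each time I delete a string, and not deleting any more substrings once the replacement has been made, but I think that overcomplicates things.

Is there an easier, neater way to do this? Am I missing something?

Example:

Input strings that need to be replaced:

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
        from table 
        where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
        group by YEAR(utimestamp), MONTH(utimestamp), id """

or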

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
        from table 
        where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
        group by YEAR(utimestamp), id """

or

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
        from table 
        where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
        group by YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp), id """

etc.

Desired result:

query_replaced = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value 
    from table 
    where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00' 
    group by MY_COOL_STRING, id """

It should work for all of these cases (and more, i.e. the cases stated earlier).

Following @Efferalgan's answer, I came up with this:

query_1 = query.split("group by")[0]
utimestamp_list = query.split("(utimestamp)")
l = len(utimestamp_list)
query_2 = utimestamp_list[l-1]
query_3 = query_1 + " group by MY_COOL_STRING" + query_2

Answer 1

You can do this with a regular expression and re.sub():

>>> import re
>>> replace_with = 'HELLO'
>>> new_string = re.sub(r'group by\s\w+\(utimestamp\)', "group by " + replace_with, query)

# Value of new_string: SELECT DATE(utimestamp) as utimestamp, sum(value) as value
# from table
# where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
# group by HELLO, MONTH(utimestamp), id

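Here replace_with is the text to substitute for whatever the pattern '\w+\(utimestamp\)' matches, and query is the string from your code.

In the pattern, \w+ matches one or more word characters (letters, digits or underscores), and \(utimestamp\) matches the literal text (utimestamp).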
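To see what that sub-pattern picks out on its own, here is a minimal sketch using re.findall(). The variable name sample and its contents are illustrative assumptions modelled on the question's group by clause, not part of the original answer.

```python
import re

# Illustrative input (an assumption), modelled on the question's GROUP BY clause.
sample = "group by YEAR(utimestamp), MONTH(utimestamp), id"

# \w+ grabs the function name; \( and \) match literal parentheses.
matches = re.findall(r'\w+\(utimestamp\)', sample)
print(matches)  # ['YEAR(utimestamp)', 'MONTH(utimestamp)']
```

Note that the same sub-pattern would also match DATE(utimestamp) in the SELECT list, which is why anchoring on "group by" matters.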

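Edit:

As mentioned in the comments, to replace all of the timestamp expressions after group by in query, the regular expression should look like this: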

re.sub(r'group by\s\w+\(utimestamp\)(,\s*\w+\(utimestamp\))*', "group by " + replace_with, query)

# Returned value:
# SELECT DATE(utimestamp) as utimestamp, sum(value) as value
# from table
# where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
# group by HELLO, id
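As a rough check that this pattern covers every variant listed in the question, the following sketch wraps it in a helper and runs it over all four. The names GROUP_BY_DATES, replace_group_by_dates and variants, and the shortened test query, are hypothetical and chosen only for illustration.

```python
import re

# Pattern from the answer above, compiled once. The constant name is hypothetical.
GROUP_BY_DATES = re.compile(r'group by\s\w+\(utimestamp\)(,\s*\w+\(utimestamp\))*')

def replace_group_by_dates(query, replacement):
    """Replace the run of FUNC(utimestamp) terms that follows 'group by'."""
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally instead of being read as escapes.
    return GROUP_BY_DATES.sub(lambda m: 'group by ' + replacement, query)

# The four variants listed in the question.
variants = [
    'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp)',
    'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp)',
    'YEAR(utimestamp), MONTH(utimestamp)',
    'YEAR(utimestamp)',
]

results = []
for v in variants:
    # Shortened, illustrative query (an assumption, not the question's full query).
    q = "SELECT DATE(utimestamp) FROM t group by {}, id".format(v)
    results.append(replace_group_by_dates(q, 'MY_COOL_STRING'))

for r in results:
    print(r)
```

The pattern as written is case-sensitive and expects exactly one whitespace character after "group by"; a query that writes GROUP BY in upper case would need re.IGNORECASE.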

Answer 2

Based on what you asked, I would go with

query = query.split("group by")[0] + " group by MY_COOL_STRING" + query.split("(utimestamp)")[-1]

It concatenates the part before group by, then " group by MY_COOL_STRING", and then whatever comes after the last (utimestamp).
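The same idea can be sketched as a small function. The name replace_by_split and the shortened query are hypothetical, and the sketch assumes that "group by" occurs exactly once, in lower case, and that at least one (utimestamp) term follows it.

```python
def replace_by_split(query, replacement):
    # Hypothetical helper. Assumes 'group by' appears exactly once (lower case)
    # and that the last '(utimestamp)' in the query belongs to the GROUP BY clause.
    head = query.split("group by")[0]        # everything before 'group by'
    tail = query.split("(utimestamp)")[-1]   # everything after the last '(utimestamp)'
    return head + "group by " + replacement + tail

# Shortened, illustrative query (an assumption).
q = "SELECT DATE(utimestamp) FROM t group by YEAR(utimestamp), MONTH(utimestamp), id"
replaced = replace_by_split(q, "MY_COOL_STRING")
print(replaced)  # SELECT DATE(utimestamp) FROM t group by MY_COOL_STRING, id
```

If the group by clause contained no (utimestamp) term at all, the tail would be taken from after DATE(utimestamp) in the SELECT list instead, so this approach only suits queries of the shapes shown in the question.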

Answer 3

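If I'm not mistaken, you don't want to get rid of the (utimestamp) part, only the YEAR, MONTH, etc. Maybe I got that wrong, but in that case this solution is easy to adapt: just adjust the rep dictionary to suit your needs.

In any case, I would use a regular expression. This should take care of what you want (I think) in a single pass and in a (fairly) simple way.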

import re

rep = {
    'YEAR': 'y',
    'MONTH': 'm',
    'WEEK': 'w',
    'DAY': 'd',
}

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """

# Escape the keys so they are safe to use inside a regex.
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], query)

print("Processed query: {}\n".format(replaced))

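That is just the basic example. Here is a more complete, commented version that explains what the code does, including a test at the end against all the possible patterns you mentioned: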

import re

# Several possible patterns like you mentioned.
# Only used for testing further down.
patterns = [
    'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp)',
    'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp)',
    'YEAR(utimestamp), MONTH(utimestamp)',
    'YEAR(utimestamp)'
]

# These are the several patterns to be matched and their replacements.
# The keys are the patterns to match and the values are what you want
# to replace them with.
rep = {
    'YEAR': 'y',
    'MONTH': 'm',
    'WEEK': 'w',
    'DAY': 'd',
}

# The query string template, where we'll replace {} with each of the patterns.
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by {}, id """

# A dictionary with escaped patterns (the keys) suitable for use in regex.
rep = dict((re.escape(k), v) for k, v in rep.items())

# We join each possible pattern (the keys in the rep dict) with | so that the
# regex engine considers them all when matching, i.e., "hey, regex engine,
# please match YEAR or MONTH or WEEK or DAY". This builds the matching pattern
# we'll use and we also pre-compile the regex to make it faster.
pattern = re.compile("|".join(rep.keys()))

# This is the tricky part: we're using pattern.sub() to replace our pattern from
# above with what we want (the values in the rep dict). We're telling the regex
# engine to call a function for each occurrence of the pattern in order to get
# the value we're replacing it with. In our case, we want to get the value from
# the rep dict, using the key which is the found match. m is the match object,
# m.group(0) is the whole matched text, re.escape() escapes it (the rep keys
# were escaped above) and we finally use this as the key to fetch the value.
q = query.format(patterns[0])
print("Query: {}\n".format(q))
replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], q)
print("Processed query: {}\n".format(replaced))

# Now to test it with the examples you gave let's iterate over the patterns
# list, form a new query string using each of them and run the regex against
# each one.
print("###########################")
print("Test each pattern:\n")
print("---------------------------")
for p in patterns:
    q = query.format(p)
    print("Pattern: {}".format(p))
    print("Original query: {}\n".format(q))

    replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], q)
    print("Processed query: {}\n".format(replaced))
    print("---------------------------\n")

You can read more about how re.sub() works.
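In short, re.sub() accepts either a replacement string or a function that is called with each match object and returns the text to insert. A minimal sketch of the function form follows; the helper name lookup and the sample string are illustrative assumptions, and here the keys are escaped only while building the pattern, so the lookup can use the raw matched text.

```python
import re

# Illustrative mapping with the same keys as the rep dict in this answer.
rep = {'YEAR': 'y', 'MONTH': 'm', 'WEEK': 'w', 'DAY': 'd'}

# Escape the keys only when building the alternation pattern.
pattern = re.compile("|".join(re.escape(k) for k in rep))

def lookup(match):
    # match.group(0) is the full text that matched, e.g. 'YEAR'.
    return rep[match.group(0)]

# Sample string (an assumption) shaped like the question's GROUP BY clause.
result = pattern.sub(lookup, "group by YEAR(utimestamp), MONTH(utimestamp), id")
print(result)  # group by y(utimestamp), m(utimestamp), id
```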

Replacing different substrings without a clear pattern in Python
