I need to replace part of a query (a string), but the substring to replace is not always the same.
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """
I want to replace the date-related part that comes after group by.
This part can be any of the following strings:
'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp),'
'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp),'
'YEAR(utimestamp), MONTH(utimestamp),'
'YEAR(utimestamp),'
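My idea is to search for "(utimestamp)," and, from there, look leftward (past YEAR, DAY, WEEK or MONTH) for the first space. After removing that part I want to insert another substring, but how can I insert it once I am left with blank space where the new substring should go?
I thought of getting the index each time I remove a string and, once there is nothing left to remove, inserting the substring there, but I think that would make things complicated.
Is there an easier, cleaner way to do this? Am I missing something?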
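The index idea does not need piece-by-piece deletion: it is enough to find where the first date term starts (just after group by) and where the last (utimestamp) ends, then join the text on either side around the new substring. A minimal sketch, assuming the date terms form one contiguous run at the start of the group by clause and that no (utimestamp) appears after them; splice_group_by is a hypothetical helper name.

```python
def splice_group_by(query, new_text):
    """Replace the run of X(utimestamp) terms after 'group by' with new_text.

    Hypothetical helper. Assumes the date terms come first in the group by
    clause and that the last '(utimestamp)' in the query belongs to them.
    """
    marker = "group by"
    # Index just past the keyword; str.index raises ValueError if it is missing.
    start = query.index(marker) + len(marker)
    # Skip the whitespace between 'group by' and the first date term.
    while start < len(query) and query[start].isspace():
        start += 1
    # rfind gives the start of the last '(utimestamp)'; add its length for the end.
    end = query.rfind("(utimestamp)") + len("(utimestamp)")
    return query[:start] + new_text + query[end:]


query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """
print(splice_group_by(query, "MY_COOL_STRING"))
```

Because the whole run is cut out in one slice, there is never a leftover gap to fill: the new text goes exactly where the first term used to start.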
Example:
Input strings that need to be replaced:
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """
or
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), id """
or
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp), id """
etc.
Desired result:
query_replaced = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by MY_COOL_STRING, id """
It should work for all of these cases (and the others listed earlier).
Following @Efferalgan's answer, I came up with this:
query_1 = query.split("group by")[0]
utimestamp_list = query.split("(utimestamp)")
query_2 = utimestamp_list[-1]
query_3 = query_1 + " group by MY_COOL_STRING" + query_2
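To check that this covers every variant listed in the question, the same two splits can be wrapped in a function and run over each case. A sketch; the function name replace_group_by and the template string are illustrative, and it writes "group by " instead of " group by" so the rebuilt line has no extra leading space.

```python
def replace_group_by(query, new_text="MY_COOL_STRING"):
    # Everything before the 'group by' keyword.
    head = query.split("group by")[0]
    # Everything after the last '(utimestamp)', e.g. ', id '.
    tail = query.split("(utimestamp)")[-1]
    return head + "group by " + new_text + tail


# Query with a placeholder where the date terms go.
template = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by {}, id """

variants = [
    "YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp)",
    "YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp)",
    "YEAR(utimestamp), MONTH(utimestamp)",
    "YEAR(utimestamp)",
]
for v in variants:
    # Every variant should collapse to the same query.
    assert replace_group_by(template.format(v)) == template.format("MY_COOL_STRING")
print("all variants replaced")
```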
Answer 0 (score: 0)
You can do it with a regular expression using re.sub():
>>> import re
>>> replace_with = 'HELLO'
>>> new_string = re.sub(r'group by\s\w+\(utimestamp\)', 'group by ' + replace_with, query)
# Value of new_string (query is the string from the question):
#  SELECT DATE(utimestamp) as utimestamp, sum(value) as value
# from table
# where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
# group by HELLO, MONTH(utimestamp), id
where replace_with is the text that replaces what the pattern matches, and query is the string from your question.
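Here, \w+ matches one or more word characters (letters, digits or underscores), \s matches a single whitespace character, and \(utimestamp\) matches the literal text (utimestamp); the parentheses are escaped because they have a special meaning in regular expressions.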
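A small check of what the pieces match on the query from the question shows why the group by prefix matters: on its own, \w+\(utimestamp\) also hits DATE(utimestamp) in the SELECT list. The patterns are written as raw strings (r"...") so Python passes the backslashes through to the regex engine unchanged; recent Python versions warn about sequences like "\w" in ordinary string literals.

```python
import re

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """

# Without the 'group by' prefix the pattern also matches DATE(utimestamp).
print(re.findall(r"\w+\(utimestamp\)", query))
# → ['DATE(utimestamp)', 'YEAR(utimestamp)', 'MONTH(utimestamp)']

# With the prefix only the first date term of the group by clause matches.
print(re.search(r"group by\s\w+\(utimestamp\)", query).group(0))
# → group by YEAR(utimestamp)
```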
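Edit:
As mentioned in the comments, to replace all of the (utimestamp) terms that follow group by in query, not just the first one, the regex should look like this: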
re.sub(r'group by\s\w+\(utimestamp\)(,\s*\w+\(utimestamp\))*', 'group by ' + replace_with, query)
# Returned value:
#  SELECT DATE(utimestamp) as utimestamp, sum(value) as value
# from table
# where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
# group by HELLO, id
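The extended pattern can be wrapped in a small reusable function and checked against every variant from the question. A sketch; replace_date_group and the template are illustrative names. It passes a function as the replacement, because re.sub inserts a function's return value literally, so a replace_with containing backslashes is not interpreted as a group reference.

```python
import re

# 'group by', then one or more comma-separated X(utimestamp) terms.
GROUP_BY_DATES = re.compile(r"group by\s\w+\(utimestamp\)(,\s*\w+\(utimestamp\))*")


def replace_date_group(query, replace_with):
    # Only the first 'group by ...' run is replaced (count=1).
    return GROUP_BY_DATES.sub(lambda m: "group by " + replace_with, query, count=1)


template = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by {}, id """

for terms in [
    "YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp)",
    "YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp)",
    "YEAR(utimestamp), MONTH(utimestamp)",
    "YEAR(utimestamp)",
]:
    print(replace_date_group(template.format(terms), "HELLO"))
```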
Answer 1 (score: 0)
Given your requirements, I would go with:
query = query.split("group by")[0] + " group by MY_COOL_STRING" + query.split("(utimestamp)")[-1]
It takes the part before group by, appends " group by MY_COOL_STRING", and then appends the part after the last (utimestamp).
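Printing the pieces makes the concatenation concrete. It also shows a limitation worth knowing: [-1] takes the text after the last (utimestamp) anywhere in the query, so a clause after group by that also uses (utimestamp) gets cut off. The order by clause below is a made-up example, not part of the question.

```python
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value
from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """

head = query.split("group by")[0]        # everything before 'group by'
tail = query.split("(utimestamp)")[-1]   # everything after the last '(utimestamp)'
print(repr(tail))                        # → ', id '
print(head + " group by MY_COOL_STRING" + tail)

# Hypothetical query with '(utimestamp)' after the group by clause.
other = query + "order by DAY(utimestamp)"
print(repr(other.split("(utimestamp)")[-1]))  # → '' (', id order by ...' is lost)
```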
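Answer 2 (score: 0)
If I'm not mistaken, you don't want to get rid of the (utimestamp) part, only of YEAR, MONTH and so on. Maybe I'm wrong, but in that case this solution is easy to adapt: just adjust the rep dictionary to suit your needs.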
In any case, I would use a regular expression. This should take care of what you want (I think) in a single pass and in a (fairly) simple way.
import re
rep = {
'YEAR': 'y',
'MONTH': 'm',
'WEEK': 'w',
'DAY': 'd',
}
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], query)
print("Processed query: {}\n".format(replaced))
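This is just the basic example. Here is a more complete version, with comments explaining what the code does and, at the end, a test against all the possible patterns you mentioned: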
import re
# Several possible patterns like you mentioned.
# Only used for testing further down.
patterns = [
'YEAR(utimestamp), MONTH(utimestamp), DAY(utimestamp)',
'YEAR(utimestamp), MONTH(utimestamp), WEEK(utimestamp)',
'YEAR(utimestamp), MONTH(utimestamp)',
'YEAR(utimestamp)'
]
# These are the several patterns to be matched and their replacements.
# The keys are the patterns to match and the values are what you want
# to replace them with.
rep = {
'YEAR': 'y',
'MONTH': 'm',
'WEEK': 'w',
'DAY': 'd',
}
# The query string template, where we'll replace {} with each of the patterns.
query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by {}, id """
# A dictionary with escaped patterns (the keys) suitable for use in regex.
rep = dict((re.escape(k), v) for k, v in rep.items())
# We join each possible pattern (the keys in the rep dict) with | so that the
# regex engine considers them all when matching, i.e., "hey, regex engine,
# please match YEAR or MONTH or WEEK or DAY". This builds the matching pattern
# we'll use and we also pre-compile the regex to make it faster.
pattern = re.compile("|".join(rep.keys()))
# This is the tricky part: we're using pattern.sub() to replace our pattern from
# above with what we want (the values in the rep dict). We're telling the regex
# engine to call a function for each occurrence of the pattern in order to get
# the value we're replacing it with. In our case, we want to get the value from
# the rep dict, using the key which is the found match. m is the match object,
# m.group(0) is the whole matched text, re.escape() escapes it and we finally
# use this as the key to fetch the value from the rep dict.
q = query.format(patterns[0])
print("Query: {}\n".format(q))
replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], q)
print("Processed query: {}\n".format(replaced))
# Now, to test it with the examples you gave, let's iterate over the patterns
# list, form a new query string using each of them and run the regex against
# each one.
print("###########################")
print("Test each pattern:\n")
print("---------------------------")
for p in patterns:
q = query.format(p)
print("Pattern: {}".format(p))
print("Original query: {}\n".format(q))
replaced = pattern.sub(lambda m: rep[re.escape(m.group(0))], q)
print("Processed query: {}\n".format(replaced))
print("---------------------------\n")
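Since re.escape() leaves plain words like YEAR unchanged, the escape round-trip in the lookup only matters if the keys contain regex metacharacters. A variant sketch that uses the matched word directly as the key and also guards against accidental matches: \b requires a whole word (so a hypothetical column such as TODAY(utimestamp) is left alone), and the lookahead (?=\(utimestamp\)) only accepts words followed by (utimestamp) without consuming it.

```python
import re

rep = {'YEAR': 'y', 'MONTH': 'm', 'WEEK': 'w', 'DAY': 'd'}

# Keys are plain words here; wrap each in re.escape() if that ever changes.
pattern = re.compile(r"\b(" + "|".join(rep) + r")(?=\(utimestamp\))")

query = """ SELECT DATE(utimestamp) as utimestamp, sum(value) as value from table
where utimestamp BETWEEN '2000-06-28 00:00:00' AND '2000-07-05 00:00:00'
group by YEAR(utimestamp), MONTH(utimestamp), id """

# group(1) is the matched word, which is also the dict key.
replaced = pattern.sub(lambda m: rep[m.group(1)], query)
print(replaced)
```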
You can read more about how re.sub() works in the documentation of Python's re module.
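For reference, here are the two forms of re.sub() used in these answers, applied to a tiny input: a replacement string (every match becomes the same text) and a replacement function (called with each match object, returning the text to insert). The optional count argument limits how many substitutions are made.

```python
import re

text = "YEAR(utimestamp), MONTH(utimestamp), id"
date_term = r"\w+\(utimestamp\)"

# Replacement string: every match becomes the same text.
print(re.sub(date_term, "X", text))                      # → X, X, id
# Replacement function: m.group(0) is the whole matched text.
print(re.sub(date_term, lambda m: m.group(0)[0], text))  # → Y, M, id
# count=1 stops after the first substitution.
print(re.sub(date_term, "X", text, count=1))             # → X, MONTH(utimestamp), id
```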