Regex pattern to match INSERT SQL queries

Time: 2010-11-29 02:28:27

Tags: php regex

I have some log files containing many lines like these:

[26-Nov-2010 07:33:08] query error: INSERT INTO members (id,name,member_login_key,email,mgroup,posts,joined,ip_address,time_offset,view_sigs,email_pm,view_img,view_avs,restrict_post,view_pop,msg_total,new_msg,coppa_user,language,dst_in_use,allow_admin_mails,hide_email,subs_pkg_chosen,members_l_username,members_l_display_name, item_id, members_display_name)
                                        VALUES(8416961,'abc','3857b123a1a67ce1fc4a39fd7ae47355','test@email.com',1,0,1290756788,'127.0.0.1','',1,1,1,1,
                    0,1,0,0,0,'',0,1,0,0,'abc','abc',
                                        '0', 'abc');|http://www.example.com/|Duplicate entry '8388607' for key 1
[26-Nov-2010 08:33:08] query error: INSERT INTO members (id,name,member_login_key,email,mgroup,posts,joined,ip_address,time_offset,view_sigs,email_pm,view_img,view_avs,restrict_post,view_pop,msg_total,new_msg,coppa_user,language,dst_in_use,allow_admin_mails,hide_email,subs_pkg_chosen,members_l_username,members_l_display_name, item_id, members_display_name)
                                        VALUES(8416962,'abc','3857b123a1a67ce1fc4a39fd7ae47355','test@email.com',1,0,1290756788,'127.0.0.1','',1,1,1,1,
                    0,1,0,0,0,'',0,1,0,0,'abc','abc',
                                        '0', 'abc');|http://www.example.com/|Duplicate entry '8388607' for key 1

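What I want to do is run a regular expression that matches all of the INSERT queries (ignoring the timestamp, the URL, and the duplicate-entry message).

So it should return: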

INSERT INTO members (id,name,member_login_key,email,mgroup,posts,joined,ip_address,time_offset,view_sigs,email_pm,view_img,view_avs,restrict_post,view_pop,msg_total,new_msg,coppa_user,language,dst_in_use,allow_admin_mails,hide_email,subs_pkg_chosen,members_l_username,members_l_display_name, item_id, members_display_name)
                                    VALUES(8416961,'abc','3857b123a1a67ce1fc4a39fd7ae47355','test@email.com',1,0,1290756788,'127.0.0.1','',1,1,1,1,
                0,1,0,0,0,'',0,1,0,0,'abc','abc',
                                    '0', 'abc');
INSERT INTO members (id,name,member_login_key,email,mgroup,posts,joined,ip_address,time_offset,view_sigs,email_pm,view_img,view_avs,restrict_post,view_pop,msg_total,new_msg,coppa_user,language,dst_in_use,allow_admin_mails,hide_email,subs_pkg_chosen,members_l_username,members_l_display_name, item_id, members_display_name)
                                    VALUES(8416962,'abc','3857b123a1a67ce1fc4a39fd7ae47355','test@email.com',1,0,1290756788,'127.0.0.1','',1,1,1,1,
                0,1,0,0,0,'',0,1,0,0,'abc','abc',
                                    '0', 'abc');

Can anyone help? Thanks in advance!

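3 answers:

Answer 0: (score: 0)

Do you want to extract part of it, or just match?

Matching alone is easy and doesn't need a regular expression at all, just the substring INSERT INTO.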

grep 'INSERT INTO' foo.log

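If you want to extract details or match more specifically, please give more information.

If you also want the three lines that follow each match, you can do this.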

grep -A 3 'INSERT INTO' foo.log

If you want to trim something off the start and the end (this is ugly, but it works for your example):

grep -A 3 'INSERT INTO' foo.log | sed -e 's/^.*INSERT INTO/INSERT INTO/' -e 's/);|.*/);/'

Answer 1: (score: 0)

If every insert spans 4 lines of the log file, you can use this regex:

 (.*)(INSERT INTO.*\n.*\n.*\n.*\);)(.*)

With this replacement string:

 \2\n
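
Since the question is tagged php, here is a minimal sketch of the same four-line idea using preg_replace. The sample log is hypothetical (column list shortened), the pattern keeps the trailing semicolon inside the captured group, and it only holds if every entry really spans exactly four lines.

```php
<?php
// Hypothetical sample: two four-line entries in the format from the question (columns shortened)
$log = <<<'LOG'
[26-Nov-2010 07:33:08] query error: INSERT INTO members (id,name)
    VALUES(8416961,'abc',
    0,1,
    '0', 'abc');|http://www.example.com/|Duplicate entry '8388607' for key 1
[26-Nov-2010 08:33:08] query error: INSERT INTO members (id,name)
    VALUES(8416962,'abc',
    0,1,
    '0', 'abc');|http://www.example.com/|Duplicate entry '8388607' for key 1
LOG;

// Without the /s modifier "." does not match a newline, so each ".*" stays on one line.
// /m lets ^ and $ anchor at line boundaries inside the multi-line string.
$pattern = '/^.*(INSERT INTO.*\n.*\n.*\n.*\);).*$/m';

// Replace each whole four-line entry with just the captured query
$queries = preg_replace($pattern, '$1', $log);

echo $queries, "\n";
```

The timestamp prefix and the trailing |URL|message part fall outside the captured group, so they are dropped by the replacement.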

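Answer 2: (score: 0)

This should be possible; it depends a lot on whether the whole file follows the same format.

This only grabs the INSERT statements. If you want the whole log entries, the regex needs to change slightly.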

$logFile = file_get_contents('inserts.log');

$matches = array();
preg_match_all("/(?P<insert>INSERT .+?;)/s", $logFile, $matches);

foreach ($matches['insert'] as $cQuery) {
    echo $cQuery . "\n";
}

For details on this approach, see the preg_match_all documentation.
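
One caveat with the lazy `.+?;` pattern above: it stops at the first semicolon it finds, including one inside a quoted value. A possible variant, sketched here on the assumption that every query ends with `);` immediately followed by `|` as in the sample log, uses a lookahead for that marker. The log text below is hypothetical.

```php
<?php
// Hypothetical log text; the first entry has a semicolon inside a quoted value
$logFile = <<<'LOG'
[26-Nov-2010 07:33:08] query error: INSERT INTO members (id,name)
    VALUES(8416961,'a;bc');|http://www.example.com/|Duplicate entry '8388607' for key 1
[26-Nov-2010 08:33:08] query error: INSERT INTO members (id,name)
    VALUES(8416962,'abc');|http://www.example.com/|Duplicate entry '8388607' for key 1
LOG;

$matches = array();
// Capture from INSERT up to ");" only where it is followed by "|".
// The lookahead (?=\|) checks for the pipe without including it in the match.
preg_match_all('/(?P<insert>INSERT .+?\);)(?=\|)/s', $logFile, $matches);

foreach ($matches['insert'] as $cQuery) {
    echo $cQuery . "\n";
}
```

This would still cut a query short if a value itself contained the sequence `);|`, so it is a heuristic for this log format rather than a general SQL parser.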