I have a query in which one column is a string containing email headers, for example:
From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
Delivery-Date: Tue, 25 Jan 2011 15:31:01 -0700
Received: from po-out-1718.google.com ([72.14.252.155]:54907) by cl35.gs01.grid ...
Received: by po-out-1718.google.com with SMTP id y22so795146pof.4 for <user@exa ...
Received: by 10.141.116.17 with SMTP id t17mr3929916rvm.251.1214951458741; Tue,...
Received: by 10.140.188.3 with HTTP; Tue, 25 Jan 2011 15:30:58 -0700 (PDT)
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=d...
Domainkey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:da...
Message-Id: <c8f49cec0807011530k11196ad4p7cb4b9420f2ae752@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_3927_12044027.1214951...
X-Spam-Status: score=3.7 tests=DNS_FROM_RFC_POST, HTML_00_10, HTML_MESSAGE, HTM...
X-Spam-Level: ***
Message Body: This is a KnowledgeBase article that provides information on how ...
I want to extract only the email address in the 'To:' field, which in the example above is user@example.com.
How can I do this?
Answer 0 (score: 2)
You can use a split function. I like the version that uses a numbers table, but there are many alternatives. First, a numbers table with 1,000,000 rows:
SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 1000000;
WITH n(rn) AS
(
SELECT TOP (@UpperLimit) ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_columns AS s1, sys.all_objects ORDER BY s1.[object_id]
)
SELECT [Number] = rn - 1
INTO dbo.Numbers FROM n
WHERE rn <= @UpperLimit + 1;
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);
Now a generic, inline table-valued split function that turns a delimited string into a set:
CREATE FUNCTION dbo.SplitString
(
@List NVARCHAR(MAX),
@Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT [Value] FROM
(
SELECT
[Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
FROM dbo.Numbers WHERE Number <= LEN(@List)
AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
) AS x
);
GO
Then it's simple:
DECLARE @x NVARCHAR(MAX) = N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...';
SELECT LTRIM(SUBSTRING(Value, 4, 4000))
FROM dbo.SplitString(@x, CHAR(13)+CHAR(10))
WHERE Value LIKE 'To: %@%';
Is the data in a table? OK, no problem:
DECLARE @a TABLE(id INT, email NVARCHAR(MAX));
INSERT @a VALUES
(1,N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...'),
(2,N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: differentUser@somewhereelse.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...');
SELECT a.id, LTRIM(SUBSTRING(x.Value, 4, 4000))
FROM @a AS a
CROSS APPLY dbo.SplitString(a.email, CHAR(13)+CHAR(10)) AS x
WHERE x.Value LIKE 'To: %@%';
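Now, you may need to experiment with the delimiter: it might be just CHAR(10), just CHAR(13), or the two characters in a different order. I can't tell from your code which one your data uses.
Answer 1 (score: 1)
You can use the XML functionality to split the lines and find what you need: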
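One way to sidestep the delimiter question is a minimal sketch using the built-in STRING_SPLIT, assuming SQL Server 2016 or later (database compatibility level 130+). STRING_SPLIT only accepts a single-character separator, so the sketch removes every CHAR(13) first and splits on CHAR(10); that copies both CRLF and LF-only line endings (CR-only text would still need a different replacement). The table variable @emails is hypothetical sample data shaped like @a above.

```sql
-- Hypothetical sample data with the same shape as @a above.
DECLARE @emails TABLE (id INT, email NVARCHAR(MAX));
INSERT @emails VALUES
(1, N'From: Media Temple user (mt.kb.user@gmail.com)' + CHAR(13) + CHAR(10)
  + N'To: user@example.com' + CHAR(13) + CHAR(10)
  + N'Envelope-To: user@example.com'),
(2, N'Subject: test' + CHAR(10)               -- LF-only line endings
  + N'To: differentUser@somewhereelse.com' + CHAR(10)
  + N'Return-Path: <mt.kb.user@gmail.com>');

DECLARE @result TABLE (id INT, ToAddress NVARCHAR(4000));

-- Requires SQL Server 2016+ (compatibility level 130+) for STRING_SPLIT.
-- Drop every CR, then split on LF, so CRLF and LF-only text behave the same.
INSERT @result (id, ToAddress)
SELECT e.id, LTRIM(RTRIM(SUBSTRING(s.value, 4, 4000)))
FROM @emails AS e
CROSS APPLY STRING_SPLIT(REPLACE(e.email, CHAR(13), N''), CHAR(10)) AS s
WHERE s.value LIKE N'To: %@%';

SELECT id, ToAddress FROM @result ORDER BY id;
```

The 'Envelope-To:' line is not picked up because the LIKE pattern requires the line to start with 'To: '.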
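DECLARE @X XML;
-- Escape & and < so the text is valid XML, then turn each line break into a new <x> element.
-- Note: assigning @X from a multi-row SELECT keeps only one row's value.
SELECT @X = CONVERT(XML, '<y><x>' +
REPLACE(REPLACE(REPLACE(value, '&', '&amp;'), '<', '&lt;'), CHAR(10), '</x><x>') +
'</x></y>')
FROM test;
SELECT [Value] = T.c.value('.','NVARCHAR(MAX)')
FROM @X.nodes('/y/x') T(c)
WHERE T.c.value('.','NVARCHAR(MAX)') LIKE 'To: %';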
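Assigning @X from a SELECT over the whole test table keeps only one row, so the following is a sketch of the same XML technique applied to every row with CROSS APPLY. It assumes a hypothetical table variable @test with id and value columns standing in for your table. It also drops CR characters before casting, escapes & and <, and strips the 'To: ' prefix so that only the address is returned.

```sql
-- Hypothetical sample table standing in for the question's data.
DECLARE @test TABLE (id INT, value NVARCHAR(MAX));
INSERT @test VALUES
(1, N'From: Media Temple user (mt.kb.user@gmail.com)' + CHAR(13) + CHAR(10)
  + N'To: user@example.com' + CHAR(13) + CHAR(10)
  + N'Return-Path: <mt.kb.user@gmail.com>'),
(2, N'From: A & B <a@b.com>' + CHAR(13) + CHAR(10)
  + N'To: differentUser@somewhereelse.com');

DECLARE @result TABLE (id INT, ToAddress NVARCHAR(4000));

INSERT @result (id, ToAddress)
SELECT t.id,
       -- Remove the 'To: ' prefix and surrounding spaces.
       LTRIM(RTRIM(SUBSTRING(L.line, 4, 4000)))
FROM @test AS t
CROSS APPLY (
    -- Escape & first, then <, drop CRs, and turn each LF into an element boundary.
    SELECT CONVERT(XML, N'<y><x>'
        + REPLACE(REPLACE(REPLACE(REPLACE(t.value, N'&', N'&amp;'), N'<', N'&lt;'),
                          CHAR(13), N''), CHAR(10), N'</x><x>')
        + N'</x></y>') AS doc
) AS X
CROSS APPLY X.doc.nodes('/y/x') AS nd(c)
CROSS APPLY (SELECT nd.c.value('.', 'NVARCHAR(MAX)') AS line) AS L
WHERE L.line LIKE N'To: %';

SELECT id, ToAddress FROM @result ORDER BY id;
```

Row 2 shows why the escaping matters: without it, the '&' and '<' in the From: line would make the CONVERT to XML fail.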
Answer 2 (score: 0)
Try this:
select substring(@s, charindex(char(13)+char(10)+'To: ', @s) + 6, charindex(char(13), @s, charindex(char(13)+char(10)+'To: ', @s)+6) - (charindex(char(13)+char(10)+'To: ', @s)+6))
Here is a complete test script:
declare @s varchar(500)
set @s = 'Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com'
select substring(@s, charindex(char(13)+char(10)+'To: ', @s) + 6, charindex(char(13)+char(10), @s, charindex(char(13)+char(10)+'To: ', @s)+6) - (charindex(char(13)+char(10)+'To: ', @s)+6))
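Note that in a properly formed email, header lines must be separated by CRLF (char(13) + char(10)) according to RFC 2822, and the code above makes the same assumption.
If your emails use different line endings, you may have to change every occurrence of char(13)+char(10) to just char(13) or just char(10). If you do, remember to also change +6 to +5, since the delimiter is one character shorter.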
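The expressions above have two edge cases: if To: is the last header (no line break after it), the computed length is negative and SUBSTRING raises "Invalid length parameter passed to the LEFT or SUBSTRING function"; and if To: is the very first line, there is no preceding CRLF to match. A hedged sketch that avoids both normalizes all line endings to LF and pads the text with one LF on each side, which is exactly the single-character case where +5 applies. The variable names @t, @pos and @to are illustrative, and @to stays NULL when there is no To: line.

```sql
-- Sample input: here To: is the very first header line.
DECLARE @s NVARCHAR(MAX) = N'To: user@example.com' + CHAR(13) + CHAR(10)
    + N'Return-Path: <mt.kb.user@gmail.com>' + CHAR(13) + CHAR(10)
    + N'Envelope-To: user@example.com';

-- Normalize CRLF and lone CR to LF, then wrap the text in LFs so the first
-- and last header lines are delimited exactly like every other line.
DECLARE @t NVARCHAR(MAX) = CHAR(10)
    + REPLACE(REPLACE(@s, CHAR(13) + CHAR(10), CHAR(10)), CHAR(13), CHAR(10))
    + CHAR(10);

-- Start of LF + 'To: ' (0 when there is no To: line).
DECLARE @pos INT = CHARINDEX(CHAR(10) + N'To: ', @t);
DECLARE @to NVARCHAR(4000);  -- stays NULL when there is no To: line

IF @pos > 0
    -- +5 skips LF + 'To: '; the trailing LF guarantees a line end is found.
    SET @to = LTRIM(RTRIM(SUBSTRING(@t, @pos + 5,
        CHARINDEX(CHAR(10), @t, @pos + 5) - (@pos + 5))));

SELECT @to AS ToAddress;
```

Because the search string starts with LF, 'Envelope-To: ' is never matched: its 'To: ' is preceded by '-', not by a line break.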
Answer 3 (score: 0)
If the email address is located between the first 'To:' and 'Return-Path:', you can use this (Fiddle demo):
declare @s nvarchar(max) = 'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>...'
select substring(@s, charindex('To:',@s)+3,
charindex('Return-Path:',@s)- charindex('To:',@s)-3)
--Results
user@example.com
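A more general version, which only assumes the email address comes before the first 'Return-Path:' (it takes the last 'To:' preceding it):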
;with cte as (
select reverse(left(@s, charindex('Return-Path:',@s)-1)) rs
)
select reverse(left(rs, charindex(':oT', rs)-1))
from cte
To use this in a query against a table, replace @s with your column name.
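As a sketch of that substitution, here is the first query from this answer applied to a column. It assumes a hypothetical table variable @headers with id and header columns (use your own table and column names) and, like the original, assumes the first 'To:' in the text belongs to the To: header. Two small additions: a CASE guard returns NULL for rows missing either marker instead of failing on a negative SUBSTRING length, and the leading space and line break that the raw SUBSTRING leaves around the address are stripped.

```sql
-- Hypothetical sample table; substitute your own table and column names.
DECLARE @headers TABLE (id INT, header NVARCHAR(MAX));
INSERT @headers VALUES
(1, N'Date: January 25, 2011 3:30:58 PM PDT' + CHAR(13) + CHAR(10)
  + N'To: user@example.com' + CHAR(13) + CHAR(10)
  + N'Return-Path: <mt.kb.user@gmail.com>'),
(2, N'Subject: no recipient line here');

DECLARE @result TABLE (id INT, ToAddress NVARCHAR(4000));

INSERT @result (id, ToAddress)
SELECT h.id,
       CASE WHEN p.toPos > 0 AND p.rpPos > p.toPos + 3
            -- Text between 'To:' and 'Return-Path:', minus CR/LF and spaces.
            THEN LTRIM(RTRIM(REPLACE(REPLACE(
                     SUBSTRING(h.header, p.toPos + 3, p.rpPos - p.toPos - 3),
                     CHAR(13), N''), CHAR(10), N'')))
       END
FROM @headers AS h
CROSS APPLY (SELECT CHARINDEX(N'To:', h.header) AS toPos,
                    CHARINDEX(N'Return-Path:', h.header) AS rpPos) AS p;

SELECT id, ToAddress FROM @result ORDER BY id;
```

Computing the two CHARINDEX positions once in a CROSS APPLY keeps the SUBSTRING arithmetic readable and avoids repeating the searches.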