SQL to parse multi-line data?

Date: 2009-05-27 13:48:10

Tags: sql mysql excel ms-access access-vba

I have the unfortunate task of regularly importing data from Excel into a database. The sheet looks like this:

  IssueID   References  
  1234      DocID1<cr>DocID2<cr>DocID3
  1235      DocID1
  1236      DocID2
  1237      DocID2<cr>DocID3

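References is a multi-line text field. Instead of these multi-line references, I want to create a Docs table with a one-to-many relationship to the Issues table.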

I have defined the following tables:

Issues: IssueKey, IssueID, IssueFields

Doc: DocKey, DocID, DocRev, DocOwner, etc.

DocLink: LinkKey, DocKey, IssueKey

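Since this will be run repeatedly, the Doc table will already exist with the DocIDs defined. So what I want is a query or VBA code that searches the References column for each DocID and adds a link based on the IssueID, if one doesn't already exist.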

Simple, right?

Jeff

Clarifications:

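1) I had a third column named "Val1" to indicate that there are other columns, but that seems to have confused the issue. The source table actually has many (many, mostly ignored) columns, but I only care about the two above.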

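2) I don't need to parse delimiters or do anything too paranoid: References contains one or more uniquely defined document reference numbers (stored as text). So a LIKE filter will pull up the list of IssueIDs on a case-by-case basis.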

3) Here is an example of acceptable output:

IssueID   References
1234      DocID1
1234      DocID2
1234      DocID3
1235      DocID1
1236      DocID2
1237      DocID2
1237      DocID3

The ideal solution would take the original Excel table (top) and these two tables:

IssueKey   IssueID
   1        1234
   2        1235
   3        1236
   4        1237

DocKey     DocID
  1        DocID1
  2        DocID2
  3        DocID3

and populate/update the link table:

LinkKey  IssueKey  DocKey
   1        1        1
   2        1        2
   3        1        3
   4        2        1
   5        3        2
   6        4        2
   7        4        3
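To make the mapping in 3) concrete, here is a minimal sketch in Python rather than Access. It assumes the `<cr>` in References is a carriage return, holds the two key tables above in dictionaries, and uses a hypothetical helper `build_links` that splits each References cell, looks up both keys, and skips pairs that are already linked:

```python
# Example data from the question; '\r' stands for the <cr> shown in the sheet.
IMPORT_ROWS = [
    (1234, "DocID1\rDocID2\rDocID3"),
    (1235, "DocID1"),
    (1236, "DocID2"),
    (1237, "DocID2\rDocID3"),
]
ISSUE_KEYS = {1234: 1, 1235: 2, 1236: 3, 1237: 4}   # IssueID -> IssueKey
DOC_KEYS = {"DocID1": 1, "DocID2": 2, "DocID3": 3}  # DocID -> DocKey


def build_links(rows, issue_keys, doc_keys, existing=()):
    """Return new (LinkKey, IssueKey, DocKey) rows not already in `existing`.

    `existing` holds (IssueKey, DocKey) pairs already in the link table, so
    re-running the import does not create duplicate links.
    """
    seen = set(existing)
    links = []
    next_key = len(seen) + 1  # hypothetical: LinkKey simply counts upward
    for issue_id, references in rows:
        issue_key = issue_keys[issue_id]
        # splitlines() treats CR, LF and CR/LF line breaks alike
        for doc_id in references.splitlines():
            doc_id = doc_id.strip()
            if not doc_id:
                continue
            pair = (issue_key, doc_keys[doc_id])
            if pair not in seen:
                seen.add(pair)
                links.append((next_key, *pair))
                next_key += 1
    return links


for row in build_links(IMPORT_ROWS, ISSUE_KEYS, DOC_KEYS):
    print(row)
```

Passing the pairs already stored in DocLink as `existing` makes a second run add nothing, which is the "only if it doesn't already exist" behaviour asked for above.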

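4) Here is an example of what I expected the solution to look like (to create #3 above). Unfortunately it crashes Access, so I can't tell whether the syntax is right (edited to reflect the field names above).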

SELECT Q1.IssueID, D1.DocID
FROM Docs AS D1, Issues AS Q1
WHERE Q1.IssueID IN 
   ((SELECT Q2.IssueID from Issues AS Q2 where (Q2.References) Like D1.DocID));

5) Giving up on Access for now, I have the following working in MySQL:

SELECT Q1.IssueID, D1.DocID
FROM Docs AS D1, Issues AS Q1
WHERE Q1.IssueID IN 
   ((SELECT Q2.IssueID from Issues AS Q2 where (Q2.References) Like '%DocID1%'));

This works as I expected: I get every IssueID that references DocID1, repeated once for each Doc in the table. Based on the data above, it looks like:

IssueID   References
1234      DocID1
1234      DocID2
1234      DocID3
1235      DocID1
1235      DocID2
1235      DocID3

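Now I just want to replace '%DocID1%' with '%' + D1.DocID + '%' to limit the results to the doc IDs that actually match. For some reason, when I do this I get zero records; I think I have the wrong syntax for the wildcards on the correlated field.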
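The zero-record result is consistent with how MySQL treats `+`: it is numeric addition, not string concatenation, so `'%' + D1.DocID + '%'` evaluates to 0 and no References value is LIKE 0, while `CONCAT('%', D1.DocID, '%')` builds the intended pattern. SQLite's `+` behaves the same way (its concatenation operator is `||`), so this sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL, loaded with the example data from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Issues (IssueID INTEGER, "References" TEXT);
    CREATE TABLE Docs (DocID TEXT);
    INSERT INTO Docs VALUES ('DocID1'), ('DocID2'), ('DocID3');
""")
con.executemany(
    "INSERT INTO Issues VALUES (?, ?)",
    [(1234, "DocID1\rDocID2\rDocID3"), (1235, "DocID1"),
     (1236, "DocID2"), (1237, "DocID2\rDocID3")],
)

# '+' is arithmetic: the would-be pattern collapses to the number 0
plus_rows = con.execute("""
    SELECT Q1.IssueID, D1.DocID FROM Docs AS D1, Issues AS Q1
    WHERE Q1."References" LIKE '%' + D1.DocID + '%'""").fetchall()

# '||' is SQLite's concatenation (the MySQL equivalent is CONCAT('%', D1.DocID, '%'))
concat_rows = con.execute("""
    SELECT Q1.IssueID, D1.DocID FROM Docs AS D1, Issues AS Q1
    WHERE Q1."References" LIKE '%' || D1.DocID || '%'
    ORDER BY Q1.IssueID, D1.DocID""").fetchall()

print(len(plus_rows), len(concat_rows))  # 0 7
```

The correlated `IN (SELECT ...)` is not needed once the pattern is built correctly; the LIKE can sit directly in the join condition.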

6) The following works in MySQL and gives #3, but the same query converted to Access crashes it:

SELECT Q1.IssueID, D1.DocID
FROM Docs AS D1, Issues AS Q1
WHERE Q1.IssueID IN 
   ((SELECT Q2.IssueID from Issues AS Q2 where (Q2.References) Like        
        CONCAT('%',D1.DocID,'%')));

[In Access this becomes ("*" & D1.DocID & "*")]

Conclusion: Access sucks.

8 answers:

Answer 0 (score: 2)

This was selected as the answer:

Q2.References LIKE ("*" & D1.DocID & "*"));

However, I don't think it is safe.

Consider the case where one of the values in the column named "References" contains this data:

DocID1<cr>DocID999<cr>DocID3

and the value DocID9 exists in the other table.

The problem here is that

"DocID1<cr>DocID999<cr>DocID3" LIKE "*" & "DocID9" & "*" 

will evaluate to TRUE, which is probably not what you want.

To fix this, I think the values in the search/join condition should be made safe by surrounding them with the delimiter character:

(CHR(13) & Q2.References & CHR(13)) LIKE ("*" & CHR(13) & D1.DocID & CHR(13) & "*"));
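The false positive and the delimiter fix can be reproduced outside Access. This sketch uses Python's `sqlite3` as a stand-in, with `||` in place of `&`, `%` in place of `*` and `char(13)` in place of `CHR(13)`; the References value is the hypothetical one from this answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
ref = "DocID1\rDocID999\rDocID3"  # the problematic References value

# Plain substring test, as in the accepted answer
naive_sql = "SELECT ? LIKE '%' || ? || '%'"
# Wrap both sides in CR so only a whole line can match
safe_sql = ("SELECT (char(13) || ? || char(13)) "
            "LIKE '%' || char(13) || ? || char(13) || '%'")

for doc_id in ("DocID9", "DocID999", "DocID1", "DocID3"):
    naive = con.execute(naive_sql, (ref, doc_id)).fetchone()[0]
    safe = con.execute(safe_sql, (ref, doc_id)).fetchone()[0]
    print(doc_id, bool(naive), bool(safe))
# Only DocID9 differs: the naive test matches it inside DocID999, the safe one does not
```

Padding the stored value as well as the search term is what lets the first and last lines match, since they have no delimiter of their own on one side.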

Answer 1 (score: 1)

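Since this is run repeatedly, I would ask (strongly urge) them to provide a proper file with the IssueID and Val1 on every row. That is much easier to process. You need to be sure the values of these fields are imported into the system correctly.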

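Per the comments: in SQL Server you can build a function that splits the data based on the CHARINDEX of the comma. If you Google fn_split you'll find an example. I'm not sure how to do this in Access, but it could be an iterative process: find the last comma, move everything after it into a holding table, remove that part, and repeat until there are no more commas. It's easiest to import into a staging table like this, where you can manipulate the data however you need, and then put the final results into the real table.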
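A rough outline of that iterative approach (find the last delimiter, move the text after it into a holding table, cut it off, repeat), written in Python with a list standing in for the holding table. `split_from_end` is a hypothetical name, and the delimiter is a parameter so the same loop works for commas or line breaks:

```python
def split_from_end(value, delimiter=","):
    """Split `value` by repeatedly peeling off the text after the last delimiter."""
    holding = []  # stands in for the holding (staging) table
    remaining = value
    while True:
        pos = remaining.rfind(delimiter)
        if pos == -1:                 # no delimiters left: keep what remains
            if remaining:
                holding.append(remaining)
            break
        tail = remaining[pos + len(delimiter):]
        if tail:                      # skip empty pieces from doubled delimiters
            holding.append(tail)
        remaining = remaining[:pos]   # drop the delimiter and everything after it
    holding.reverse()                 # pieces were collected last-to-first
    return holding


print(split_from_end("DocID1,DocID2,DocID3"))  # ['DocID1', 'DocID2', 'DocID3']
print(split_from_end("DocID1\rDocID2", "\r"))  # ['DocID1', 'DocID2']
```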

Answer 2 (score: 0)

My first choice would be to throw together a quick app in C# or VB.Net to handle this.

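If that isn't feasible, I'd have an "import" table that takes everything as-is. Then I'd use a cursor to iterate through the records in that table. Inside the cursor loop, I'd keep track of the IssueID and Val1 and parse the References column to create my child records. I'd package this part into a stored procedure.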
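A minimal sketch of the staging-table-plus-loop idea, with Python's `sqlite3` standing in for the stored procedure and cursor. The table and column names `ImportTable`, `ChildRefs` and `Refs`, and the sample Val1 values, are assumptions for illustration only:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- staging table: takes the spreadsheet rows as-is
    CREATE TABLE ImportTable (IssueID INTEGER, Val1 TEXT, Refs TEXT);
    -- child table: one row per (issue, reference)
    CREATE TABLE ChildRefs (IssueID INTEGER, Val1 TEXT, DocID TEXT);
""")
con.executemany("INSERT INTO ImportTable VALUES (?, ?, ?)", [
    (1234, "a", "DocID1\r\nDocID2\r\nDocID3"),
    (1235, "b", "DocID1"),
])

# Walk the staging rows (the "cursor"), carrying IssueID and Val1 along
# and turning each line of Refs into a child record
for issue_id, val1, refs in con.execute(
        "SELECT IssueID, Val1, Refs FROM ImportTable").fetchall():
    for doc_id in refs.splitlines():
        if doc_id.strip():
            con.execute("INSERT INTO ChildRefs VALUES (?, ?, ?)",
                        (issue_id, val1, doc_id.strip()))
con.commit()

print(con.execute("SELECT * FROM ChildRefs ORDER BY IssueID, DocID").fetchall())
```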

Answer 3 (score: 0)

I suggest looking into SQL Server Integration Services (SSIS). It was built to do exactly this kind of data import/export as quickly as possible with as little code as possible.

Read up on it. Do some hands-on labs and see whether any of the examples come close to what you're trying to do.

http://en.wikipedia.org/wiki/SQL_Server_Integration_Services http://www.microsoft.com/downloads/details.aspx?familyid=b1145e7a-a4e3-4d14-b1e7-d1d823b6a447&displaylang=en

Answer 4 (score: 0)

Do you mean something like this (typed, not tested):

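Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim rsIn As DAO.Recordset ''Or ADO if you link directly to Excel
Dim astrDocs() As String
Dim strDoc As Variant
Dim strSQL As String

Set db = CurrentDb
''ImportTable holds the raw Excel rows (IssueID, References)
Set rsIn = db.OpenRecordset("SELECT IssueID, [References] FROM ImportTable", dbOpenSnapshot)
''Existing links, with the IssueID and DocID each one points to
Set rs = db.OpenRecordset( _
   "SELECT i.IssueID, d.DocID FROM (DocLinks AS dl " & _
   "INNER JOIN Docs AS d ON dl.DocKey = d.DocKey) " & _
   "INNER JOIN Issues AS i ON dl.IssueKey = i.IssueKey", dbOpenDynaset)

Do While Not rsIn.EOF
   ''Treat LF like CR; empty pieces left by CR/LF pairs are skipped below
   astrDocs = Split(Replace(rsIn![References] & "", vbLf, vbCr), vbCr)

   For Each strDoc In astrDocs
      If Len(strDoc) > 0 Then
         rs.FindFirst "IssueID=" & rsIn!IssueID & " AND DocID='" & strDoc & "'"

         If rs.NoMatch Then
            ''Look up both keys and add the missing link
            strSQL = "INSERT INTO DocLinks (DocKey, IssueKey) " _
              & "SELECT d.DocKey, i.IssueKey FROM Docs AS d, Issues AS i " _
              & "WHERE d.DocID='" & strDoc & "' AND i.IssueID=" & rsIn!IssueID
            db.Execute strSQL, dbFailOnError
         End If
      End If
   Next

   rsIn.MoveNext
Loop

rsIn.Close
rs.Close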

EDIT re comment

If the DocIDs are a fixed length, you could consider something like this:

SELECT Sequence.Seq
       , ImportTable.IssueID
       , Mid(Replace([References],Chr(13),""),[seq],6) AS Docs
FROM Sequence, ImportTable
WHERE ([seq]+5) Mod 6=0
AND   Mid(Replace([References],Chr(13),""),[seq],6)<>""
AND   Mid(Replace([References],Chr(13),""),[seq],6)
      Not In (SELECT DocID FROM Docs)

You will need a Sequence table containing the integers from 1 up to at least the maximum length of References.
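The fixed-width query above can be tried in SQLite through Python's `sqlite3`. This is a sketch assuming every DocID is exactly six characters (as in the example data) and CR line breaks; a recursive CTE stands in for the physical Sequence table, `%` replaces Access's `Mod`, and the `Not In (SELECT DocID FROM Docs)` filter is left out:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE ImportTable (IssueID INTEGER, "References" TEXT)')
con.executemany("INSERT INTO ImportTable VALUES (?, ?)", [
    (1234, "DocID1\rDocID2\rDocID3"), (1235, "DocID1"),
    (1236, "DocID2"), (1237, "DocID2\rDocID3"),
])

rows = con.execute("""
    WITH RECURSIVE Sequence(seq) AS (        -- integers 1..100
        SELECT 1 UNION ALL SELECT seq + 1 FROM Sequence WHERE seq < 100
    ),
    Flat AS (                                -- References with the CRs removed
        SELECT IssueID, replace("References", char(13), '') AS refs
        FROM ImportTable
    )
    SELECT Flat.IssueID, substr(Flat.refs, Sequence.seq, 6) AS Doc
    FROM Sequence, Flat
    WHERE (Sequence.seq + 5) % 6 = 0         -- seq = 1, 7, 13, ...
      AND substr(Flat.refs, Sequence.seq, 6) <> ''
    ORDER BY Flat.IssueID, Sequence.seq
""").fetchall()
print(rows)
```

The approach breaks as soon as one DocID has a different length, which is why the delimiter-based splits elsewhere in this thread are more general.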

Answer 5 (score: 0)

This can be done easily in SQL. I've written a TVF (table-valued function) specifically to demonstrate how to split text into lines:

CREATE FUNCTION [dbo].[fnSplit3](
        @Parameter varchar(Max)         -- the string to split
        , @Seperator Varchar(64)        -- the string to use as a separator
)
RETURNS @Items TABLE(
        ID INT                          -- the element number
        , item VARCHAR(8000)            -- the split-out string element
        , OffSet int                    -- the original offset
        --( not entirely accurate if LEN(@Seperator) > 1 because of the Replace() )
)
AS
BEGIN 
/*
"Monster" Split in SQL Server 2005 
 From Jeff Moden, 2008/05/22

BYoung, 2008/06/18: Modified to be a Table-Valued Function
                    And to handle CR/LF or LF-only line breaks

Test: (scripts all procs & views in master)
    Select Lines.Item
     From Master.sys.syscomments C
      CROSS APPLY dbo.fnSplit3(C.text, char(13)+char(10)) Lines
     Order by C.ID, Lines.ID

Test2: (scripts all triggers in your database)
    Select Lines.Item
     From sys.sql_modules M
      Join sys.objects O on O.object_id = M.object_id
      CROSS APPLY dbo.fnSplit3(M.definition, char(13)+char(10)) Lines
     Where O.Type = 'TR' 
     Order by O.create_date, Lines.ID
*/
Declare @Sep char(1)
Set @Sep = char(10)        --our separator character (convenient, doesn't affect performance)
--NOTE: we make the @Sep character LF so that we will automatically
-- parse out rogue LF-only line breaks.

--===== Add start and end separators to the Parameter so we can handle
        -- all the elements the same way
        --  Also change the separator expressions to our separator
        -- character to keep all offsets = 1
SET @Parameter = @Sep+ Replace(@Parameter,@Seperator,@Sep) +@Sep
-- This reduces run-time about 10%

;WITH cteTally AS
(--==== Create a Tally CTE from 1 to whatever the length
        -- of the parameter is
 SELECT TOP (LEN(@Parameter))
        ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
  FROM Master.sys.system_Columns t1
   CROSS JOIN Master.sys.system_Columns t2
)
INSERT into @Items
        SELECT ROW_NUMBER() OVER (ORDER BY N) AS Number,
                SUBSTRING(@Parameter, N+1, CHARINDEX(@Sep, @Parameter, N+1)-N-1) AS Value
                , N+1
         FROM cteTally
         WHERE N < LEN(@Parameter)
          AND SUBSTRING(@Parameter, N, 1) = @Sep --Notice how we find the separator

        Return 
END

To use it with your current table & data, do this:

SELECT Issues.IssueID, Lines.Item as Reference
 From Issues
  Cross Apply dbo.fnSplit3(Issues.[References], char(13)) Lines
 Order By IssueID, Reference
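The core of the tally-table technique in fnSplit3 (pad the string with separators, then take the text after every separator position up to the next one) can be tried without SQL Server. This is a SQLite translation run through Python's `sqlite3`, not the function itself: a recursive CTE replaces the system_columns cross join, a fixed bound of 100 is assumed to be long enough for the sample values, and because SQLite's `instr()` has no start-position argument the next separator is found with `instr(substr(...))`. CR line breaks are assumed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE Issues (IssueID INTEGER, "References" TEXT)')
con.executemany("INSERT INTO Issues VALUES (?, ?)", [
    (1234, "DocID1\rDocID2\rDocID3"), (1237, "DocID2\rDocID3"),
])

rows = con.execute("""
    WITH RECURSIVE
    Padded AS (                   -- separator added at both ends
        SELECT IssueID, char(13) || "References" || char(13) AS p FROM Issues
    ),
    Tally(N) AS (                 -- 1..100, enough for these sample values
        SELECT 1 UNION ALL SELECT N + 1 FROM Tally WHERE N < 100
    )
    SELECT IssueID,
           substr(p, N + 1, instr(substr(p, N + 1), char(13)) - 1) AS Item
    FROM Padded JOIN Tally ON N < length(p)
    WHERE substr(p, N, 1) = char(13)   -- N sits on a separator
    ORDER BY IssueID, N
""").fetchall()
print(rows)
```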

Answer 6 (score: 0)

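I'm having trouble with a set-based SQL solution here. I've done this kind of thing before and had to refresh my memory a bit, but I've hit a problem. I think it's an issue (feature/bug?) with the engine, but I may be doing something stupid. Perhaps someone who knows Jet/ACE intimately and can read VBA could look at the code at the end of this answer and hopefully move this forward...?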

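The basic approach is to use a Sequence table of integers with MID() expressions to parse the data column (I've renamed it MyReferences because REFERENCES is an SQL keyword).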

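Here is MS Access VBA that recreates the test table/data using SQL DDL/DML. Note that the first SELECT query returns the substrings along with the starting and ending delimiters; obviously we are looking for the rows where both delimiters are the delimiter character, in this case CHR(13). The second SELECT query merely adds search conditions for the required delimiters, yet it fails with the error "Invalid procedure call". That error occurs when MID() is called with an invalid argument value, e.g.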

SELECT MID('A', 0, 0)

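I suspect what's happening is that the optimizer isn't using the subquery as a "shortcut" but instead evaluates the MID() expressions before applying the search conditions on the Sequence table. If so, that's a bit dumb, and I can't think of a way to force the order of evaluation.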

So, is the problem mine or the engine's?

Sub main()

  Dim sql As String

  sql = _
      "DROP TABLE ImportTable;"

  On Error Resume Next  ' Table may not exist
  CurrentProject.Connection.Execute sql
  On Error GoTo 0

  sql = _
      "DROP TABLE Sequence;"

  On Error Resume Next  ' Table may not exist
  CurrentProject.Connection.Execute sql
  On Error GoTo 0

  sql = _
      "CREATE TABLE ImportTable ( " & _
      "IssueID INTEGER NOT NULL UNIQUE, MyReferences VARCHAR(90) NOT NULL);"

  CurrentProject.Connection.Execute sql

  sql = _
      "INSERT INTO ImportTable VALUES (1234, 'DocID1' & Chr(13) & 'DocID22' & Chr(13) & 'DocID3');"

  CurrentProject.Connection.Execute sql

  sql = _
      "CREATE TABLE Sequence (seq INTEGER NOT NULL UNIQUE);"

  CurrentProject.Connection.Execute sql

  sql = _
      "INSERT INTO Sequence VALUES (-1);"

  CurrentProject.Connection.Execute sql

 sql = _
        "INSERT INTO [Sequence] (seq) SELECT Units.nbr + Tens.nbr" & _
        " FROM ( SELECT" & _
        " nbr FROM ( SELECT 0 AS nbr FROM [Sequence] UNION" & _
        " ALL SELECT 1 FROM [Sequence] UNION ALL SELECT 2 FROM" & _
        " [Sequence] UNION ALL SELECT 3 FROM [Sequence] UNION" & _
        " ALL SELECT 4 FROM [Sequence] UNION ALL SELECT 5 FROM" & _
        " [Sequence] UNION ALL SELECT 6 FROM [Sequence] UNION" & _
        " ALL SELECT 7 FROM [Sequence] UNION ALL SELECT 8 FROM" & _
        " [Sequence] UNION ALL SELECT 9 FROM [Sequence] ) AS" & _
        " Digits ) AS Units, ( SELECT nbr * 10 AS nbr FROM" & _
        " ( SELECT 0 AS nbr FROM [Sequence] UNION ALL SELECT" & _
        " 1 FROM [Sequence] UNION ALL SELECT 2 FROM [Sequence]" & _
        " UNION ALL SELECT 3 FROM [Sequence] UNION ALL SELECT" & _
        " 4 FROM [Sequence] UNION ALL SELECT 5 FROM [Sequence]" & _
        " UNION ALL SELECT 6 FROM [Sequence] UNION ALL SELECT" & _
        " 7 FROM [Sequence] UNION ALL SELECT 8 FROM [Sequence]" & _
        " UNION ALL SELECT 9 FROM [Sequence] ) AS Digits )" & _
        " AS Tens;"

  CurrentProject.Connection.Execute sql

  sql = _
      "SELECT DT1.IssueID, DT1.parsed_text, DT1.delimiter_1, DT1.delimiter_2 " & _
      "FROM ( " & _
      "SELECT I1.IssueID, MID(I1.MyReferences, S1.seq, S2.seq - S1.seq - LEN(CHR(13))) AS parsed_text, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S1.seq, LEN(CHR(13))) AS delimiter_1, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S2.seq, LEN(CHR(13))) AS delimiter_2 " & _
      "FROM ImportTable AS I1, Sequence AS S1, Sequence AS S2 " & _
      "WHERE S1.seq < S2.seq " & _
      "AND S2.seq - S1.seq - LEN(CHR(13)) > 0 " & _
      "AND S1.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      "AND S2.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      ") AS DT1;"

  Dim rs As ADODB.Recordset
  Set rs = CurrentProject.Connection.Execute(sql)

  MsgBox rs.GetString

  sql = _
      "SELECT DT1.IssueID, DT1.parsed_text, DT1.delimiter_1, DT1.delimiter_2 " & _
      "FROM ( " & _
      "SELECT I1.IssueID, MID(I1.MyReferences, S1.seq, S2.seq - S1.seq - LEN(CHR(13))) AS parsed_text, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S1.seq, LEN(CHR(13))) AS delimiter_1, " & _
      " MID(CHR(13) & I1.MyReferences & CHR(13), S2.seq, LEN(CHR(13))) AS delimiter_2 " & _
      "FROM ImportTable AS I1, Sequence AS S1, Sequence AS S2 " & _
      "WHERE S1.seq < S2.seq " & _
      "AND S2.seq - S1.seq - LEN(CHR(13)) > 0 " & _
      "AND S1.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      "AND S2.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
      ") AS DT1 " & _
      "WHERE DT1.delimiter_1 = CHR(13) " & _
      "AND DT1.delimiter_2 = CHR(13);"

  Set rs = CurrentProject.Connection.Execute(sql)

  MsgBox rs.GetString

End Sub

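FWIW, here is a PROCEDURE I wrote years ago to parse a delimited list into a table. It seems to work for values up to 255 characters; any longer and you get a very nasty ACE/Jet engine error. Again, I can't see what the problem is, other than the engine being unable to cope! Anyway, my point is that this works (for small values), and I can't figure out why I can't adapt it to the problem at hand: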

CREATE PROCEDURE ListToTable
(
   delimted_text MEMO,
   delimiter VARCHAR(4) = ','
)
AS
SELECT MID(I1.input_string, S1.seq, MIN(S2.seq) - S1.seq - LEN(delimiter)) AS param
  FROM
      (
       SELECT DISTINCT delimted_text AS input_string
         FROM Sequence AS S3
        WHERE S3.seq BETWEEN 1 AND LEN(delimted_text)
      ) AS I1, Sequence AS S1, Sequence AS S2
 WHERE MID(delimiter & I1.input_string & delimiter, S1.seq, LEN(delimiter)) = delimiter
       AND MID(delimiter & I1.input_string & delimiter, S2.seq, LEN(delimiter)) = delimiter
       AND S1.seq < S2.seq
       AND S1.seq BETWEEN 1 AND LEN(delimiter) + LEN(delimted_text) + LEN(delimiter)
       AND S2.seq BETWEEN 1 AND LEN(delimiter) + LEN(delimted_text) + LEN(delimiter)
 GROUP 
    BY I1.input_string, S1.seq
HAVING LEN(MID(I1.input_string, S1.seq, MAX(S2.seq) - S1.seq - LEN(delimiter))) > 0;
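For comparison, the pairing logic of ListToTable (for each delimiter position S1, the nearest later delimiter is MIN(S2.seq)) can be reproduced in SQLite via Python's `sqlite3`. This is a translation for experimentation, not Access SQL, and it does not reproduce the Jet/ACE behaviour described above: the delimited text and delimiter are bound parameters instead of procedure arguments, the Sequence table is assumed to hold 1..100, and the HAVING clause uses MIN(S2.seq) to drop empty items:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Sequence table of integers 1..100 (so list text up to about 98 characters)
con.execute("CREATE TABLE Sequence (seq INTEGER NOT NULL UNIQUE)")
con.executemany("INSERT INTO Sequence VALUES (?)", [(i,) for i in range(1, 101)])

# :txt is the delimited text, :d the delimiter
LIST_TO_TABLE = """
    SELECT substr(:txt, S1.seq, MIN(S2.seq) - S1.seq - length(:d)) AS param
    FROM Sequence AS S1, Sequence AS S2
    WHERE substr(:d || :txt || :d, S1.seq, length(:d)) = :d   -- S1 on a delimiter
      AND substr(:d || :txt || :d, S2.seq, length(:d)) = :d   -- S2 on a delimiter
      AND S1.seq < S2.seq
    GROUP BY S1.seq                        -- MIN(S2.seq) = next delimiter after S1
    HAVING MIN(S2.seq) - S1.seq - length(:d) > 0              -- skip empty items
    ORDER BY S1.seq
"""


def list_to_table(text, delimiter=","):
    """Return the items of a delimited string, in order."""
    params = {"txt": text, "d": delimiter}
    return [row[0] for row in con.execute(LIST_TO_TABLE, params)]


print(list_to_table("DocID1,DocID22,DocID3"))  # ['DocID1', 'DocID22', 'DocID3']
print(list_to_table("DocID1\rDocID3", "\r"))   # ['DocID1', 'DocID3']
```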

Answer 7 (score: 0)

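I think using the word "parse" in the title confused everyone. The bug in Access is that a correlated query run against a query (rather than a table) hangs. So I created a temporary table that adds the References column (with the multi-line text) to the Issues table, so that I have access to the other fields. The final query creates the link table described above, plus the DocID and IssueID for reference: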

SELECT Q1.IssueID, Q1.IssueKey, D1.DocKey, D1.DocID
FROM Issues AS Q1, Documents AS D1
WHERE Q1.IssueID in 
  (SELECT  Q2.IssueID FROM Issues AS Q2 WHERE Q2.References LIKE ("*" & D1.DocID & "*"));

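The inner SELECT pulls the list of issues whose References column contains a given document. The outer SELECT does this for every document, producing the combined list.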
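Because the import is re-run regularly, the same LIKE match can feed the link table directly while skipping links that already exist. A minimal sketch in SQLite via Python's `sqlite3`, assuming the question's DocLink(LinkKey, IssueKey, DocKey) layout with LinkKey auto-numbered; in Access the pattern would be `"*" & D1.DocID & "*"` rather than `'%' || D1.DocID || '%'`, and the substring false positive described in Answer 0 applies here too:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Issues (IssueKey INTEGER PRIMARY KEY, IssueID INTEGER,
                         "References" TEXT);
    CREATE TABLE Documents (DocKey INTEGER PRIMARY KEY, DocID TEXT);
    CREATE TABLE DocLink (LinkKey INTEGER PRIMARY KEY,  -- auto-numbered
                          IssueKey INTEGER, DocKey INTEGER);
    INSERT INTO Documents (DocID) VALUES ('DocID1'), ('DocID2'), ('DocID3');
""")
con.executemany('INSERT INTO Issues (IssueID, "References") VALUES (?, ?)', [
    (1234, "DocID1\rDocID2\rDocID3"), (1235, "DocID1"),
    (1236, "DocID2"), (1237, "DocID2\rDocID3"),
])

# Match documents to issues, inserting only the links not yet recorded
ADD_LINKS = """
    INSERT INTO DocLink (IssueKey, DocKey)
    SELECT Q1.IssueKey, D1.DocKey
    FROM Issues AS Q1 JOIN Documents AS D1
      ON Q1."References" LIKE '%' || D1.DocID || '%'
    WHERE NOT EXISTS (SELECT 1 FROM DocLink AS L
                      WHERE L.IssueKey = Q1.IssueKey AND L.DocKey = D1.DocKey)
    ORDER BY Q1.IssueKey, D1.DocKey
"""
con.execute(ADD_LINKS)
con.execute(ADD_LINKS)  # a second run adds nothing
print(con.execute("SELECT * FROM DocLink ORDER BY LinkKey").fetchall())
```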