I have the unfortunate task of periodically importing data from Excel into a database. The sheet looks like this:
IssueID References
1234 DocID1<cr>DocID2<cr>DocID3
1235 DocID1
1236 DocID2
1237 DocID2<cr>DocID3
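References is a multi-line text field. Instead of these multi-line references, what I want is a Docs table with a one-to-many relationship to the Issues table.
I have defined the following tables:
Issues: IssueKey, IssueID, IssueFields
Doc: DocKey, DocID, DocRev, DocOwner, etc.
DocLink: LinkKey, DocKey, IssueKey
Since this will be run repeatedly, the Doc table will already exist with its DocIDs defined. So what I want is a query or VBA code that searches the References column for each DocID and, based on the IssueID, adds a link if one doesn't already exist.
Simple, right?
Jeff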
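Clarifications:
1) I originally had a third column named "Val1" to show that there are other columns, but that seems to have muddied the question. The source sheet actually has many (mostly ignored) columns, but I only care about the two above.
2) I don't need to parse delimiters or do anything overly paranoid: References contains one or more uniquely defined document reference numbers (stored as text), so a LIKE filter will pull up the list of IssueIDs on a case-by-case basis.
3) Here is an example of acceptable output: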
IssueID References
1234 DocID1
1234 DocID2
1234 DocID3
1235 DocID1
1236 DocID2
1237 DocID2
1237 DocID3
The ideal solution would take the original Excel sheet (top) and these two tables:
IssueKey IssueID
1 1234
2 1235
3 1236
4 1237
DocKey DocID
1 DocID1
2 DocID2
3 DocID3
and populate/update the link table:
LinkKey IssueKey DocKey
1 1 1
2 1 2
3 1 3
4 2 1
5 3 2
6 4 2
7 4 3
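The following minimal Python sketch shows how the sheet maps onto the link table. It assumes that each cell separates DocIDs with a carriage return (the <cr> above) and that the Issues and Docs keys are already loaded into lookup dictionaries. build_links and the sample data are illustrative only. A DocID that is missing from Docs raises a KeyError rather than being skipped.

```python
# Sketch (assumptions): DocIDs in each cell are separated by "\r", and the
# IssueID -> IssueKey and DocID -> DocKey lookups are already loaded.


def build_links(sheet, issue_keys, doc_keys, sep="\r"):
    """Return (LinkKey, IssueKey, DocKey) tuples, LinkKey numbered from 1."""
    pairs = []
    for issue_id, references in sheet:
        for doc_id in references.split(sep):
            doc_id = doc_id.strip()  # also drops a stray "\n" from CR/LF cells
            if not doc_id:
                continue  # ignore blank lines inside a cell
            pairs.append((issue_keys[issue_id], doc_keys[doc_id]))
    return [(n, ik, dk) for n, (ik, dk) in enumerate(pairs, start=1)]


# Illustrative data copied from the example tables
sheet = [
    (1234, "DocID1\rDocID2\rDocID3"),
    (1235, "DocID1"),
    (1236, "DocID2"),
    (1237, "DocID2\rDocID3"),
]
issue_keys = {1234: 1, 1235: 2, 1236: 3, 1237: 4}
doc_keys = {"DocID1": 1, "DocID2": 2, "DocID3": 3}

for link in build_links(sheet, issue_keys, doc_keys):
    print(link)
```

Run against the sample sheet, this produces seven link rows. For example, IssueID 1237 (IssueKey 4) links to DocKeys 2 and 3.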
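4) Here is an example of what I expect the solution to look like (creating #3 above). Unfortunately it crashes Access, so I can't tell whether the syntax is right (edited to reflect the field names above).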
SELECT Q1.IssueID, D1.DocID
FROM Docs AS D1, Issues AS Q1
WHERE Q1.IssueID IN
((SELECT Q2.IssueID from Issues AS Q2 where (Q2.References) Like D1.DocID));
5) Setting Access aside for the moment, I have the following working in MySQL:
SELECT Q1.IssueID, D1.DocID
FROM Docs AS D1, Issues AS Q1
WHERE Q1.IssueID IN
((SELECT Q2.IssueID from Issues AS Q2 where (Q2.References) Like '%DocID1%'));
This works as I expect: I get each IssueID that references DocID1, repeated once for every Doc in the table. Based on the data above, it looks like:
IssueID References
1234 DocID1
1234 DocID2
1234 DocID3
1235 DocID1
1235 DocID2
1235 DocID3
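Now I just want to replace '%DocID1%' with '%' + D1.DocID + '%' to limit the results to the doc IDs that actually match. For some reason, when I do that I get zero records. I think I have the syntax wrong for wildcards on a correlated field.
6) The following works in MySQL and gives #3, but the same query converted to Access crashes it: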
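A likely reason for the zero rows, assuming the attempt was literally '%' + D1.DocID + '%' in MySQL: there + is numeric addition, not string concatenation. The pattern collapses to 0, and References LIKE 0 matches nothing. CONCAT() builds the intended pattern. The sketch below uses Python's built-in sqlite3 as a stand-in, because SQLite also treats + as addition and concatenates with ||. The table and the column name Refs (REFERENCES is an SQL keyword) are illustrative.

```python
import sqlite3

# SQLite as a stand-in for MySQL: in both, "+" is numeric addition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Issues (IssueID INTEGER, Refs TEXT)")  # Refs: illustrative name
conn.executemany(
    "INSERT INTO Issues VALUES (?, ?)",
    [(1234, "DocID1\rDocID2\rDocID3"), (1235, "DocID1"), (1236, "DocID2")],
)

# '%' + 'DocID1' + '%' converts each string to a number (0) and adds them.
plus = conn.execute("SELECT '%' + 'DocID1' + '%'").fetchone()[0]
# || is SQLite's string concatenation (CONCAT() in MySQL, & in Access).
concat = conn.execute("SELECT '%' || 'DocID1' || '%'").fetchone()[0]
print(plus, concat)

# The "+" pattern degenerates to LIKE 0, which matches no row.
n_plus = conn.execute(
    "SELECT COUNT(*) FROM Issues WHERE Refs LIKE '%' + 'DocID1' + '%'"
).fetchone()[0]
n_concat = conn.execute(
    "SELECT COUNT(*) FROM Issues WHERE Refs LIKE '%' || 'DocID1' || '%'"
).fetchone()[0]
print(n_plus, n_concat)
```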
SELECT Q1.IssueID, D1.DocID
FROM Docs AS D1, Issues AS Q1
WHERE Q1.IssueID IN
((SELECT Q2.IssueID from Issues AS Q2 where (Q2.References) Like
CONCAT('%',D1.DocID,'%')));
[In Access it becomes ('*' & D1.DocID & '*')]
Conclusion: Access sucks.
Answer 0 (score: 2)
This was selected as the answer:
Q2.References LIKE ("*" & D1.DocID & "*"));
However, I don't think this is safe.
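Consider the case where one of the values in the column named "References" contains this data:
DocID1<cr>DocID999<cr>DocID3
and the other table contains the value DocID = "DocID9".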
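The problem here is that
"DocID1<cr>DocID999<cr>DocID3" LIKE "*" & "DocID9" & "*"
will evaluate to TRUE, which is probably not what you want.
To get around this, I think the values in the search/join condition should be made safe by wrapping them in the delimiter character: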
(CHR(13) & Q2.References & CHR(13)) LIKE ("*" & CHR(13) & D1.DocID & CHR(13) & "*"));
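The difference is easy to check. This sketch uses Python's built-in sqlite3 as a stand-in for Access. SQLite writes concatenation as || and uses % rather than * as the wildcard. The column is named Refs because REFERENCES is a keyword, and the sample rows come from the example above.

```python
import sqlite3

CR = "\r"  # the question's <cr>; assumption: use "\n" if the cells use LF

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Issues (IssueID INTEGER, Refs TEXT)")
conn.execute("CREATE TABLE Docs (DocID TEXT)")
conn.execute(
    "INSERT INTO Issues VALUES (1234, ?)",
    (CR.join(["DocID1", "DocID999", "DocID3"]),),
)
conn.executemany("INSERT INTO Docs VALUES (?)", [("DocID9",), ("DocID999",)])

# Plain substring test: DocID9 is found inside DocID999 (a false match).
naive = conn.execute(
    "SELECT d.DocID FROM Issues AS i, Docs AS d "
    "WHERE i.Refs LIKE '%' || d.DocID || '%' ORDER BY d.DocID"
).fetchall()

# Both sides wrapped in the delimiter, so only whole entries can match.
safe = conn.execute(
    "SELECT d.DocID FROM Issues AS i, Docs AS d "
    "WHERE (:cr || i.Refs || :cr) LIKE '%' || :cr || d.DocID || :cr || '%' "
    "ORDER BY d.DocID",
    {"cr": CR},
).fetchall()

print(naive)
print(safe)
```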
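Answer 1 (score: 1)
Since this is run repeatedly, I would ask (strongly insist) that they give you a proper file with the IssueID and Val1 values on every row. That is much easier to work with. You need to be sure the values in these fields are imported into the system correctly.
Based on the comments: in SQL Server you can build a function that splits the data using CHARINDEX to find the commas. If you search Google for fn_split, you'll find an example. I'm not sure how to do this in Access, but it could be an iterative process: find the last comma, move everything after it into a holding table, remove that comma, and repeat until there are no commas left. The easiest way to do an import like this is into a staging table, where you can manipulate the data however you need, and then move the final result into the real tables.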
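The iterative approach described above can be sketched in Python to show the loop's logic. The function name and the comma default are illustrative. A real import would write each peeled-off item to the holding table instead of a list.

```python
def split_by_last_delimiter(text, delim=","):
    """Peel items off the end of a delimited string, one per pass.

    Mirrors the loop described above: find the last delimiter, move the text
    after it into a holding list, cut the string back to before that
    delimiter, and repeat until no delimiter remains.
    """
    holding = []
    while True:
        pos = text.rfind(delim)
        if pos == -1:
            holding.append(text)  # whatever is left is the first item
            break
        holding.append(text[pos + len(delim):])
        text = text[:pos]
    holding.reverse()  # items were collected back to front
    return holding


print(split_by_last_delimiter("DocID1,DocID2,DocID3"))
```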
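Answer 2 (score: 0)
My first choice would be to throw together a quick app in C# or VB.NET to handle this.
If that isn't feasible, I would have an "import" table that takes everything as is. Then I would use a cursor to iterate over the records in that table. Inside the cursor, I would keep track of IssueID and Val1 and parse the References column to create my child records. I would package this part into a stored procedure.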
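Here is a rough equivalent of the staging-table-plus-loop idea, sketched in Python with sqlite3 instead of a stored procedure. ImportTable, IssueDocs, the Val1 values, and the Refs column name are assumptions for the demo. splitlines() accepts CR, LF, or CR/LF, so you don't have to guess which line break Excel exported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table holding the sheet as-is (Val1 stands in for the
# other columns) and a child table for the parsed references.
conn.execute("CREATE TABLE ImportTable (IssueID INTEGER, Val1 TEXT, Refs TEXT)")
conn.execute("CREATE TABLE IssueDocs (IssueID INTEGER, Val1 TEXT, DocID TEXT)")
conn.executemany(
    "INSERT INTO ImportTable VALUES (?, ?, ?)",
    [(1234, "a", "DocID1\rDocID2\rDocID3"), (1235, "b", "DocID1")],
)

# Walk the staging rows one at a time (the "cursor"), carrying IssueID and
# Val1 along, and write one child row per reference.
rows = conn.execute("SELECT IssueID, Val1, Refs FROM ImportTable").fetchall()
for issue_id, val1, refs in rows:
    for doc_id in refs.splitlines():  # splits on CR, LF, or CR/LF
        if doc_id:
            conn.execute(
                "INSERT INTO IssueDocs VALUES (?, ?, ?)", (issue_id, val1, doc_id)
            )
conn.commit()

for row in conn.execute("SELECT * FROM IssueDocs ORDER BY IssueID, DocID"):
    print(row)
```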
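Answer 3 (score: 0)
I suggest you look into SQL Server Integration Services (SSIS). This tool was built to do exactly this kind of data import/export as fast as possible with as little code as possible.
Learn it. Do some hands-on labs and see whether any of the examples come close to what you want to do.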
http://en.wikipedia.org/wiki/SQL_Server_Integration_Services http://www.microsoft.com/downloads/details.aspx?familyid=b1145e7a-a4e3-4d14-b1e7-d1d823b6a447&displaylang=en
Answer 4 (score: 0)
Do you mean something like this (typed, untested):
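Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim rsIn As DAO.Recordset   'Or ADO if you link directly to Excel
Dim astrDocs() As String
Dim strDoc As Variant
Dim strSQL As String
Set db = CurrentDb
'ImportTable = the imported/linked Excel sheet
Set rsIn = db.OpenRecordset("SELECT IssueID, [References] FROM ImportTable", dbOpenSnapshot)
'Existing links, resolved to their DocID and IssueID
Set rs = db.OpenRecordset( _
    "SELECT d.DocID, i.IssueID FROM (DocLink AS dl " & _
    "INNER JOIN Docs AS d ON dl.DocKey = d.DocKey) " & _
    "INNER JOIN Issues AS i ON dl.IssueKey = i.IssueKey", dbOpenDynaset)
Do While Not rsIn.EOF
    'Use vbLf or vbCr instead if that is what the cells actually contain
    astrDocs = Split(rsIn![References], vbCrLf)
    For Each strDoc In astrDocs
        rs.FindFirst "DocID='" & strDoc & "' AND IssueID=" & rsIn!IssueID
        If rs.NoMatch Then
            'Look up both surrogate keys and add the missing link
            strSQL = "INSERT INTO DocLink (DocKey, IssueKey) " _
                & "SELECT d.DocKey, i.IssueKey FROM Docs AS d, Issues AS i " _
                & "WHERE d.DocID='" & strDoc & "' AND i.IssueID=" & rsIn!IssueID
            db.Execute strSQL, dbFailOnError
        End If
    Next
    rsIn.MoveNext
Loop
rsIn.Close
rs.Close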
Edit in response to comments:
If DocID has a fixed length, you could consider something like this:
SELECT Sequence.Seq
     , ImportTable.IssueID
     , Mid(Replace([References], Chr(13), ""), [Seq], 6) AS Docs
FROM Sequence, ImportTable
WHERE ([Seq] + 5) Mod 6 = 0
  AND Mid(Replace([References], Chr(13), ""), [Seq], 6) <> ""
  AND Mid(Replace([References], Chr(13), ""), [Seq], 6)
      Not In (SELECT DocID FROM Docs)
You will need a Sequence table containing the integers from 1 up to at least the maximum length of References.
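This Python sketch shows what the fixed-width query computes, assuming 6-character DocIDs as in the sample data. It emulates the Sequence table with a range and emulates Mid() with 1-based slicing. It leaves out the final NOT IN (SELECT DocID FROM Docs) filter, and it only works if every DocID really has the same length.

```python
def fixed_width_docs(references, width=6, seq_max=100):
    """Emulate the Sequence-table query: strip the CRs, then take a
    width-character slice at every seq where (seq + width - 1) is a
    multiple of width, i.e. seq = 1, 7, 13, ... (1-based, like Mid())."""
    packed = references.replace("\r", "")
    docs = []
    for seq in range(1, seq_max + 1):  # the Sequence table
        if (seq + width - 1) % width != 0:  # WHERE ([Seq] + 5) Mod 6 = 0
            continue
        piece = packed[seq - 1 : seq - 1 + width]  # Mid(..., [Seq], 6)
        if piece:  # AND Mid(...) <> ""
            docs.append((seq, piece))
    return docs


print(fixed_width_docs("DocID1\rDocID2\rDocID3"))
```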
Answer 5 (score: 0)
This can easily be done in SQL. I have written a TVF (table-valued function) specifically to demonstrate how to split text into rows:
CREATE FUNCTION [dbo].[fnSplit3](
@parameter varchar(Max) -- the string to split
, @Seperator Varchar(64) -- the string to use as a separator
)
RETURNS @Items TABLE(
ID INT -- the element number
, item VARCHAR(8000) -- the split-out string element
, OffSet int -- the original offset
--( not entirely accurate if LEN(@Seperator) > 1 because of the Replace() )
)
AS
BEGIN
/*
"Monster" Split in SQL Server 2005
From Jeff Moden, 2008/05/22
BYoung, 2008/06/18: Modified to be a Table-Valued Function
And to handle CR/LF or LF-only line breaks
Test: (scripts all procs & views in master)
Select Lines.Item
From Master.sys.syscomments C
CROSS APPLY dbo.fnSplit3(C.text, char(13)+char(10)) Lines
Order by C.ID, Lines.ID
Test2: (scripts all triggers in your database)
Select Lines.Item
From sys.sql_modules M
Join sys.objects O on O.object_id = M.object_id
CROSS APPLY dbo.fnSplit3(M.definition, char(13)+char(10)) Lines
Where O.Type = 'TR'
Order by O.create_date, Lines.ID
*/
Declare @Sep char(1)
Set @Sep = char(10) --our separator character (convenient, doesn't affect performance)
--NOTE: we make the @Sep character LF so that we will automatically
-- parse out rogue LF-only line breaks.
--===== Add start and end separators to the Parameter so we can handle
-- all the elements the same way
-- Also change the separator expressions to our separator
-- character to keep all offsets = 1
SET @Parameter = @Sep+ Replace(@Parameter,@Seperator,@Sep) +@Sep
-- This reduces run-time about 10%
;WITH cteTally AS
(--==== Create a Tally CTE from 1 to whatever the length
-- of the parameter is
SELECT TOP (LEN(@Parameter))
ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
FROM Master.sys.system_Columns t1
CROSS JOIN Master.sys.system_Columns t2
)
INSERT into @Items
SELECT ROW_NUMBER() OVER (ORDER BY N) AS Number,
SUBSTRING(@Parameter, N+1, CHARINDEX(@Sep, @Parameter, N+1)-N-1) AS Value
, N+1
FROM cteTally
WHERE N < LEN(@Parameter)
AND SUBSTRING(@Parameter, N, 1) = @Sep --Notice how we find the separator
Return
END
To use it with your current table and data, do this:
SELECT Issues.IssueID, Lines.Item as Reference
From Issues
Cross Apply dbo.fnSplit3(Issues.[References], char(13)) Lines
Order By IssueID, Reference
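To show what fnSplit3 does step by step, here is a Python transliteration of its tally logic. It is not a replacement for the T-SQL function. tally_split is a hypothetical name, and the returned tuples mirror the ID, item, and OffSet columns.

```python
def tally_split(text, separator):
    """Mimic fnSplit3: rewrite every separator to LF, wrap the string in LFs,
    then emit the text between each LF and the next one."""
    sep = "\n"
    padded = sep + text.replace(separator, sep) + sep
    items = []
    for n in range(1, len(padded)):  # cteTally, WHERE N < LEN(@Parameter)
        if padded[n - 1] == sep:  # SUBSTRING(@Parameter, N, 1) = @Sep
            end = padded.index(sep, n)  # CHARINDEX(@Sep, @Parameter, N+1), 0-based
            items.append((len(items) + 1, padded[n:end], n + 1))
    return items


print(tally_split("DocID1\rDocID2\rDocID3", "\r"))
```

Every separator is first rewritten to LF. As a result, a cell that mixes CR/LF with stray LF-only breaks still splits cleanly when you call it with "\r\n" as the separator.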
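Answer 6 (score: 0)
I'm having trouble with a set-based SQL solution here. I've done this kind of thing before and had to refresh my memory a bit, but I've hit a problem. I think it's an issue (feature/bug?) with the engine, but I may be doing something stupid. Perhaps someone who knows Jet/ACE intimately and can read VBA could look at the code at the end of this answer and hopefully move this forward...?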
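The basic approach is to parse the data column with MID() expressions over a sequence table of integers. (I've renamed the column MyReferences because REFERENCES is an SQL keyword.)
Here is MS Access VBA that recreates the test table/data using SQL DDL/DML. Note that the first SELECT query returns the substrings along with their start and end delimiters. Obviously, we're looking for the rows where both delimiters are the delimiter character, in this case CHR(13). The second SELECT query only adds search conditions for the required delimiters, but it fails with "Invalid procedure call". That error is raised when a MID() expression is called with an invalid argument value, e.g.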
SELECT MID('A', 0, 0)
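My guess is that the optimizer is not using the subquery as a "shortcut" but is evaluating the MID() expressions before the search conditions on the sequence table. If so, that's a bit dumb, and I can't think of a way to force the order of evaluation.
So, is the problem with me or with the engine?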
Sub main()
Dim sql As String
sql = _
"DROP TABLE ImportTable;"
On Error Resume Next ' Table may not exist
CurrentProject.Connection.Execute sql
On Error GoTo 0
sql = _
"DROP TABLE Sequence;"
On Error Resume Next ' Table may not exist
CurrentProject.Connection.Execute sql
On Error GoTo 0
sql = _
"CREATE TABLE ImportTable ( " & _
"IssueID INTEGER NOT NULL UNIQUE, MyReferences VARCHAR(90) NOT NULL);"
CurrentProject.Connection.Execute sql
sql = _
"INSERT INTO ImportTable VALUES (1234, 'DocID1' & Chr(13) & 'DocID22' & Chr(13) & 'DocID3');"
CurrentProject.Connection.Execute sql
sql = _
"CREATE TABLE Sequence (seq INTEGER NOT NULL UNIQUE);"
CurrentProject.Connection.Execute sql
sql = _
"INSERT INTO Sequence VALUES (-1);"
CurrentProject.Connection.Execute sql
sql = _
"INSERT INTO [Sequence] (seq) SELECT Units.nbr + Tens.nbr" & _
" FROM ( SELECT" & _
" nbr FROM ( SELECT 0 AS nbr FROM [Sequence] UNION" & _
" ALL SELECT 1 FROM [Sequence] UNION ALL SELECT 2 FROM" & _
" [Sequence] UNION ALL SELECT 3 FROM [Sequence] UNION" & _
" ALL SELECT 4 FROM [Sequence] UNION ALL SELECT 5 FROM" & _
" [Sequence] UNION ALL SELECT 6 FROM [Sequence] UNION" & _
" ALL SELECT 7 FROM [Sequence] UNION ALL SELECT 8 FROM" & _
" [Sequence] UNION ALL SELECT 9 FROM [Sequence] ) AS" & _
" Digits ) AS Units, ( SELECT nbr * 10 AS nbr FROM" & _
" ( SELECT 0 AS nbr FROM [Sequence] UNION ALL SELECT" & _
" 1 FROM [Sequence] UNION ALL SELECT 2 FROM [Sequence]" & _
" UNION ALL SELECT 3 FROM [Sequence] UNION ALL SELECT" & _
" 4 FROM [Sequence] UNION ALL SELECT 5 FROM [Sequence]" & _
" UNION ALL SELECT 6 FROM [Sequence] UNION ALL SELECT" & _
" 7 FROM [Sequence] UNION ALL SELECT 8 FROM [Sequence]" & _
" UNION ALL SELECT 9 FROM [Sequence] ) AS Digits )" & _
" AS Tens;"
CurrentProject.Connection.Execute sql
sql = _
"SELECT DT1.IssueID, DT1.parsed_text, DT1.delimiter_1, DT1.delimiter_2 " & _
"FROM ( " & _
"SELECT I1.IssueID, MID(I1.MyReferences, S1.seq, S2.seq - S1.seq - LEN(CHR(13))) AS parsed_text, " & _
" MID(CHR(13) & I1.MyReferences & CHR(13), S1.seq, LEN(CHR(13))) AS delimiter_1, " & _
" MID(CHR(13) & I1.MyReferences & CHR(13), S2.seq, LEN(CHR(13))) AS delimiter_2 " & _
"FROM ImportTable AS I1, Sequence AS S1, Sequence AS S2 " & _
"WHERE S1.seq < S2.seq " & _
"AND S2.seq - S1.seq - LEN(CHR(13)) > 0 " & _
"AND S1.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
"AND S2.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
") AS DT1;"
Dim rs As ADODB.Recordset
Set rs = CurrentProject.Connection.Execute(sql)
MsgBox rs.GetString
sql = _
"SELECT DT1.IssueID, DT1.parsed_text, DT1.delimiter_1, DT1.delimiter_2 " & _
"FROM ( " & _
"SELECT I1.IssueID, MID(I1.MyReferences, S1.seq, S2.seq - S1.seq - LEN(CHR(13))) AS parsed_text, " & _
" MID(CHR(13) & I1.MyReferences & CHR(13), S1.seq, LEN(CHR(13))) AS delimiter_1, " & _
" MID(CHR(13) & I1.MyReferences & CHR(13), S2.seq, LEN(CHR(13))) AS delimiter_2 " & _
"FROM ImportTable AS I1, Sequence AS S1, Sequence AS S2 " & _
"WHERE S1.seq < S2.seq " & _
"AND S2.seq - S1.seq - LEN(CHR(13)) > 0 " & _
"AND S1.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
"AND S2.seq BETWEEN 1 AND LEN(CHR(13)) + LEN(I1.MyReferences) + LEN(CHR(13)) " & _
") AS DT1 " & _
"WHERE DT1.delimiter_1 = CHR(13) " & _
"AND DT1.delimiter_2 = CHR(13);"
Set rs = CurrentProject.Connection.Execute(sql)
MsgBox rs.GetString
End Sub
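FWIW, here is a PROCEDURE I wrote years ago to parse a delimited list into a table. It seems to work for values up to 255 characters; with anything longer you get a very nasty ACE/Jet engine error. Again, I can't see what the problem is, other than the engine being unable to cope! Anyway, my point is that this works (for small values), and I can't figure out why I can't adapt it to the problem at hand: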
CREATE PROCEDURE ListToTable
(
delimted_text MEMO,
delimiter VARCHAR(4) = ','
)
AS
SELECT MID(I1.input_string, S1.seq, MIN(S2.seq) - S1.seq - LEN(delimiter)) AS param
FROM
(
SELECT DISTINCT delimted_text AS input_string
FROM Sequence AS S3
WHERE S3.seq BETWEEN 1 AND LEN(delimted_text)
) AS I1, Sequence AS S1, Sequence AS S2
WHERE MID(delimiter & I1.input_string & delimiter, S1.seq, LEN(delimiter)) = delimiter
AND MID(delimiter & I1.input_string & delimiter, S2.seq, LEN(delimiter)) = delimiter
AND S1.seq < S2.seq
AND S1.seq BETWEEN 1 AND LEN(delimiter) + LEN(delimted_text) + LEN(delimiter)
AND S2.seq BETWEEN 1 AND LEN(delimiter) + LEN(delimted_text) + LEN(delimiter)
GROUP
BY I1.input_string, S1.seq
HAVING LEN(MID(I1.input_string, S1.seq, MAX(S2.seq) - S1.seq - LEN(delimiter))) > 0;
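For comparison, the same sequence-table technique runs without complaint in SQLite. SQLite's substr() returns an empty or shortened string for out-of-range arguments instead of raising an error. This Python/sqlite3 sketch adapts ListToTable's query; the names list_to_table and SPLIT_SQL are illustrative. It only shows the technique working on another engine and does not explain the Jet/ACE behaviour. With a single-character delimiter, the 100-row Sequence table limits input to 98 characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequence (seq INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Sequence VALUES (?)", [(i,) for i in range(1, 101)])

# Adapted from ListToTable: S1.seq marks a delimiter in :d || :txt || :d,
# and MIN(S2.seq) is the position of the next delimiter after it.
SPLIT_SQL = """
SELECT substr(:txt, S1.seq, MIN(S2.seq) - S1.seq - length(:d)) AS param
FROM Sequence AS S1, Sequence AS S2
WHERE substr(:d || :txt || :d, S1.seq, length(:d)) = :d
  AND substr(:d || :txt || :d, S2.seq, length(:d)) = :d
  AND S1.seq < S2.seq
  AND S2.seq <= length(:d) + length(:txt) + length(:d)
GROUP BY S1.seq
HAVING MIN(S2.seq) - S1.seq - length(:d) > 0
ORDER BY S1.seq
"""


def list_to_table(text, delimiter=","):
    """Split text on delimiter with set-based SQL; empty items are dropped."""
    rows = conn.execute(SPLIT_SQL, {"txt": text, "d": delimiter}).fetchall()
    return [row[0] for row in rows]


print(list_to_table("DocID1,DocID2,DocID3"))
print(list_to_table("DocID1\rDocID2", "\r"))
```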
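Answer 7 (score: 0)
I think using the word "parse" in the title confused everyone. The bug in Access is that a correlated query run against a query (rather than a table) causes it to hang. So I created a temporary table that adds the References column (with the multi-line text) to the Issues table, so that I have access to the other fields. The final query creates the link table described above, along with DocID and IssueID for reference: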
SELECT Q1.IssueID, Q1.IssueKey, D1.DocKey, D1.DocID
FROM Issues AS Q1, Documents AS D1
WHERE Q1.IssueID in
(SELECT Q2.IssueID FROM Issues AS Q2 WHERE Q2.References LIKE ("*" & D1.DocID & "*"));
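The inner SELECT pulls the list of issues whose References column contains the given document. The outer SELECT does this for every document, producing the aggregate list.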
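Here is a sketch of the final approach end to end, again with Python's sqlite3 standing in for Access (|| and % instead of & and *, and Refs instead of References). The NOT EXISTS clause is an addition, not part of the accepted query. It covers the original requirement to add a link only if it doesn't already exist, so rerunning the import adds nothing new. The plain LIKE test still has the DocID9/DocID999 weakness pointed out in Answer 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Issues (IssueKey INTEGER PRIMARY KEY, IssueID INTEGER, Refs TEXT);
CREATE TABLE Documents (DocKey INTEGER PRIMARY KEY, DocID TEXT);
CREATE TABLE DocLink (LinkKey INTEGER PRIMARY KEY, IssueKey INTEGER, DocKey INTEGER);
""")
# Illustrative data from the question (Refs = the multi-line References column)
conn.executemany("INSERT INTO Issues VALUES (?, ?, ?)", [
    (1, 1234, "DocID1\rDocID2\rDocID3"), (2, 1235, "DocID1"),
    (3, 1236, "DocID2"), (4, 1237, "DocID2\rDocID3")])
conn.executemany("INSERT INTO Documents VALUES (?, ?)",
                 [(1, "DocID1"), (2, "DocID2"), (3, "DocID3")])

# Same shape as the final query, plus NOT EXISTS so reruns add only new links.
ADD_LINKS = """
INSERT INTO DocLink (IssueKey, DocKey)
SELECT Q1.IssueKey, D1.DocKey
FROM Issues AS Q1, Documents AS D1
WHERE Q1.IssueID IN (SELECT Q2.IssueID FROM Issues AS Q2
                     WHERE Q2.Refs LIKE '%' || D1.DocID || '%')
  AND NOT EXISTS (SELECT 1 FROM DocLink AS L
                  WHERE L.IssueKey = Q1.IssueKey AND L.DocKey = D1.DocKey)
ORDER BY Q1.IssueKey, D1.DocKey
"""
conn.execute(ADD_LINKS)
conn.execute(ADD_LINKS)  # a second run finds nothing new to insert
links = conn.execute("SELECT IssueKey, DocKey FROM DocLink ORDER BY LinkKey").fetchall()
print(links)
```

IssueID identifies a single row, so the IN subquery is equivalent to testing Q1.Refs directly. It is kept here only to mirror the query above.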