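I need to extract a large amount of data (> 1000 pages) from a Microsoft Content Management Server (MCMS) database for use in a Sitecore website.
I can see two main options:
Migrate the data to a new, simplified database and display the information from that database on the new site.
Convert the MCMS solution to SharePoint and use the SharePoint connector module available for Sitecore to display this information.
I would prefer the first route: there are no plans to use SharePoint to manage data/content in the future, and we would rather store this information in a simple SQL Server database so that it is easier to search.
I have looked at the database in question and believe the main tables of interest are Node, NodePlaceholder and NodePlaceholderContent, but I am struggling to find the content I would expect. Can anyone offer some explanation of this database schema? Or is it a mistake to try to migrate the data this way?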
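Answer 0 (score: 6)
I recently went through a similar process, exporting content pages from MCMS 2002 (migrating to WordPress).
I am not saying this is the 100% correct way to get the data, but it worked for me.
Here is the process I used to get at the page content.
As you have already seen, the tables that store most of the data are Node and NodePlaceholderContent.
1.) To get a feel for what the Node table contains, you can look at its content grouped by type: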
SELECT
[Type]
,CASE [Type]
WHEN 1 THEN 'Server'
WHEN 4 THEN 'Channel'
WHEN 16 THEN 'Post/Page'
WHEN 64 THEN 'Resource Gallery'
WHEN 256 THEN 'Resource Gallery Item (images/documents)'
WHEN 16384 THEN 'Template Gallery'
WHEN 65536 THEN 'Template' END as [Description]
,COUNT([Type]) as [Count]
FROM dbo.Node
GROUP BY [Type]
ORDER BY [Count] DESC
2.) Pages (and posts; posts are covered further down) are Type = 16, but to get pages (rather than posts) we need to filter by IsShortcut = 0:
SELECT * FROM dbo.Node WHERE [Type] = 16 AND IsShortcut = 0
3.) I only wanted published pages, so filter by ApprovalStatus = 1:
-- Get all published pages
SELECT *
FROM dbo.Node WHERE [Type] = 16
AND IsShortcut = 0
AND ApprovalStatus = 1
4.) Next, determine who created/modified each page (by username):
-- Get published pages & author/editor
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
FROM dbo.Node [page]
-- add JOIN on created by user
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
-- add JOIN on modified by user
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
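-- qualify the filter columns with the [page] alias so they stay unambiguous once other tables are joined
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1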
5.) Next, find each page's parent using the Node.ParentGUID column:
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[pageParent].Name -- add page parent Name
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- add JOIN on Node using ParentGUID
INNER JOIN dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
This query told me that the pages sit in parent nodes named Folders or Archive Folder.
6.) Go up another level (get the parent's parent):
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[pageParent].Name
,[pageParent2].Name -- add parent of parent name
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
-- add another JOIN on Node using ParentGUID (parent of parent)
INNER JOIN dbo.Node [pageParent2] ON [pageParent2].NodeGUID = [pageParent].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
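The parent's parent is Server (the root level), so my conclusion was that a page's parent is one of:
Folders - the page is an active page
Archive Folder - the page is a previous revision of another page
I only wanted active pages, so from here on I join on Folders parents only.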
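A quick way to sanity-check that conclusion on your own database is to count the published pages per parent node name. This is only a sketch that reuses the tables and columns from the queries above; if parent names other than Folders and Archive Folder show up, the split described here may not hold for your installation.

```sql
-- Count published pages grouped by the name of their parent node.
-- Uses only columns that already appear in the queries above.
SELECT
     [pageParent].Name AS ParentName
    ,COUNT(*)          AS PageCount
FROM dbo.Node [page]
INNER JOIN dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
WHERE [page].[Type] = 16
  AND [page].IsShortcut = 0
  AND [page].ApprovalStatus = 1
GROUP BY [pageParent].Name
ORDER BY PageCount DESC
```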
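7.) Now, what about the markup? Our MCMS templates had only one placeholder region. If your templates have more than one placeholder region, the NodePlaceholder table identifies the placeholder names. For simplicity I only join on NodePlaceholderContent.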
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
/* remove parent names */
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
,html.PropValue as 'HTML' -- add the markup
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- change alias to "folders"
INNER JOIN dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
-- join on PlaceholderContent to get the HTML
-- this table will also have references to any static files contained in the page (such as images) so we filter those out by PropName = 'HTML'
INNER JOIN dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML'
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
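If the goal is the simplified database mentioned in the question, one option is to materialize this result set with SELECT ... INTO. The following is a rough sketch, not something from the original process: the table name dbo.MigratedPage and the output column names are hypothetical, and it assumes one HTML placeholder per page (a page with several would produce several rows).

```sql
-- Sketch only: dbo.MigratedPage is a hypothetical target table that
-- SELECT ... INTO creates from the shape of the result set.
SELECT
     [page].Id           AS SourceNodeId
    ,[page].NodeGuid     AS SourceNodeGuid
    ,[page].Name         AS PageName
    ,[page].CreatedWhen  AS CreatedWhen
    ,[page].ModifiedWhen AS ModifiedWhen
    ,html.PropValue      AS Html
INTO dbo.MigratedPage
FROM dbo.Node [page]
-- active pages only (parent node is named 'Folders')
INNER JOIN dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
-- markup rows only
INNER JOIN dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML'
WHERE [page].[Type] = 16
  AND [page].IsShortcut = 0
  AND [page].ApprovalStatus = 1
```

Keeping the source Id and NodeGuid in the new table makes it possible to trace a migrated row back to MCMS later.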
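8.) At this point I was trying to determine where each page lives in the system (i.e. its relative path, or the channel it is in). Going back to steps 1 & 2, Type = 16 can be either a post or a page (they are not the same thing, but they are related). So now we join the page to its post record to determine the path.
After some googling, I stumbled upon this excerpt from Microsoft Content Management Server 2002: a complete guide, which really helped with the rest (and identified the Node.Type enumeration).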
SELECT
[page].Id
,[page].NodeGuid
,[page].Name
,[post].DisplayName as 'Title' -- add page Title from the post record
/* parent names removed: [pageParent] and [pageParent2] are not joined in this query */
,[created].Username as 'CreatedBy'
,[page].CreatedWhen
,[modified].Username as 'ModifiedBy'
,[page].ModifiedWhen
,html.PropValue as 'HTML'
FROM dbo.Node [page]
INNER JOIN dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
INNER JOIN dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML'
-- join using followGUID to get the posting
INNER JOIN dbo.Node [post] ON [post].FollowGUID = [page].NodeGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1
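9.) The last step is to keep walking up the post's parent hierarchy, which results in several LEFT JOINs that follow the ParentGUID chain. This query uses those LEFT JOINs to give a visual representation of the hierarchy.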
SELECT
CASE WHEN postParent9.Name IS NULL THEN '' ELSE postParent9.Name + ' > ' END +
CASE WHEN postParent8.Name IS NULL THEN '' ELSE postParent8.Name + ' > ' END +
CASE WHEN postParent7.Name IS NULL THEN '' ELSE postParent7.Name + ' > ' END +
CASE WHEN postParent6.Name IS NULL THEN '' ELSE postParent6.Name + ' > ' END +
CASE WHEN postParent5.Name IS NULL THEN '' ELSE postParent5.Name + ' > ' END +
CASE WHEN postParent4.Name IS NULL THEN '' ELSE postParent4.Name + ' > ' END +
CASE WHEN postParent3.Name IS NULL THEN '' ELSE postParent3.Name + ' > ' END +
CASE WHEN postParent2.Name IS NULL THEN '' ELSE postParent2.Name + ' > ' END +
CASE WHEN postParent1.Name IS NULL THEN '' ELSE postParent1.Name + ' > ' END +
page.Name as [Path]
,page.Name + '.htm' as [PageName]
,post.DisplayName as [PageTitle]
,CASE page.[Type]
WHEN 1 THEN 'Server'
WHEN 4 THEN 'Channel'
WHEN 16 THEN 'Post/Page'
WHEN 64 THEN 'Resource Gallery'
WHEN 256 THEN 'Resource Gallery Item (images/documents)'
WHEN 16384 THEN 'Template Gallery'
WHEN 65536 THEN 'Template' END as [Type]
,page.CreatedWhen as 'Created'
,page.ModifiedWhen as 'Modified'
,html.PropValue as 'HTML'
FROM dbo.Node page
INNER JOIN dbo.Node folders ON folders.NodeGUID = page.ParentGUID AND folders.Name = 'Folders'
INNER JOIN dbo.NodePlaceholderContent html ON html.NodeId = page.Id AND html.PropName = 'HTML'
INNER JOIN dbo.Node post ON post.FollowGUID = page.NodeGUID AND post.IsShortcut = 1
LEFT JOIN dbo.Node postParent1 ON postParent1.NodeGuid = post.ParentGUID
LEFT JOIN dbo.Node postParent2 ON postParent2.NodeGuid = postParent1.ParentGUID
LEFT JOIN dbo.Node postParent3 ON postParent3.NodeGuid = postParent2.ParentGUID
LEFT JOIN dbo.Node postParent4 ON postParent4.NodeGuid = postParent3.ParentGUID
LEFT JOIN dbo.Node postParent5 ON postParent5.NodeGuid = postParent4.ParentGUID
LEFT JOIN dbo.Node postParent6 ON postParent6.NodeGuid = postParent5.ParentGUID
LEFT JOIN dbo.Node postParent7 ON postParent7.NodeGuid = postParent6.ParentGUID
LEFT JOIN dbo.Node postParent8 ON postParent8.NodeGuid = postParent7.ParentGUID
LEFT JOIN dbo.Node postParent9 ON postParent9.NodeGuid = postParent8.ParentGUID
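The nine LEFT JOINs cap the path at nine levels. If the database has been restored on SQL Server 2005 or later (an assumption; recursive CTEs are not available in SQL Server 2000), the same ParentGUID walk can be sketched as a recursive common table expression that handles any depth. The CTE name PostPath, its column names, and the NVARCHAR(4000) length are illustrative choices, and the path here starts from the post's Name rather than the page's Name.

```sql
-- Sketch: build each post's path by following ParentGUID recursively.
WITH PostPath AS (
    -- Anchor: one row per post (the shortcut node that points at a page)
    SELECT
         post.NodeGUID   AS PostGUID
        ,post.FollowGUID AS PageGUID
        ,post.ParentGUID AS NextParentGUID
        ,CAST(post.Name AS NVARCHAR(4000)) AS [Path]
    FROM dbo.Node post
    WHERE post.[Type] = 16
      AND post.IsShortcut = 1

    UNION ALL

    -- Recursive step: prepend the next ancestor's name to the path
    SELECT
         p.PostGUID
        ,p.PageGUID
        ,parent.ParentGUID
        ,CAST(parent.Name + ' > ' + p.[Path] AS NVARCHAR(4000))
    FROM PostPath p
    INNER JOIN dbo.Node parent ON parent.NodeGUID = p.NextParentGUID
)
-- Keep only the fully expanded row per post: the one whose next parent
-- no longer exists, i.e. the walk has reached the root.
SELECT pp.PageGUID, pp.[Path]
FROM PostPath pp
WHERE NOT EXISTS (SELECT 1 FROM dbo.Node n WHERE n.NodeGUID = pp.NextParentGUID)
OPTION (MAXRECURSION 100) -- 100 is the default recursion limit
```

The result can be joined back to the page query on PageGUID = page.NodeGUID to pick up the title and HTML columns.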
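Incidentally, my task did not involve exporting resource gallery content (images/documents/etc.), but if you do need those parts, there should be enough information here to get you off to a good start.
I hope this helps others who are migrating from MCMS 2002...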