How to extract data from a Microsoft Content Management Server (MCMS) database

Date: 2011-07-12 13:36:38

Tags: asp.net sql-server sitecore mcms-2000 mcms

I need to extract a large amount of data (> 1000 pages) from a Microsoft Content Management Server (MCMS) database so that it can be used in a Sitecore website.

I can see two main options:

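  1. Migrate the data into a new, simplified database and display the information from that database on the new site.

  2. Convert the MCMS solution to SharePoint and use the SharePoint connector module available for Sitecore to display the information.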

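I would prefer to go down the first route, because there are no plans to use SharePoint to manage the data/content in future; instead we want to store this information in a simple SQL Server database so that it is easier to search.

I have looked at the database in question and believe the main tables I am interested in are Node, NodePlaceholder and NodePlaceholderContent, but I am struggling to find the content I would expect. Can anyone give me some explanation of this database schema? Or should I even be attempting to migrate the data this way?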

1 Answer:

Answer 0: (score: 6)

I recently went through a similar process of exporting content pages out of MCMS 2002 (migrating to WordPress).

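I'm not saying this is the 100% correct way to get the data out, but it worked for me.

Here is the process I followed to get the page content out of the database.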

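As you have already seen, the tables that store most of the data are Node and NodePlaceholderContent.

1.) To get an idea of what the Node table holds, look at the content grouped by type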

SELECT
    [Type]
    ,CASE [Type] 
        WHEN      1 THEN 'Server'
        WHEN      4 THEN 'Channel'
        WHEN     16 THEN 'Post/Page'
        WHEN     64 THEN 'Resource Gallery'
        WHEN    256 THEN 'Resource Gallery Item (images/documents)'
        WHEN  16384 THEN 'Template Gallery'
        WHEN  65536 THEN 'Template' END as [Description]
    ,COUNT([Type]) as [Count]
FROM        dbo.Node
GROUP BY    [Type]
ORDER BY    [Count] DESC

2.) Pages (and postings; postings are covered further down) are type = 16 ... but to get pages (and not postings) we need to filter by IsShortcut = 0

SELECT * FROM dbo.Node WHERE [Type] = 16 AND IsShortcut = 0

3.) I only want published pages, so filter by ApprovalStatus = 1

-- Get all published pages
SELECT * 
FROM dbo.Node WHERE [Type] = 16 
AND IsShortcut = 0
AND ApprovalStatus = 1 
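
Before settling on ApprovalStatus = 1, it may be worth checking which status values actually occur in your database. The query below is only a sketch built from the columns already used above; what the values other than 1 mean is not established here, so treat the counts as a sanity check rather than a definition.

```sql
-- Count pages (not postings) per ApprovalStatus value
SELECT
    ApprovalStatus
    ,COUNT(*) as [Count]
FROM        dbo.Node
WHERE       [Type] = 16
AND         IsShortcut = 0
GROUP BY    ApprovalStatus
ORDER BY    [Count] DESC
```

If the row for ApprovalStatus = 1 is far smaller or larger than the number of live pages you expect, the filter probably needs another look before you export anything.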

4.) Next, identify who created/modified each page (using usernames)

-- Get published pages & author/editor
SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
FROM        dbo.Node [page]
-- add JOIN on created by user
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
-- add JOIN on modified by user
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
WHERE [Type] = 16 
AND IsShortcut = 0
AND ApprovalStatus = 1 

5.) Next, use Node.ParentGUID to find where the page sits in the hierarchy

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[pageParent].Name -- add page parent Name
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- add JOIN on Node using ParentGUID
INNER JOIN  dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 

This query told me that the pages live in parent nodes named Folders and Archive Folder

6.) Go up another level (get the parent's parent)

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[pageParent].Name 
    ,[pageParent2].Name -- add parent of parent name
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN  dbo.Node [pageParent] ON [pageParent].NodeGUID = [page].ParentGUID
-- add another JOIN on Node using ParentGUID (parent of parent)
INNER JOIN  dbo.Node [pageParent2] ON [pageParent2].NodeGUID = [pageParent].ParentGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 

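The parent's parent is Server (the root level), so my conclusion is that a page's parent is one of:

  • Folders - the page is an active page
  • Archive Folder - the page is a previous revision of another page

I only want active pages, so I will join on Folders parents only

7.) Now, what about the markup? In our MCMS templates there was only one placeholder region. If your templates have more than one placeholder region, the NodePlaceholder table will identify the name of each placeholder. To keep things simple I just join on NodePlaceholderContent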

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    /* remove parent names */
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
    ,html.PropValue as 'HTML' -- add the markup
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
-- change alias to "folders"
INNER JOIN  dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
-- join on PlaceholderContent to get the HTML
-- this table will also have references to any static files contained in the page (such as images) so we filter those out by PropName = 'HTML'
INNER JOIN  dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML' 
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 

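8.) At this point I was trying to work out where the page sits in the system (i.e. its relative path, or the channel it lives in). Going back to steps 1 & 2, type = 16 can be either a posting or a page (they are not the same thing, but they are related). So now we join the page to its posting record to determine the path.

After some googling I stumbled upon this excerpt from Microsoft Content Management Server 2002: A Complete Guide, which really helped with the rest (and identified the Node.Type enumeration)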

SELECT 
    [page].Id
    ,[page].NodeGuid
    ,[page].Name
    ,[post].DisplayName as 'Title' -- add page Title from the post record
    /* parent names removed (see step 7) */
    ,[created].Username as 'CreatedBy'
    ,[page].CreatedWhen
    ,[modified].Username as 'ModifiedBy'
    ,[page].ModifiedWhen
    ,html.PropValue as 'HTML'
FROM        dbo.Node [page]
INNER JOIN  dbo.ClientAccount [created] ON [created].UserId = [page].CreatedByUserId
INNER JOIN  dbo.ClientAccount [modified] ON [modified].UserId = [page].ModifiedByUserId
INNER JOIN  dbo.Node [folders] ON [folders].NodeGUID = [page].ParentGUID AND [folders].Name = 'Folders'
INNER JOIN  dbo.NodePlaceholderContent html ON html.NodeId = [page].Id AND html.PropName = 'HTML' 
-- join using followGUID to get the posting
INNER JOIN  dbo.Node [post] ON [post].FollowGUID = [page].NodeGUID
WHERE [page].[Type] = 16
AND [page].IsShortcut = 0
AND [page].ApprovalStatus = 1 
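
One thing to watch with this join: if a page has more than one posting pointing at it, the page comes back once per posting. The sketch below assumes, as the queries here do, that postings are the Type = 16 rows with IsShortcut = 1 and that FollowGUID holds the NodeGUID of the page they point to. It lists the pages that would be duplicated.

```sql
-- List pages that have more than one posting pointing at them
SELECT
    [post].FollowGUID
    ,COUNT(*) as [PostingCount]
FROM        dbo.Node [post]
WHERE       [post].[Type] = 16
AND         [post].IsShortcut = 1
GROUP BY    [post].FollowGUID
HAVING      COUNT(*) > 1
ORDER BY    [PostingCount] DESC
```

If this returns no rows, every page has at most one posting and the join above yields one row per page.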

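9.) The last step is to keep walking up the posting's parent hierarchy, which results in several LEFT JOINs up the ParentGUID chain. This query uses those LEFT JOINs to give a visual representation of the hierarchy.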

SELECT 
    CASE WHEN postParent9.Name IS NULL THEN '' ELSE postParent9.Name + ' > ' END +
    CASE WHEN postParent8.Name IS NULL THEN '' ELSE postParent8.Name + ' > ' END +
    CASE WHEN postParent7.Name IS NULL THEN '' ELSE postParent7.Name + ' > ' END +
    CASE WHEN postParent6.Name IS NULL THEN '' ELSE postParent6.Name + ' > ' END +
    CASE WHEN postParent5.Name IS NULL THEN '' ELSE postParent5.Name + ' > ' END +
    CASE WHEN postParent4.Name IS NULL THEN '' ELSE postParent4.Name + ' > ' END +
    CASE WHEN postParent3.Name IS NULL THEN '' ELSE postParent3.Name + ' > ' END +
    CASE WHEN postParent2.Name IS NULL THEN '' ELSE postParent2.Name + ' > ' END +
    CASE WHEN postParent1.Name IS NULL THEN '' ELSE postParent1.Name + ' > ' END +
    page.Name as [Path]
    ,page.Name + '.htm' as [PageName]
    ,post.DisplayName as [PageTitle]
    ,CASE page.[Type] 
        WHEN      1 THEN 'Server'
        WHEN      4 THEN 'Channel'
        WHEN     16 THEN 'Post/Page'
        WHEN     64 THEN 'Resource Gallery'
        WHEN    256 THEN 'Resource Gallery Item (images/documents)'
        WHEN  16384 THEN 'Template Gallery'
        WHEN  65536 THEN 'Template' END as [Type]
    ,page.CreatedWhen as 'Created'
    ,page.ModifiedWhen as 'Modified'
    ,html.PropValue as 'HTML'
FROM        dbo.Node page
INNER JOIN  dbo.Node folders ON folders.NodeGUID = page.ParentGUID AND folders.Name = 'Folders'
INNER JOIN  dbo.NodePlaceholderContent html ON html.NodeId = page.Id AND html.PropName = 'HTML'
INNER JOIN  dbo.Node post ON post.FollowGUID = page.NodeGUID AND post.IsShortcut = 1
LEFT JOIN   dbo.Node postParent1 ON postParent1.NodeGuid = post.ParentGUID
LEFT JOIN   dbo.Node postParent2 ON postParent2.NodeGuid = postParent1.ParentGUID
LEFT JOIN   dbo.Node postParent3 ON postParent3.NodeGuid = postParent2.ParentGUID
LEFT JOIN   dbo.Node postParent4 ON postParent4.NodeGuid = postParent3.ParentGUID
LEFT JOIN   dbo.Node postParent5 ON postParent5.NodeGuid = postParent4.ParentGUID
LEFT JOIN   dbo.Node postParent6 ON postParent6.NodeGuid = postParent5.ParentGUID
LEFT JOIN   dbo.Node postParent7 ON postParent7.NodeGuid = postParent6.ParentGUID
LEFT JOIN   dbo.Node postParent8 ON postParent8.NodeGuid = postParent7.ParentGUID
LEFT JOIN   dbo.Node postParent9 ON postParent9.NodeGuid = postParent8.ParentGUID
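
The nine LEFT JOINs cap the path at nine levels. If the database has been restored to SQL Server 2005 or later (an assumption; recursive CTEs are not available in SQL Server 2000), the same walk up the ParentGUID chain could be sketched with a recursive common table expression. This is not the method I used, only an alternative. The CTE name PostPath and its column names are made up for illustration, and the nvarchar(4000) cast assumes Name uses the database's default collation.

```sql
WITH PostPath AS (
    -- anchor: one row per posting, starting the walk at the posting's parent
    SELECT
        post.Id                         as PostId
        ,post.FollowGUID                as PageGuid
        ,post.ParentGUID                as NextParentGuid
        ,CAST('' as nvarchar(4000))     as ParentPath
        ,0                              as Depth
    FROM    dbo.Node post
    WHERE   post.[Type] = 16
    AND     post.IsShortcut = 1

    UNION ALL

    -- recursive step: prepend the next ancestor's name, then move up one level
    SELECT
        p.PostId
        ,p.PageGuid
        ,parent.ParentGUID
        ,CAST(parent.Name + ' > ' + p.ParentPath as nvarchar(4000))
        ,p.Depth + 1
    FROM        PostPath p
    INNER JOIN  dbo.Node parent ON parent.NodeGuid = p.NextParentGuid
)
SELECT
    pp.ParentPath + page.Name as [Path]
    ,page.Name + '.htm' as [PageName]
    ,pp.Depth as [Levels]
FROM        PostPath pp
INNER JOIN  dbo.Node page ON page.NodeGUID = pp.PageGuid
INNER JOIN  dbo.Node folders ON folders.NodeGUID = page.ParentGUID AND folders.Name = 'Folders'
-- keep only the finished walk: the row whose next parent no longer exists
WHERE NOT EXISTS (SELECT 1 FROM dbo.Node n WHERE n.NodeGuid = pp.NextParentGuid)
```

The title, dates and HTML columns can be added back by joining exactly as in the query above. The default recursion limit of 100 levels should be far more than any channel hierarchy needs.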

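Incidentally, my task did not involve exporting the resource gallery content (images/documents/etc.), but if you do need those pieces, there should be enough information here to get you off to a good start.

I hope this helps anyone else migrating from MCMS 2002...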