Question

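Can anybody explain to me how to proceed in the following scenario?

1. receiving documents (MS docs, ODS, PDF)
2. Dublin Core metadata extraction via Apache Tika + content extraction via jackrabbit-content-extractors
3. using Jackrabbit to store the documents (content) in a repository along with their metadata?
4. retrieving documents + metadata

I am interested in points 3 and 4...

DETAILS: The application processes documents interactively (some analysis - language detection, word count, etc. + gathering as many details as possible - Dublin Core + parsing the content/event handling) so that it can return the results of the processing to the user, and then stores the extracted content and metadata (extracted and custom user metadata) in a JCR repository.

Any help is appreciated, thank you.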

Answer 1

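Uploading files works basically the same way in JCR 2.0 as in JCR 1.0. However, JCR 2.0 adds a few useful built-in property definitions.

The "nt:file" node type is intended to represent a file and has two built-in property definitions in JCR 2.0 (both are auto-created by the repository when the node is created):

jcr:created (DATE)
jcr:createdBy (STRING)

and defines a single child named "jcr:content". This "jcr:content" node can be of any node type, but generally speaking all information pertaining to the content itself is stored on this child node. The de facto standard is to use the "nt:resource" node type, which defines these properties:

jcr:data (BINARY) mandatory
jcr:lastModified (DATE) autocreated
jcr:lastModifiedBy (STRING) autocreated
jcr:mimeType (STRING) protected?
jcr:encoding (STRING) protected?

Note that "jcr:mimeType" and "jcr:encoding" were added in JCR 2.0.

In particular, the purpose of the "jcr:mimeType" property is to do exactly what you're asking for - capture the "type" of the content. However, the "jcr:mimeType" and "jcr:encoding" property definitions can be defined (by the JCR implementation) as protected, meaning the JCR implementation sets them automatically. If that is the case, you are not allowed to set these properties manually. I believe that Jackrabbit and ModeShape do not treat them as protected.

Here is some code that shows how to upload a file into a JCR 2.0 repository using these built-in node types: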

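// Get an input stream for the file ...
File file = ...
InputStream stream = new BufferedInputStream(new FileInputStream(file));

// Create the nt:file node and its jcr:content child under an existing folder node
Node folder = session.getNode("/absolute/path/to/folder/node");
Node fileNode = folder.addNode("Article.pdf", "nt:file"); // named fileNode to avoid clashing with the File above
Node content = fileNode.addNode("jcr:content", "nt:resource");

// Store the file's bytes as the binary value of jcr:data
Binary binary = session.getValueFactory().createBinary(stream);
content.setProperty("jcr:data", binary);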

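If the JCR implementation does not treat the "jcr:mimeType" property as protected (as is the case with Jackrabbit and ModeShape), you have to set this property manually: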

content.setProperty("jcr:mimeType","application/pdf");
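If the MIME type is not known in advance, it has to come from somewhere. The following is only a minimal sketch that guesses it from the file name using the JDK alone, as a stand-in for a real content detector such as Apache Tika. The class name MimeTypeGuess and the "application/octet-stream" fallback are assumptions made for illustration, and the JDK lookup only considers the file extension, not the content.

```java
import java.net.URLConnection;

// Hypothetical helper, not part of JCR, Jackrabbit or Tika.
final class MimeTypeGuess {
    // Assumed fallback for names the JDK's built-in table does not know.
    static final String FALLBACK = "application/octet-stream";

    // Guesses a MIME type from the file extension only; never returns null.
    static String guess(String fileName) {
        String type = URLConnection.guessContentTypeFromName(fileName);
        return type != null ? type : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(guess("Article.pdf"));
        System.out.println(guess("notes.zzunknownext"));
    }
}
```

The result could then be passed to content.setProperty("jcr:mimeType", ...) in place of the hard-coded string.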

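Metadata can very easily be stored on the "nt:file" and "jcr:content" nodes, but out of the box the "nt:file" and "nt:resource" node types don't allow extra properties. So before you can add other properties, you first need to add a mixin (or multiple mixins) with property definitions for the kinds of properties you want to store. You can even define a mixin that allows any property. Here is a CND file defining such a mixin: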

<custom = 'http://example.com/mydomain'>
[custom:extensible] mixin
- * (undefined) multiple 
- * (undefined)

After registering this node type definition, you can use it on your nodes:

content.addMixin("custom:extensible");
content.setProperty("anyProp","some value");
content.setProperty("custom:otherProp","some other value");

You could also define and use a mixin that allows any Dublin Core element:

<dc = 'http://purl.org/dc/elements/1.1/'>
[dc:metadata] mixin
- dc:contributor (STRING)
- dc:coverage (STRING)
- dc:creator (STRING)
- dc:date (DATE)
- dc:description (STRING)
- dc:format (STRING)
- dc:identifier (STRING)
- dc:language (STRING)
- dc:publisher (STRING)
- dc:relation (STRING)
- dc:rights (STRING)
- dc:source (STRING)
- dc:subject (STRING)
- dc:title (STRING)
- dc:type (STRING)

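All of these properties are optional, and, unlike "custom:extensible", this mixin does not allow properties with arbitrary names or types. With this "dc:metadata" mixin I also haven't really addressed the fact that some of these elements are already represented by built-in properties (e.g., "jcr:createdBy", "jcr:lastModifiedBy", "jcr:created", "jcr:lastModified", "jcr:mimeType"), and that some of them may relate more to the content while others relate more to the file.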
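Because "dc:metadata" only accepts its declared properties, extracted metadata has to be narrowed down to those names before it is written to a node. Here is a rough sketch of that filtering step in plain Java. The class DublinCoreProps and its method are hypothetical helpers, the input is assumed to be a simple map of extracted key/value strings (not Tika's actual metadata API), and the standard element name "rights" is used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of JCR, Jackrabbit or Tika.
final class DublinCoreProps {
    // The 15 element names of the Dublin Core element set.
    static final Set<String> ELEMENTS = new HashSet<>(Arrays.asList(
            "contributor", "coverage", "creator", "date", "description",
            "format", "identifier", "language", "publisher", "relation",
            "rights", "source", "subject", "title", "type"));

    // Keeps recognized elements only and returns them keyed as "dc:<element>".
    static Map<String, String> toDcProperties(Map<String, String> extracted) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String key = entry.getKey().toLowerCase(Locale.ROOT);
            if (key.startsWith("dc:")) {
                key = key.substring(3); // accept both "title" and "dc:title"
            }
            if (ELEMENTS.contains(key) && entry.getValue() != null) {
                props.put("dc:" + key, entry.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("title", "Annual report");
        extracted.put("dc:creator", "J. Doe");
        extracted.put("pageCount", "12"); // not a Dublin Core element, dropped
        System.out.println(toDcProperties(extracted));
    }
}
```

Each remaining entry could then be written with content.setProperty(name, value) once the mixin has been added to the node. Since "dc:date" is declared as DATE, its value would need to be a Calendar or an ISO 8601 string.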

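You could of course define other mixins that better suit your metadata needs, using inheritance where needed. But be careful using inheritance with mixins - since JCR allows a node to have multiple mixins, it is often best to design your mixins to be tightly scoped and facet-oriented (e.g., "ex:taggable", "ex:describable", etc.) and then simply apply the appropriate mixins to a node as needed.

(It's even possible, though much more complicated, to define a mixin that allows more child nodes under the "nt:file" node and to store some metadata there.)

Mixins are fantastic and give your JCR content a tremendous amount of flexibility and power.

Oh, and when you've created all of the nodes you want, be sure to save the session: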

session.save();

Answer 2

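I'm a bit rusty with JCR and I've never used 2.0, but this should get you started.

See link. You'll want to open the second comment.

You just store the file in a node and add additional metadata to the node. Here's how to store the file: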

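Node folder = session.getRootNode().getNode("path/to/file/uploads");
Node file = folder.addNode(fileName, "nt:file");
// nt:file defines no default type for its jcr:content child, so name one explicitly
Node fileContent = file.addNode("jcr:content", "nt:resource");
fileContent.setProperty("jcr:data", fileStream); // fileStream is an InputStream
// Add other metadata
session.save();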

How you store the metadata is up to you. A simple way is to just store key/value pairs:

fileContent.setProperty(key, value, PropertyType.STRING);

To read the data back, you just call getProperty():

fileStream = fileContent.getProperty("jcr:data").getBinary().getStream();
value = fileContent.getProperty(key).getString();

Answer 3

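I am new to Jackrabbit and am working with 2.4.2. As for your solution, you can check the type using core Java logic and add cases for whatever variations you need to handle.

You don't need to worry about saving the contents of different .txt or .pdf files, because the content is converted to binary and saved as such. Here is a small sample in which I upload a PDF file to a Jackrabbit repo and download it again.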

    // Fragment: runs inside a try block, with `session` (a logged-in Session)
    // and `root` (the repository root node) already set up.
    // This program is for sample purposes only, so everything is hard coded.
        if (!root.hasNode("Alfresco_E0_Training.pdf")) {
            // Import the PDF file unless it is already imported
            System.out.print("Importing PDF... ");

            // Create the nt:file node that represents the document
            Node file = root.addNode("Alfresco_E0_Training.pdf", "nt:file");

            // Store the file's bytes as binary data on the jcr:content child
            FileInputStream stream = new FileInputStream("<path of file>\\Alfresco_E0_Training.pdf");
            Node content = file.addNode("jcr:content", "nt:resource");
            Binary binary = session.getValueFactory().createBinary(stream);
            content.setProperty("jcr:data", binary);
            stream.close();
            session.save();
            System.out.println("done.");
            System.out.println("::::::::::::::::::::Checking content of the node:::::::::::::::::::::::::");
            System.out.println("File Node Name : " + file.getName());
            System.out.println("File Node Identifier : " + file.getIdentifier());
            System.out.println("File Node child : " + file.getNode("jcr:content").getName());
            System.out.println("Content Node Name : " + content.getName());
            System.out.println("Content Node Identifier : " + content.getIdentifier());
            System.out.println("Content Node Content : " + content.getProperty("jcr:data"));
            System.out.println(":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
        } else {
            // Already imported: read the binary back and write it to disk
            Node file = root.getNode("Alfresco_E0_Training.pdf");
            Node content = file.getNode("jcr:content");
            Binary bin = content.getProperty("jcr:data").getBinary();
            InputStream stream = bin.getStream();
            File f = new File("C:<path of the output file>\\Alfresco_E0_Training.pdf");

            // Copy the stream to the output file in 1 KB chunks
            OutputStream out = new FileOutputStream(f);
            byte[] buf = new byte[1024];
            int len;
            while ((len = stream.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
            out.close();
            stream.close();
            bin.dispose(); // release the resources held by the Binary
            System.out.println("\nFile is created...................................");

            System.out.println("done.");
            System.out.println("::::::::::::::::::::Checking content of the node:::::::::::::::::::::::::");
            System.out.println("File Node Name : " + file.getName());
            System.out.println("File Node Identifier : " + file.getIdentifier());
            System.out.println("Content Node Name : " + content.getName());
            System.out.println("Content Node Identifier : " + content.getIdentifier());
            System.out.println("Content Node Content : " + content.getProperty("jcr:data"));
            System.out.println(":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
        }

        // output the repository content
        }
    catch (IOException e) {
        System.out.println("Exception: " + e);
    }
    finally {
        session.logout();
        }
        }
}
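As a possible shorter alternative to the manual read/write loop in the download branch, the JDK can copy a stream to a file in one call. This is only a sketch: the class StreamToFile and its save method are made-up names, and the byte array in main stands in for the stream you would get from the Binary's getStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of JCR or Jackrabbit.
final class StreamToFile {
    // Copies the whole stream to target, overwriting any existing file,
    // closes the stream, and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream src = in) {
            return Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("jcr-download-", ".bin");
        // Stand-in for the stream obtained from the repository
        InputStream fromRepository = new ByteArrayInputStream(new byte[] {1, 2, 3});
        long written = save(fromRepository, target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

The try-with-resources block closes the stream even if the copy fails, which the manual loop above does not guarantee.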

Hope this helps.

Storing metadata into a Jackrabbit repository
