OrientDB slow import of a large dataset: how can I make it faster?

Time: 2017-07-06 20:18:05

Tags: performance time orientdb bigdata

I'm working with a network of 17M edges and 20K vertices. I'm loading it into OrientDB with the ETL tool, but it takes forever.

I've tried batch sizes ranging from 1,000 to 100,000, but nothing changes.
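For reference, in the ETL `orientdb` loader the batch size is normally set with the `batchCommit` parameter, which the ETL configuration in this question does not show. The fragment below is a minimal sketch, assuming that is the setting that was varied: as far as I know `batchCommit` only takes effect when transactions are enabled (`"tx": true`), and `10000` is just a placeholder value inside the tried range. The remaining loader keys (`dbUser`, `dbPassword`, `classes`, `indexes`) are omitted for brevity.

```json
"loader": {
    "orientdb": {
        "dbURL": "PLOCAL:C:/orientdb/databases/Graph",
        "dbType": "graph",
        "tx": true,
        "batchCommit": 10000
    }
}
```

If `tx` is left at its default, changing `batchCommit` may have no visible effect, which would be consistent with seeing no change across batch sizes.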

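Is there an optimized way to speed up loading, other than using the Java API? Any help would be appreciated. I'm using the 2.2.20 Community Edition. This is the ETL import: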

Based on [1] (orientdb load graph csv of nodes and edges), I run this same script twice to import the two vertex classes, and a separate ETL to load the edges.

{
    "source": { "file": { "path": "C:/Users/Muuna/Desktop/files/H.csv" } },
    "extractor": { "csv": {
        "separator": ",",
        "columnsOnFirstLine": true,
        "ignoreEmptyLines": true,
        "columns": ["id:Integer","p1:String","p2:String","s:Integer"] } },
    "transformers": [

        { "command": { "command": "UPDATE H set p='${input.p1}' UPSERT WHERE p='${input.p1}'"},"vertex": { "class": "H", "skipDuplicates": true} }      
    ],
    "loader": {
        "orientdb": {
            "dbURL": "PLOCAL:C:/orientdb/databases/Graph",
            "dbUser": "admin",
            "dbPassword": "admin",
            "dbType": "graph",
            "classes": [
                {"name": "H", "extends": "V"},
                {"name": "HAS_S", "extends": "E"}
            ],
            "indexes": [ {"class":"H", "fields":["p:String"], "type":"UNIQUE" }]
        }
    }
}

Edges: loaded with a separate ETL, also based on [reference][1].
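The edge-loading ETL itself is not included in the question. As a purely hypothetical sketch of what such a config typically looks like with the ETL `merge`, `vertex`, `edge` and `field` transformers: the file `edges.csv`, its columns `from`/`to`, the target class `S` and its property `p` are all assumptions for illustration, not the author's actual setup. Each row looks up the source `H` vertex by `p`, links it to the `S` vertex whose `p` matches `to`, and then drops the helper columns.

```json
{
    "source": { "file": { "path": "C:/Users/Muuna/Desktop/files/edges.csv" } },
    "extractor": { "csv": {
        "separator": ",",
        "columnsOnFirstLine": true,
        "columns": ["from:String","to:String"] } },
    "transformers": [
        { "merge": { "joinFieldName": "from", "lookup": "H.p" } },
        { "vertex": { "class": "H", "skipDuplicates": true } },
        { "edge": { "class": "HAS_S", "joinFieldName": "to", "lookup": "S.p", "direction": "out" } },
        { "field": { "fieldNames": ["from", "to"], "operation": "remove" } }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "PLOCAL:C:/orientdb/databases/Graph",
            "dbUser": "admin",
            "dbPassword": "admin",
            "dbType": "graph",
            "tx": true,
            "batchCommit": 10000
        }
    }
}
```

With 17M edge rows, every row performs two lookups (`H.p` and `S.p`), so this sketch assumes both properties are indexed, as `H.p` is in the vertex config above.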
