How do I count incoming vertices in Gremlin?

Time: 2019-04-15 11:29:26

Tags: gremlin

I have this database:

client => incident => file => fileName

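A client has an id. An incident has an id and a reportedON property. A file has an id and the properties fileSize, mimeType and malwareSource. A fileName has an id. A client has outgoing edges to incidents (reported), an incident has outgoing edges to files (containsFile), and a file has an outgoing edge to its fileName (hasName).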

Here is some sample data:

g.addV('client').property('id','1').as('1').
  addV('incident').property('id','11').property('reportedON', '2/15/2019 8:01:19 AM').as('11').
  addV('file').property('id','100').property('fileSize', '432534').property('malwareSource', 'malware').as('100').
  addV('fileName').property('id','file.pdf').as('file.pdf').
  addE('reported').from('1').to('11').
  addE('containsFile').from('11').to('100').
  addE('hasName').from('100').to('file.pdf').iterate()

I am running the following query:

g.V().has('malwareSource', 'malware').as('FILE').
  out('hasName').as('FILENAME').
  select('FILE').in('containsFile').as('INCIDENT').
  select('FILE').valueMap().as('FILEVALUES').
  select('INCIDENT').valueMap().as('INCIDENTVALUES').
  select('FILE', 'FILEVALUES', 'FILENAME', 'INCIDENTVALUES')

How can I count the number of incoming vertices for each file whose malwareSource property is "malware"?

1 answer:

Answer 0 (score: 0)

You really should use project(); the code is much more readable, as shown in a separate question you asked here:

gremlin> g.V().has('malwareSource', 'malware').
......1>   project('FILE', 'FILENAME', 'FILEVALUES', 'INCIDENTVALUES').
......2>     by().
......3>     by(out('hasName')).
......4>     by(valueMap()).
......5>     by(__.in('containsFile').valueMap().fold())
==>[FILE:v[5],FILENAME:v[9],FILEVALUES:[fileSize:[432534],malwareSource:[malware],id:[100]],INCIDENTVALUES:[[reportedON:[2/15/2019 8:01:19 AM],id:[11]]]]

Much easier to follow, though I still don't understand why you need this returned data structure, since it repeats data in the result for "FILE" and "FILEVALUES". That aside, you can see how easy it is to get the count of incoming edges: it's just one extra key in project() and one extra by() modulator that does the count():

gremlin> g.V().has('malwareSource', 'malware').
......1>   project('FILE', 'FILENAME', 'FILEVALUES', 'INCIDENTVALUES', 'COUNT').
......2>     by().
......3>     by(out('hasName')).
......4>     by(valueMap()).
......5>     by(__.in('containsFile').valueMap().fold()).
......6>     by(__.in().count())
==>[FILE:v[5],FILENAME:v[9],FILEVALUES:[fileSize:[432534],malwareSource:[malware],id:[100]],INCIDENTVALUES:[[reportedON:[2/15/2019 8:01:19 AM],id:[11]]],COUNT:1]
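In the transcript above, line 6 uses `__.in()` with no edge label, so it follows every incoming edge of the file. In the sample data a file's only incoming edges are `containsFile`, so the count is the same, but if you only want to count incidents it is safer to pass the label. A minimal sketch, assuming TinkerPop 3.x with TinkerGraph on the classpath (the Gremlin Console provides both); the key name `INCIDENTCOUNT` is a hypothetical name chosen for illustration:

```groovy
// Sketch: assumes TinkerPop 3.x and TinkerGraph on the classpath (e.g. the Gremlin Console)
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__

graph = TinkerGraph.open()
g = graph.traversal()

// Sample data from the question
g.addV('client').property('id','1').as('1').
  addV('incident').property('id','11').property('reportedON', '2/15/2019 8:01:19 AM').as('11').
  addV('file').property('id','100').property('fileSize', '432534').property('malwareSource', 'malware').as('100').
  addV('fileName').property('id','file.pdf').as('file.pdf').
  addE('reported').from('1').to('11').
  addE('containsFile').from('11').to('100').
  addE('hasName').from('100').to('file.pdf').iterate()

// Count only vertices reaching each malware file through 'containsFile' edges.
// in('containsFile') emits one adjacent vertex per matching edge, so this
// gives the same number as inE('containsFile').count().
// ('INCIDENTCOUNT' is a hypothetical key name.)
result = g.V().has('malwareSource', 'malware').
  project('FILE', 'INCIDENTCOUNT').
    by().
    by(__.in('containsFile').count()).
  toList()

println result
```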

You could probably find a way to compute lines 5 and 6 in a single pass to avoid iterating the incoming vertices twice, but I would treat that optimization as a separate issue and consider adjusting your returned data structure to allow for it.
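One possible way to traverse the incoming `containsFile` vertices only once is to fold their value maps into a list and derive the count from that list with `count(local)`, which counts the items in a collection. This is a sketch, not the only option: it assumes TinkerPop 3.x with TinkerGraph on the classpath (e.g. the Gremlin Console), it counts only `containsFile` edges (unlike the unlabeled `__.in()` above), and the nested keys `INCIDENTS` and `COUNT` are hypothetical names that change the returned structure:

```groovy
// Sketch: assumes TinkerPop 3.x and TinkerGraph on the classpath (e.g. the Gremlin Console)
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__
import org.apache.tinkerpop.gremlin.process.traversal.Scope

graph = TinkerGraph.open()
g = graph.traversal()

// Sample data from the question
g.addV('client').property('id','1').as('1').
  addV('incident').property('id','11').property('reportedON', '2/15/2019 8:01:19 AM').as('11').
  addV('file').property('id','100').property('fileSize', '432534').property('malwareSource', 'malware').as('100').
  addV('fileName').property('id','file.pdf').as('file.pdf').
  addE('reported').from('1').to('11').
  addE('containsFile').from('11').to('100').
  addE('hasName').from('100').to('file.pdf').iterate()

// The incoming 'containsFile' vertices are traversed once: their value maps are
// folded into a single list, and COUNT is computed from that list's size.
// ('INCIDENTS' and 'COUNT' are hypothetical key names.)
result = g.V().has('malwareSource', 'malware').
  project('FILE', 'FILENAME', 'FILEVALUES', 'INCIDENTS').
    by().
    by(__.out('hasName')).
    by(__.valueMap()).
    by(__.in('containsFile').valueMap().fold().
       project('INCIDENTVALUES', 'COUNT').
         by().                          // the folded list of incident value maps
         by(__.count(Scope.local))).    // number of items in that list
  toList()

println result
```

Because `fold()` always produces a list, a file with no incidents would get an empty `INCIDENTVALUES` list and a `COUNT` of 0 rather than being dropped.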