Splitting a text document and building combinations in Python

Asked: 2017-10-23 14:28:36

Tags: python python-2.7 python-3.x

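I have two text files. One contains a Neo4j (Cypher) script; the other contains a list of countries and cities, along with document IDs and index numbers. They look like this: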

Cypher file:

MATCH (t:Country {name:'%a'}),(o:City {name:'%b'})
WITH point({ longitude: toFloat(t.longitude), latitude: toFloat(t.latitude) }) AS copoint, point({ longitude: toFloat(o.longitude), latitude: toFloat(o.latitude) }) AS cipoint
RETURN distance(copoint, cipoint)

Text file:

     5  <DOCID>GH950102-000000<DOCID>/O
   114  Cardiff/LOCATION
   321  United States'/LOCATION
   898  Alps/LOCATION
  1029  Dresden/LOCATION
  1150  Scotland/LOCATION
  1162  Gasforth/LOCATION
  1258  Arabia/LOCATION
  1261  Hejaz/LOCATION
  1265  Aleppo/LOCATION
  1267  Northern Syria/LOCATION
  1269  Aqaba/LOCATION
  1271  Jordan./LOCATION
  1543  London/LOCATION
  1556  London/LOCATION
  1609  London/LOCATION
  2040  <DOCID>GH950102-000001<DOCID>/O
  2317  America/LOCATION
  3096  New York./LOCATION
  3131  Great Britain/LOCATION
  3147  <DOCID>GH950102-000002<DOCID>/O
  3184  Edinburgh/LOCATION
  3210  <DOCID>GH950102-000003<DOCID>/O
  3243  Australia/LOCATION
  3360  England/LOCATION
  3414  India/LOCATION
  3474  Melbourne/LOCATION
  3497  England/LOCATION

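My question is how to split this document wherever a DOCID appears, and then build every combination of the location names found between one DOCID and the next. When a location name is copied into the Cypher script, the index number and the /LOCATION suffix should be removed.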

I tried the following code, but it does not work.

from itertools import combinations

with open ("results.txt") as f:
    for line in f:
        for "DOCID" in line.split():
            cities = (city.strip() for city in f.readlines())

with open ("cypher.txt") as g:
    cypher_query =g.readlines()

with open ("resultfile.txt","w") as f:
    for city1,city2 in combinations (cities,2):
        f.writelines(line.replace("%a",city1).replace("%b",city2) for line in cypher_query)
        f.write("\n")

1 answer:

Answer 0 (score: 0)

I don't know Cypher, so you may have to adapt this yourself, but it will give you the combinations:

import re
import itertools

with open("cypher.txt") as g:
    cypher_query = g.readlines()


def write_combinations(locations):
    # Append one filled-in Cypher query per pair of locations.
    with open("resultfile.txt", "a") as f:
        for city1, city2 in itertools.combinations(locations, 2):
            f.writelines(q.replace("%a", city1).replace("%b", city2) for q in cypher_query)
            f.write("\n")


with open("textFile", "r") as inputFile:

    locations = set()

    for line in inputFile:
        if "DOCID" in line:
            # A DOCID line starts a new document: emit the pairs collected so far.
            write_combinations(locations)
            locations.clear()
        else:
            # Capture the name between the index number and the /LOCATION suffix.
            location = re.search(r"(\D+)/LOCATION$", line)
            if location:
                locations.add(location.group(1).strip())

    # The last document is not followed by a DOCID line, so flush it here.
    write_combinations(locations)

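Edit: I changed one line, so it now produces a single file containing one Cypher command per 2-location combination. If you want separate files, add a counter or something similar to the resultfile filename. Also note that some names, such as "Jordan.", have a trailing period, in case that makes a difference.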
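
Two of the points above (separate output files, and names such as "Jordan." that carry trailing punctuation) can be sketched as follows. This is only a rough illustration, not part of the original answer: the helper name `group_locations`, the two regular expressions, and the `resultfile-<docid>.txt` naming scheme are all assumptions made for the example. It groups the names by document ID instead of using a counter, and strips a trailing period, comma, or apostrophe from each name.

```python
import re

# Assumed line formats, based on the sample text file in the question.
DOCID_RE = re.compile(r"<DOCID>(.+?)<DOCID>")
LOCATION_RE = re.compile(r"^\s*\d+\s+(.+)/LOCATION\s*$")


def group_locations(lines):
    """Return a list of (doc_id, [location names]) in input order."""
    groups = []
    for line in lines:
        doc = DOCID_RE.search(line)
        if doc:
            # A DOCID line opens a new, initially empty group.
            groups.append((doc.group(1), []))
            continue
        loc = LOCATION_RE.match(line)
        if loc and groups:
            # Drop trailing punctuation such as "Jordan." or "United States'".
            name = loc.group(1).strip().rstrip(".,'")
            # Keep each name once per document, in first-seen order.
            if name not in groups[-1][1]:
                groups[-1][1].append(name)
    return groups


sample = [
    "     5  <DOCID>GH950102-000000<DOCID>/O\n",
    "  1271  Jordan./LOCATION\n",
    "  1543  London/LOCATION\n",
    "  1556  London/LOCATION\n",
    "  2040  <DOCID>GH950102-000001<DOCID>/O\n",
    "  3096  New York./LOCATION\n",
]
groups = group_locations(sample)
for doc_id, names in groups:
    # Hypothetical per-document file name built from the document ID.
    print("resultfile-%s.txt: %s" % (doc_id, names))
```

Whether stripping punctuation is appropriate depends on how the names are stored in the database, so treat the `rstrip` call as a starting point to adjust.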

Example contents of resultfile.txt:

MATCH (t:Country {name:'Alps'}),(o:City {name:'Scotland'})
WITH point({ longitude: toFloat(t.longitude), latitude: toFloat(t.latitude) }) AS copoint, point({ longitude: toFloat(o.longitude), latitude: toFloat(o.latitude) }) AS cipoint
RETURN distance(copoint, cipoint)

MATCH (t:Country {name:'Alps'}),(o:City {name:'Dresden'})
WITH point({ longitude: toFloat(t.longitude), latitude: toFloat(t.latitude) }) AS copoint, point({ longitude: toFloat(o.longitude), latitude: toFloat(o.latitude) }) AS cipoint
RETURN distance(copoint, cipoint)

MATCH (t:Country {name:'Alps'}),(o:City {name:'Gasforth'})
WITH point({ longitude: toFloat(t.longitude), latitude: toFloat(t.latitude) }) AS copoint, point({ longitude: toFloat(o.longitude), latitude: toFloat(o.latitude) }) AS cipoint
RETURN distance(copoint, cipoint)
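
Because the locations are held in a set, the order in which pairs appear in the output can differ between runs. If a reproducible order matters, one option is to sort the unique names before pairing them. The sketch below assumes a hypothetical helper `render_queries` and, for brevity, uses only the first line of the Cypher template from the question.

```python
from itertools import combinations

# First line of the Cypher template from the question, shortened for the example.
TEMPLATE = "MATCH (t:Country {name:'%a'}),(o:City {name:'%b'})\n"


def render_queries(names, template=TEMPLATE):
    """Fill the template once per unordered pair, in a reproducible order."""
    ordered = sorted(set(names))  # de-duplicate, then fix the order
    return [
        template.replace("%a", a).replace("%b", b)
        for a, b in combinations(ordered, 2)
    ]


queries = render_queries(["Scotland", "Alps", "Dresden", "Alps"])
print("".join(queries))
```

Note that the template always places the first name of each pair in the Country slot and the second in the City slot, regardless of what the names actually refer to.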