一维数æ®é›†çš„空间分区算法?

时间:2016-10-10 02:58:17

标签: cluster-analysis data-mining partitioning dbscan bigdata

[Image: area divided into 10,000 squares; the data set has the traffic volume per square]

这是Grid,代表10,000个方格的地ç†åŒºåŸŸï¼Œæ¯ä¸ªæ–¹æ ¼ä¸º55225平方米。

The data set has a traffic volume of 100 to 1000 per square.

代表:

æ–¹å—1 - 100,

æ–¹å—2 - 500

方形10,000 - 800

现在,我想以这样一ç§æ–¹å¼å¯¹è¿™ä¸ªåŒºåŸŸè¿›è¡Œåˆ†åŒºï¼Œå³æ¯ä¸ªåˆ†åŒºå¯èƒ½æœ‰ä¸åŒçš„区域但是会æºå¸¦ç›¸ä¼¼æ•°é‡çš„æµé‡ï¼Œåˆ†åŒºä¹‹é—´çš„æµé‡æ ‡å‡†å·®åº”该是最å°çš„。对空间分区算法有什么建议å—?

1 answer:

Answer 0 (score: 0)

您必须åšå‡ºä¸€äº›å†³å®šæ‰èƒ½é€šçŸ¥æ‚¨çš„程åºã€‚想到的first question是å¦å®šä¹‰äº†åˆ†åŒºæ•°ï¼Ÿ second question是å¦å¯¹ç»„有任何几何é™åˆ¶ï¼Œå³å®ƒä»¬å¿…须是连续的,还是任何特定的形状都是ç†æƒ³çš„? third question是关于多好还好å—?算法的è¿è¡Œæ—¶é—´é€šå¸¸å­˜åœ¨å·¨å¤§å·®å¼‚,该算法æ供了ç†æƒ³çš„答案(å¯èƒ½æ˜¯è´ªå©ªçš„算法)和æ供最佳答案的算法(å¯èƒ½æ˜¯è¯¦å°½çš„或蛮力的#34;方法)。通过将具有相åŒä½“积的任何2个扇区分组,您将获得最å°æ ‡å‡†å差,因为您的组将å„自具有0个标准å差。无论如何,这å¬èµ·æ¥å¾ˆåƒå±•å¼€bin packing problem,你应该在那里开始你的文献综述。

你需è¦æŒ‰é¡ºåºæ”¶æ‹¾åžƒåœ¾ç®±......

在这里,我选择了我的圈å­çš„中心点,这些圈å­åå‘于最高的交通æµé‡å¹¶ä»Žé‚£é‡Œå¡«å……。

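class trafficNode:
    def __init__(self, v, i):
        self.cluster = None
        self.value = v           # traffic volume of this square
        self.index = i           # (row, col) position in the grid
        self.occupied = False    # True once a cluster has claimed the square
    def occupy(self):
        self.occupied = True

def tryAdd(xList, mList, irow, icol):
    # Append the node at (irow, icol) if it lies inside the grid, is not
    # already in xList and has not been claimed by a cluster yet.
    # Bounds are checked explicitly because negative indices would
    # silently wrap around in Python instead of raising IndexError.
    if 0 <= irow < len(mList) and 0 <= icol < len(mList[irow]):
        node = mList[irow][icol]
        if node not in xList and not node.occupied:
            xList.append(node)
    return xList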

class cluster:
    def __init__(self):
        self.nodes = []
    def getTotal(self):
        return sum(k.value for k in self.nodes)
    def addNode(self, n):
        self.nodes.append(n)
    def getNeighbors(self, m, r=0):
        # Free squares touching the cluster: the 4 edge neighbors,
        # plus the 4 diagonal neighbors when r != 0.
        steps = [(0, 1), (1, 0), (0, -1), (-1, 0)]
        if r != 0:
            steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
        neighbors = []
        for k in self.nodes:
            i = k.index
            for dr, dc in steps:
                neighbors = tryAdd(neighbors, m, i[0] + dr, i[1] + dc)
        return neighbors
    def seed(self, m, irow, icol):
        self.nodes.append(m[irow][icol])
        m[irow][icol].occupy()
    def propogate(self, m, target):
        # Grow the cluster one ring of neighbors at a time, alternating
        # between 4- and 8-neighborhoods, as long as adding the ring
        # brings the cluster total closer to the target.
        total = self.getTotal()
        s = 1
        while total < target:
            s = 1 if not s else 0
            n = self.getNeighbors(m, s)
            if len(n) == 0:
                break
            ringTotal = sum(k.value for k in n)
            if abs(target - (total + ringTotal)) < abs(target - total):
                for k in n:
                    self.nodes.append(k)
                    k.occupy()
                total += ringTotal
            else:
                break
    def contains(self, i):
        # True if the square with index i belongs to this cluster.
        return any(k.index == i for k in self.nodes)

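def parseData(d, s):  # d is the source data file, s is the number of squares per row.
    # Expects one line per square in row-major order, e.g. "Square 1 - 100".
    ret = []
    with open(d, "r") as f:
        lines = f.read().split("\n")
    n = 0
    r = 0
    temp = []
    for k in lines:
        if " - " not in k:
            continue  # skip blank or malformed lines
        v = int(k.split(" - ")[1].strip().replace(",", ""))
        temp.append(trafficNode(v, (r, n)))  # n is the 0-based column
        n += 1
        if n == s:
            n = 0
            r += 1
            ret.append(temp)
            temp = []
    if temp:
        ret.append(temp)  # keep a trailing partial row
    return ret

def mapTotal(m):
    return sum(sum(k2.value for k2 in k) for k in m)

def pTotal(m, n):
    # Target traffic per partition: the map total divided by the number of partitions.
    return mapTotal(m) / n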

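import sys

infile = sys.argv[1]
ncols = int(sys.argv[2])    # squares per row
ntowers = int(sys.argv[3])  # desired number of partitions
m = parseData(infile, ncols)
s = pTotal(m, ntowers)

def freeSpots(m):
    # (row, col) of every square not yet claimed by a cluster.
    return [k.index for row in m for k in row if not k.occupied]

spots = freeSpots(m)
clusters = []
while len(spots) > 0:
    # Seed the next cluster at the free square with the highest traffic,
    # then grow it toward the per-partition target s.
    # Note: the number of clusters produced is not guaranteed to equal ntowers.
    spotVals = [m[k[0]][k[1]].value for k in spots]
    nextSpotIndex = spots[spotVals.index(max(spotVals))]
    c = cluster()
    c.seed(m, nextSpotIndex[0], nextSpotIndex[1])
    c.propogate(m, s)
    clusters.append(c)
    spots = freeSpots(m)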

那说我还没有测试过它......ä½ çš„æ•°æ®æ˜¯ä½œä¸ºé‚£ä¸ªå›¾åƒè¿˜æ˜¯å¦ä¸€ä¸ªæ–‡ä»¶ï¼Ÿ