创建具有不同平均度但具有相同传递性的图形

时间:2017-08-27 11:49:10

标签: r igraph

我希望创建一系列具有固定数量节点(比如n = 100)的图形,具有不同的平均度,但在网络中具有(几乎)相同的传递性。

目前,我能想到的最好的是

require(igraph)

g1 <- sample_pa(n = 100, power = 0.5, m = 2) # degree 3.94, transitivity 0.075
g2 <- sample_pa(n = 100, power = 0.5, m = 4) # degree 7.8, transitivity 0.135
g3 <- sample_pa(n = 100, power = 0.5, m = 4) # degree 11.58, transitivity 0.214

等等。然而,传递性正在发生很大变化,我正在寻找一个内置函数或一个简单的算法来创建这样的图形。

1 个答案:

答案 0 :(得分:0)

有几种方法可以重新连接图形以实现给定的传递性/全局聚类系数(GCC如下):

1)天真的方法

在每次迭代时,边缘的随机子集重新连接 保留图的度序列。如果是全局聚类 重新连接图的系数更接近目标值,重新连接的图保持在下一次迭代中。当GCC足够接近目标时,或者当达到最大迭代次数时,算法会停止。

2)Bansal等人。 2009年

在每次迭代中,由5个节点和4个边组成的图案是随机的 从所有这些图案的集合中选择,并且两个外边缘是 重新制作一个新的候选图表,G&#39;,具有相同的程度 序列为G:

                       x                       x
                      / \                     / \
                    y1   y2     <=====>     y1---y2
                    |    |
                    z1   z2                 z1---z2

重新布线步骤可以引入新的三角形或分解 现有GCC取决于当前GCC是大于还是小 比目标值。如果重新布线使GCC更接近目标 价值然后G&#39;用于下一次迭代。算法停止 当GCC足够接近目标值时,或当最大值时 达到迭代次数。

代码

我在python工作,所以我没有在R中使用这些例程。但我相信你可以解决这个问题:

import numpy as np
import igraph
from itertools import izip, combinations, product
from random import Random


def bansal_shuffle(G, target_gcc, tol=1E-3, maxiter=None, inplace=True,
                   require_connected=False, seed=None, verbose=False):
    r"""
    Bansal et al's Markov chain method for generating random graphs with
    prescribed global clustering coefficients.

    On each iteration a motif consisting of 5 nodes and 4 edges is randomly
    selected from the set of all such motifs, and two of the outer edges are
    rewired to produce a new candidate graph, G', which has the same degree
    sequence as G:
                           x                       x
                          / \                     / \
                        y1   y2     <=====>     y1---y2
                        |    |
                        z1   z2                 z1---z2

    The rewiring step can either introduce a new triangle or break up an
    existing one depending on whether the current GCC is greater or smaller
    than the target value. If rewiring brings the GCC closer to the target
    value then G' is used in the next iteration. The algorithm halts either
    when the GCC is sufficiently close to the target value, or when a maximum
    number of iterations is reached.

    Arguments:
    ----------
        G: igraph.Graph
            input graph to be shuffled; can be either undirected or directed
        target_gcc: float
            target global clustering coefficient
        tol: float
            halting tolerance: abs(gcc - target_gcc) < tol
        maxiter: int
            maximum number of rewiring steps to perform (default is no limit)
        inplace: bool
            modify G in place
        require_connected: bool
            impose the additional constraint that G' must be connected on each
            rewiring step, in which case the input graph must also be connected
        seed: int
            seed for the random number generator, see numpy.random.RandomState
        verbose: bool
            print convergence messages

    Returns:
    --------
        G_shuf: igraph.Graph
            shuffled graph, has the same same degree sequence(s) as G
        niter: int
            total rewiring iterations performed
        gcc_best: float
            final global clustering coefficient

    Reference:
    ----------
    Bansal, S., Khandelwal, S., & Meyers, L. A. (2009). Exploring biological
    network structure with clustered random networks. BMC Bioinformatics, 10,
    405. doi:10.1186/1471-2105-10-405

    """

    if maxiter is None:
        maxiter = np.inf

    if require_connected:
        if not G.is_connected():
            raise ValueError(
                'if require_connected, the input graph must also be connected')

    if not inplace:
        G = G.copy()

    # seed RNG
    gen = np.random.RandomState(seed)

    # get initial GCC and loss
    gcc_initial = G.transitivity_undirected()
    gcc = gcc_best = gcc_initial
    loss = loss_best = target_gcc - gcc

    # if we're already close enough, return immediately
    if abs(loss_best) < tol:
        return G, 0, gcc

    # find all nodes with degrees >= 2 (ignoring direction). each of these must
    # be part of at least one triplet motif
    degrees = G.degree()
    candidate_x = _filter_by_degree(range(G.vcount()), degrees, 2)

    niter = 0
    terminating = False

    while not terminating:

        # for moderately-sized graphs it's faster to modify a deep copy of G
        # than to undo the rewiring whenever the GCC does not improve
        G_prime = G.copy()

        if loss > 0:
           _make_triangle(G_prime, candidate_x, gen)

        elif loss < 0:
            _break_triangle(G_prime, candidate_x, gen)

        # compute the new clustering coefficient and loss
        gcc = G_prime.transitivity_undirected()
        loss = target_gcc - gcc

        # did we improve?
        improved = abs(loss) < abs(loss_best)

        # if desired, we also confirm that the new graph is connected
        if require_connected:
            improved &= G_prime.is_connected()

        if improved:
            gcc_best = gcc
            loss_best = loss
            G = G_prime

        # print progress
        if verbose:
            print("iter=%-6i GCC=%-8.3g loss=%-8.3g improved=%-5s"
                  % (niter, gcc, loss, improved))

        niter += 1

        # check termination conditions
        if abs(loss_best) < tol:
            terminating = True
        elif niter >= maxiter:
            print('failed to reach targed GCC within maxiter')
            terminating = True

    return G, niter, gcc_best


def _make_triangle(G, candidate_x, gen):

    have_rewire_candidates = False

    # get candidate edges to rewire
    while not have_rewire_candidates:

        # a random node with a degree of at least 2
        x = gen.choice(candidate_x)

        # find all neighbors of this node with a degree of at least 2 (ignoring
        # direction)
        x_neighbors = G.neighbors(x)
        x_valid_neighbors = _filter_by_degree(
            x_neighbors, G.degree(x_neighbors), 2
        )

        # skip this candidate if there aren't at least two neighbors with
        # degrees >= 2
        if len(x_valid_neighbors) < 2:
            continue

        else:

            # shuffle the valid neighbors of x, iterate over every possible
            # pair
            gen.shuffle(x_valid_neighbors)

            for (y1, y2) in combinations(x_valid_neighbors, 2):

                # if y1 and y2 are already connected, skip them
                if _any_direction(G, y1, y2):
                    continue

                # find all outputs from y1
                from_y1 = G.neighbors(y1, igraph.OUT)

                # find all inputs to y2
                to_y2 = G.neighbors(y2, igraph.IN)

                # shuffle them
                gen.shuffle(from_y1)
                gen.shuffle(to_y2)

                # iterate over all possible pairs consisting of one output from
                # y1 and one input to y2.
                for z1, z2 in product(from_y1, to_y2):

                    # we're only interested in unconnected pairs
                    if _any_direction(G, z1, z2):
                        continue

                    # must satisfy z1 != x; z2 != x; z1 != z2
                    elif x not in (z1, z2) and z1 != z2:
                        have_rewire_candidates = True
                        break

                if have_rewire_candidates:
                    break

    # perform double edge swap
    G.delete_edges([(y1, z1), (z2, y2)])
    G.add_edges([(y1, y2), (z2, z1)])

    pass


def _break_triangle(G, candidate_x, gen):

    have_rewire_candidates = False
    n_edges = G.ecount()

    # get candidate edges to rewire
    while not have_rewire_candidates:

        # a random node with a degree of at least 2
        x = gen.choice(candidate_x)

        # find all neighbors of this node with a degree of at least 2 (ignoring
        # direction)
        x_neighbors = G.neighbors(x)
        x_valid_neighbors = _filter_by_degree(
            x_neighbors, G.degree(x_neighbors), 2
        )

        # skip this candidate if there aren't at least two neighbors with
        # degrees >= 2
        if len(x_valid_neighbors) < 2:
            continue

        else:

            # shuffle the valid neighbors of x, iterate over every possible
            # pair
            gen.shuffle(x_valid_neighbors)

            for (y1, y2) in combinations(x_valid_neighbors, 2):

                # if a connection exist from y1 --> y2...
                if G.are_connected(y1, y2):
                    y1_neighbors = G.neighbors(y1)
                    y2_neighbors = G.neighbors(y2)

                    exclude = x_neighbors + y1_neighbors + y2_neighbors

                    while not have_rewire_candidates:

                        # pick a random edge
                        eidx = gen.randint(0, n_edges)
                        e = G.es[eidx]
                        z2, z1 = e.source, e.target

                        # neither z1 nor z2 can be connected to either x, y1,
                        # or y2
                        if z1 not in exclude and z2 not in exclude:

                            have_rewire_candidates = True
                            break

                if have_rewire_candidates:
                    break

    # perform double edge swap
    G.delete_edges([(y1, y2), (z2, z1)])
    G.add_edges([(y1, z1), (z2, y2)])

    pass


def _filter_by_degree(indices, degrees, min_deg=2):
    return [ii for ii, dd in izip(indices, degrees) if dd >= min_deg]


def _any_direction(G, a, b):
    """
    if a --> b, return 1
    if a <-- b, return -1
    else,       return 0
    """
    if G.are_connected(a, b):
        return 1
    elif G.are_connected(b, a):
        return -1
    else:
        return 0


def naive_diffusion(G, target_gcc, tol=1E-3, maxiter=None, inplace=True,
                    seed=None, verbose=False):
    r"""
    Naive approach to obtaining a shuffled version of the input graph G with a
    prescribed target global clustering coefficient.

    On each iteration, a random subset of the edges is rewired whilst
    preserving the degree sequence of the graph. If the global clustering
    coefficient of the rewired graph is closer to the target value, the rewired
    graph is kept into the next iteration. The algorithm halts either when the
    GCC is sufficiently close to the target, or when a maximum number of
    iterations is reached.

    Arguments:
    ----------
        G: igraph.Graph
            input graph to be shuffled; can be either undirected or directed
        target_gcc: float
            target global clustering coefficient
        tol: float
            halting tolerance: abs(gcc - target_gcc) < tol
        maxiter: int
            maximum number of rewiring steps to perform (default is no limit)
        inplace: bool
            modify G in place
        seed: int
            seed for the random number generator, see random.Random
        verbose: bool
            print convergence messages

    Returns:
    --------
        G_shuf: igraph.Graph
            shuffled graph, has the same same degree sequence(s) as G
        niter: int
            total rewiring iterations performed
        gcc_best: float
            final global clustering coefficient

    """

    if maxiter is None:
        maxiter = np.inf

    if not inplace:
        G = G.copy()

    # use a Python random.Random instance as the RNG for igraph (probably
    # slightly slower, but allows seeding for repeatability)
    igraph.set_random_number_generator(Random(seed))

    loss_best = 1.
    n_edges = G.ecount()

    # start with a high rewiring rate
    rewiring_rate = n_edges

    n_iter = 0
    done = False

    while not done:

        # operate on a copy of the current best graph
        G_prime = G.copy()

        # adjust the number of connections to rewire according to the current
        # best loss
        n_rewire = min(max(int(rewiring_rate * loss_best), 1), n_edges)
        G_prime.rewire(n=n_rewire)

        # compute the global clustering coefficient
        gcc = G_prime.transitivity_undirected()
        loss = abs(gcc - target_gcc)

        # did we improve?
        improved = loss < loss_best

        if improved:

            # keep the new graph
            G = G_prime
            loss_best = loss
            gcc_best = gcc

            # increase the rewiring rate
            rewiring_rate *= 1.1

        else:

            # reduce the rewiring rate
            rewiring_rate *= 0.9

        if verbose:
            print("iter %4i: n_rewired=%5i, new GCC=%-8.3g, loss=%-8.3g, "
                  "improved=%5s, rewiring rate=%-8.3g"
                  % (n_iter, n_rewire, gcc, loss, improved, rewiring_rate))

        n_iter += 1

        # check termination conditions
        if loss_best <= tol:
            done = True
        elif n_iter > maxiter:
            print('failed to converge within maxiter (%i)' % maxiter)
            done = True

    return G, n_iter, gcc_best