How to avoid overlapping labels in a Matplotlib scatter plot (automatically)?

Time: 2019-04-16 14:49:19

Tags: python matplotlib label scatter-plot overlap

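I have to draw several correlation plots. In each one I need to label every point (currently 8 points, but that number could easily grow to a few dozen). Because the points are sometimes very close together, the labels overlap. Unfortunately, the points are distributed differently in each plot, so I cannot position the labels by hand, since I would have to do that for every single plot. Is there an automatic way to avoid the overlap, for example by pushing a label a little higher each time it overlaps with another label?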
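To make the idea concrete, here is a minimal sketch of the "push the label up until it no longer overlaps" behaviour described above. It works on plain rectangles `(x0, y0, width, height)` rather than real Matplotlib text objects; the names `boxes_overlap` and `push_labels_up`, the step size, and the sample boxes are illustrative assumptions, not existing Matplotlib functionality.

```python
def boxes_overlap(a, b):
    """Return True if two axis-aligned boxes (x0, y0, width, height) overlap."""
    ax0, ay0, aw, ah = a
    bx0, by0, bw, bh = b
    return ax0 < bx0 + bw and bx0 < ax0 + aw and ay0 < by0 + bh and by0 < ay0 + ah


def push_labels_up(boxes, step=0.1, max_iter=1000):
    """Place boxes one by one, moving each upward by `step` until it is clear
    of every box placed before it (or until `max_iter` steps have been tried)."""
    placed = []
    for x0, y0, w, h in boxes:
        for _ in range(max_iter):
            candidate = (x0, y0, w, h)
            if not any(boxes_overlap(candidate, other) for other in placed):
                break
            y0 += step  # still overlapping: push the label a bit higher
        placed.append((x0, y0, w, h))
    return placed


# Hypothetical label boxes: the first two overlap, the third is far away.
labels = [(0.0, 0.0, 1.0, 0.5), (0.5, 0.1, 1.0, 0.5), (3.0, 0.0, 1.0, 0.5)]
adjusted = push_labels_up(labels)
```

In a real plot the rectangles would have to come from the rendered labels (for example from `Text.get_window_extent()` after the figure has been drawn), and a purely upward push does not account for overlaps with the data points themselves.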

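I have searched Google and Stack Overflow extensively without finding a proper method that solves this automatically, and I have also tried a few things myself. My most successful approach was to overlay a grid in the form of a numpy array and fill a cell with 1 whenever that position is occupied. This works reasonably well, but it has a problem. To assign a point to a cell in the grid, I round its x and y values. I then look for the zero in the grid that is closest to the point's actual position and compute the label's offset from it. The problem is that the offset is measured from the rounded position, not from the actual position, which leads to points and labels overlapping. And if I try to compute the offset from the original position instead, the whole grid concept stops working and the labels overlap each other again.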
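The rounding problem can be reproduced without any plotting. The following is a minimal sketch with made-up axis limits and a made-up point; `cell_of` and `cell_origin` are illustrative helper names and do not appear in the plotting function.

```python
import math

x_min, x_max, no_x = 0.0, 6.0, 6   # assumed axis limits and number of columns
y_min, y_max, no_y = 0.0, 8.0, 8   # assumed axis limits and number of rows
x_step = (x_max - x_min) / no_x
y_step = (y_max - y_min) / no_y


def cell_of(x, y):
    """Map a data point to the (column, row) of the grid cell that contains it."""
    return (int(math.floor((x - x_min) / x_step)),
            int(math.floor((y - y_min) / y_step)))


def cell_origin(column, row):
    """Return the data coordinates of the lower-left corner of a grid cell."""
    return (x_min + column * x_step, y_min + row * y_step)


point = (2.9, 3.1)
column, row = cell_of(*point)                 # the point falls into cell (2, 3)
label_anchor = cell_origin(column + 1, row)   # free neighbouring cell to the right

# On the grid the label is one full cell away from the point, but in data
# coordinates the horizontal gap between them is only about 0.1.
gap_x = label_anchor[0] - point[0]
```

Because the label is anchored to the cell corner rather than to the point itself, a point that lies close to the edge of its cell can end up almost touching a label that the grid treats as belonging to a different cell.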

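import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from numpy.polynomial.polynomial import polyfit  # returns coefficients lowest degree first (b, m)


def find_nearest(array, value):
    # Helper not shown in the original post; assumed to return the element of `array` closest to `value`
    array = np.asarray(array)
    return array[np.abs(array - value).argmin()]


def plotlinreg(xfile: pd.DataFrame, yfile: pd.DataFrame, xcolumn: str, ycolumn: str, savestring: str, label_column: str):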
    xfile_copy = xfile.copy()
    yfile_copy = yfile.copy()

    x = xfile[xcolumn].values
    y = yfile[ycolumn].values

    b, m = polyfit(x, y, 1)

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(x, y, '.')
    ax.plot(x, b + m * x, '-')

######################################################################################
######### The following commands are used to make sure that no labels overlap ########
######################################################################################

    labels = list(xfile[label_column])
    no_x = 6 # Number of labels in x direction that fit next to each other
    no_y = 8 # Same for y
    x_steps = (ax.get_xlim()[1] - ax.get_xlim()[0])/no_x # Calculates the step size according to the limits and the maximal number of labels
    y_steps = (ax.get_ylim()[1] - ax.get_ylim()[0])/no_y # Same for y

    label_grid = np.zeros((no_y,no_x))

    xfile_copy[xcolumn] = xfile_copy[xcolumn].apply(lambda x: int(math.floor((x - ax.get_xlim()[0])/x_steps)))
    yfile_copy[ycolumn] = yfile_copy[ycolumn].apply(lambda y: int(math.floor((y - ax.get_ylim()[0])/y_steps)))
    # This calculates the grid position of each value by subtracting the lower axis limit, dividing by the step size and rounding down.

    for x_position, y_position in zip(xfile_copy[xcolumn], yfile_copy[ycolumn]):
        # Block the grid cells that contain data points to avoid an overlap between those and the labels.
        # Row 0 of label_grid corresponds to the bottom row of the plot.
        label_grid[y_position, x_position] = 1

    for label, x, y, x_position, y_position in zip(labels, xfile[xcolumn], yfile[ycolumn], xfile_copy[xcolumn], yfile_copy[ycolumn]):
        offset = (x, y)  # Fallback if every cell in the grid is already taken

        # Search the point's own column first, then the columns to the right, then the columns to the left
        candidate_columns = list(range(x_position, no_x)) + list(range(x_position - 1, -1, -1))
        for column in candidate_columns:
            free_rows = np.where(label_grid[:, column] == 0)[0]  # Where are the zeros?
            if free_rows.size == 0:
                continue  # No empty position in this column
            row = find_nearest(free_rows, y_position)  # What is the closest one?
            # Set the offset for this label to the lower-left corner of the free cell
            offset = (ax.get_xlim()[0] + column * x_steps, ax.get_ylim()[0] + row * y_steps)
            label_grid[row, column] = 1  # Mark this cell as occupied
            break

        ax.annotate(
            label,
            xy=(x, y), xytext=offset, fontsize=9,
            ha='left', va='top',
            bbox=dict(boxstyle='round,pad=0.2', fc='red', alpha=1),
            arrowprops=dict(arrowstyle = '->', connectionstyle='arc3,rad=0', alpha=0.2))

################################
######### End of labels ########
################################

    ax.set_ylim(ax.get_ylim()[0] - y_steps, ax.get_ylim()[1] + 0.5*y_steps)
    ax.set_xlim(ax.get_xlim()[0] - 1.5*x_steps, ax.get_xlim()[1] + 0.3*x_steps)
    ax.set_xlabel(xcolumn)
    ax.set_ylabel(ycolumn)
    ax.set_title("Testrun")

    return fig

As can be seen here, the label for value 3 overlaps with the point

0 answers:

No answers