Question

I have a pandas DataFrame like this:

X    Y    VALUE
140  45   124
15   129  219
189  90   125

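I am trying to create a 2D color mesh from this DataFrame by summing all the VALUE entries that fall inside each grid cell. Right now I am doing it like this: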

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

step = 5
xx = np.arange(0, 200+step, step)
yy = np.arange(0, 200+step, step)
array = np.empty(xx.size * yy.size)
ctr = 0
for y in np.nditer(yy):
    for x in np.nditer(xx):
        grid = df[(df['X'] >= x) & (df['X'] < x + step) \
                & (df['Y'] >= y) & (df['Y'] < y + step)]
        value_sum = grid['VALUE'].sum() if not grid.empty else 0
        array[ctr] = value_sum
        ctr += 1
mesh = array.reshape((yy.size, xx.size))
plt.pcolormesh(xx, yy, mesh)

This does what I want, but it is very slow because it relies on Python for loops. Is there a way to use NumPy broadcasting to avoid the Python for loops and build the same mesh (an ndarray)?

Answer 1

You may want to look into np.histogram2d, using the values as weights:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import random

xs = [random.randrange(0,201,1) for _ in range(100)]
ys = [random.randrange(0,201,1) for _ in range(100)]
value = [random.randrange(0,500,1) for _ in range(100)]

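# histogram2d returns hist with shape (nx, ny): X runs along the first axis.
hist, xedges, yedges = np.histogram2d(xs, ys, bins=42, weights=value)
# pcolormesh expects C with shape (ny, nx), so transpose before plotting.
plt.pcolormesh(xedges, yedges, hist.T)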

On my machine this is about 10 times faster than the loop version.
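To reproduce the question's grid exactly rather than 42 automatically chosen bins, explicit bin edges can be passed to np.histogram2d. The following is a minimal sketch, assuming the three sample rows from the question as `df` and the same `step = 5`; the variable names `edges` and `mesh` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"X": [140, 15, 189],
                   "Y": [45, 129, 90],
                   "VALUE": [124, 219, 125]})

step = 5
# Edges 0, 5, ..., 205 give 41 cells per axis, each covering [edge, edge + step),
# which matches the cells visited by the question's nested loops.
edges = np.arange(0, 200 + 2 * step, step)

hist, xedges, yedges = np.histogram2d(
    df["X"].to_numpy(),
    df["Y"].to_numpy(),
    bins=[edges, edges],
    weights=df["VALUE"].to_numpy(),
)

# hist is indexed [x_bin, y_bin]; transpose so rows index Y and columns index X.
mesh = hist.T
```

Calling plt.pcolormesh(xedges, yedges, mesh) should then draw the same picture as the loop version, since the edge arrays are one element longer than the mesh in each dimension.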

Answer 2

How about something simple like this?

mesh = np.zeros((len(yy), len(xx)))  # rows index Y, columns index X
for row in df.itertuples():
    # += accumulates every VALUE that lands in the same cell
    mesh[row.Y // step, row.X // step] += row.VALUE
plt.pcolormesh(xx, yy, mesh)
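The same cell-index idea can be written without a Python loop by using np.add.at, which accumulates correctly when several rows map to the same cell. This is a sketch only: it assumes integer X and Y in the range 0 to 200 and `step = 5`, and the fourth row of `df` is a hypothetical addition (not from the question) so that one cell receives two values.

```python
import numpy as np
import pandas as pd

# First three rows are from the question; the last row is a made-up example
# that falls in the same cell as the first row, to show accumulation.
df = pd.DataFrame({"X": [140, 15, 189, 141],
                   "Y": [45, 129, 90, 46],
                   "VALUE": [124, 219, 125, 1]})

step = 5
n_cells = 200 // step + 1  # 41 cells per axis, as in the question's grid

# Integer division maps each coordinate to its cell index.
ix = df["X"].to_numpy() // step
iy = df["Y"].to_numpy() // step

mesh = np.zeros((n_cells, n_cells))
# Unbuffered in-place add: repeated (iy, ix) pairs are summed, not overwritten.
np.add.at(mesh, (iy, ix), df["VALUE"].to_numpy())
```

A plain `mesh[iy, ix] += values` would not work here, because fancy-indexed assignment keeps only one contribution per repeated index.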

Is there an efficient way to create a color mesh from scattered pandas data?
