This is my dataset:
dropoff_latitude  dropoff_longitude
(40.6, 40.65]     (-74.03, -73.98]      1364
                  (-73.98, -73.93]      2123
                  (-73.93, -73.88]       368
                  (-73.88, -73.83]        20
                  (-73.83, -73.78]      9564
(40.65, 40.7]     (-74.03, -73.98]     18629
                  (-73.98, -73.93]     22453
                  (-73.93, -73.88]      4343
                  (-73.88, -73.83]      1027
                  (-73.83, -73.78]      2170
(40.7, 40.75]     (-74.03, -73.98]    443893
                  (-73.98, -73.93]     84331
                  (-73.93, -73.88]      9658
                  (-73.88, -73.83]      4700
                  (-73.83, -73.78]      1756
(40.75, 40.8]     (-74.03, -73.98]    249840
                  (-73.98, -73.93]    486286
                  (-73.93, -73.88]     15424
                  (-73.88, -73.83]     18957
                  (-73.83, -73.78]       911
(40.8, 40.85]     (-74.03, -73.98]        34
                  (-73.98, -73.93]     49718
                  (-73.93, -73.88]      4283
                  (-73.88, -73.83]      1070
                  (-73.83, -73.78]       218
(40.85, 40.9]     (-74.03, -73.98]        52
                  (-73.98, -73.93]      2295
                  (-73.93, -73.88]      4427
                  (-73.88, -73.83]      1020
                  (-73.83, -73.78]       132
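Data visualization is definitely not my strong suit, and I'm struggling to come up with a proper way to plot this. To give you an idea of what I'm after: I want a grid broken out as in the table above, with each cell of the grid shaded according to its count.
I tried seaborn's heatmap method, but had no luck. Do I need to reshape my data?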
Answer 0 (score: 2)
You may find this easier if you use latitude as the dataframe index and longitude as the column names.
import numpy as np
import pandas as pd
import seaborn as sns
# sample data
dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                    "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]
dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]",
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]
values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453,
                   4343, 1027, 2170, 443893, 84331, 9658, 4700,
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132])
# one row per latitude bin, one column per longitude bin
values = values.reshape(6, 5)
df = pd.DataFrame(values, index=dropoff_latitude, columns=dropoff_longitude)
print(df)
               (-74.03, -73.98]  (-73.98, -73.93]  (-73.93, -73.88]  \
(40.6, 40.65]              1364              2123               368
(40.65, 40.7]             18629             22453              4343
(40.7, 40.75]            443893             84331              9658
(40.75, 40.8]            249840            486286             15424
(40.8, 40.85]                34             49718              4283
(40.85, 40.9]                53              2295              4427

               (-73.88, -73.83]  (-73.83, -73.78]
(40.6, 40.65]                20              9564
(40.65, 40.7]              1027              2170
(40.7, 40.75]              4700              1756
(40.75, 40.8]             18957               911
(40.8, 40.85]              1070               218
(40.85, 40.9]              1020               132
Now you can use Seaborn's heatmap():
sns.heatmap(df)
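Since the counts range from 20 to nearly 500,000, a linear color scale will leave most cells looking almost identical. The following is only a sketch of one way to deal with that, not part of the original answer: it assumes a reasonably recent seaborn that forwards `norm` to matplotlib, and the log scaling, the cell annotations and the reversed row order are all my own choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# same sample data as above
dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                    "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]
dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]",
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]
values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453,
                   4343, 1027, 2170, 443893, 84331, 9658, 4700,
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132]).reshape(6, 5)
df = pd.DataFrame(values, index=dropoff_latitude, columns=dropoff_longitude)

fig, ax = plt.subplots(figsize=(8, 5))
sns.heatmap(
    df.iloc[::-1],   # assumption: reverse rows so latitude increases upward, like a map
    norm=LogNorm(vmin=df.values.min(), vmax=df.values.max()),  # log color scale
    annot=True, fmt="d",   # print the raw count in each cell
    cmap="viridis",
    ax=ax,
)
ax.set_xlabel("dropoff_longitude")
ax.set_ylabel("dropoff_latitude")
fig.tight_layout()
# call plt.show() (or fig.savefig(...)) to view the result
```

If you would rather keep the linear scale, drop the `norm` argument; everything else stays the same.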
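Update (per the comments):
It is possible to get from the way your data is currently organized to the layout I recommended. First, we'll replicate the sample multi-index dataframe you provided, using the variables defined above: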
lat_lon = [(lat, lon) for lat in dropoff_latitude for lon in dropoff_longitude]
lat, lon = zip(*lat_lon)
data = {'dropoff_latitude': lat,
        'dropoff_longitude': lon,
        # values was reshaped to 6x5 above; flatten it back to one value per (lat, lon) pair
        'values': values.ravel()}
df2 = pd.DataFrame(data).set_index(['dropoff_latitude', 'dropoff_longitude'])
df2
This now has the same structure as the OP's dataframe:
                                      values
dropoff_latitude  dropoff_longitude
(40.6, 40.65]     (-74.03, -73.98]      1364
                  (-73.98, -73.93]      2123
                  (-73.93, -73.88]       368
                  (-73.88, -73.83]        20
                  (-73.83, -73.78]      9564
(40.65, 40.7]     (-74.03, -73.98]     18629
                  (-73.98, -73.93]     22453
                  (-73.93, -73.88]      4343
                  (-73.88, -73.83]      1027
                  (-73.83, -73.78]      2170
(40.7, 40.75]     (-74.03, -73.98]    443893
                  (-73.98, -73.93]     84331
                  (-73.93, -73.88]      9658
                  (-73.88, -73.83]      4700
                  (-73.83, -73.78]      1756
(40.75, 40.8]     (-74.03, -73.98]    249840
                  (-73.98, -73.93]    486286
                  (-73.93, -73.88]     15424
                  (-73.88, -73.83]     18957
                  (-73.83, -73.78]       911
(40.8, 40.85]     (-74.03, -73.98]        34
                  (-73.98, -73.93]     49718
                  (-73.93, -73.88]      4283
                  (-73.88, -73.83]      1070
                  (-73.83, -73.78]       218
(40.85, 40.9]     (-74.03, -73.98]        53
                  (-73.98, -73.93]      2295
                  (-73.93, -73.88]      4427
                  (-73.88, -73.83]      1020
                  (-73.83, -73.78]       132
Next, reset the index back to columns and pivot the longitude data from row entries into column names:
# plot_df is now in the same form as df in my original answer.
plot_df = (df2.reset_index()
              .pivot(index='dropoff_latitude', columns='dropoff_longitude',
                     values='values'))  # values= keeps the column labels flat
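From here, sns.heatmap(plot_df) produces the desired heatmap. It is the same as the one shown above, except that the x-axis labels now appear in sorted order, because pivot sorts the column labels.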
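If the sorted order is not what you want, one option is to reindex afterwards. The sketch below is my own variation rather than part of the answer above: it uses `unstack` instead of `reset_index` plus `pivot`, builds the multi-index with `pd.MultiIndex.from_product`, and assumes the bin labels are plain strings (which is why sorting is lexicographic rather than numeric).

```python
import numpy as np
import pandas as pd

dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                    "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]
dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]",
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]
values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453,
                   4343, 1027, 2170, 443893, 84331, 9658, 4700,
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132])

# rebuild the multi-index frame: one row per (latitude bin, longitude bin)
index = pd.MultiIndex.from_product(
    [dropoff_latitude, dropoff_longitude],
    names=["dropoff_latitude", "dropoff_longitude"],
)
df2 = pd.DataFrame({"values": values}, index=index)

# move the longitude level from the row index into the columns
plot_df = df2["values"].unstack("dropoff_longitude")

# the string labels come back sorted lexicographically;
# reindex to restore the original west-to-east / south-to-north bin order
plot_df = plot_df.reindex(index=dropoff_latitude, columns=dropoff_longitude)
print(plot_df)
```

The resulting plot_df can be passed to sns.heatmap() exactly like df in the first part of the answer.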