Trying to plot an array that has been grouped by two variables

Time: 2017-08-10 01:36:13

Tags: python pandas plot grouping

Here is my dataset:

dropoff_latitude  dropoff_longitude
(40.6, 40.65]     (-74.03, -73.98]       1364
                  (-73.98, -73.93]       2123
                  (-73.93, -73.88]        368
                  (-73.88, -73.83]         20
                  (-73.83, -73.78]       9564
(40.65, 40.7]     (-74.03, -73.98]      18629
                  (-73.98, -73.93]      22453
                  (-73.93, -73.88]       4343
                  (-73.88, -73.83]       1027
                  (-73.83, -73.78]       2170
(40.7, 40.75]     (-74.03, -73.98]     443893
                  (-73.98, -73.93]      84331
                  (-73.93, -73.88]       9658
                  (-73.88, -73.83]       4700
                  (-73.83, -73.78]       1756
(40.75, 40.8]     (-74.03, -73.98]     249840
                  (-73.98, -73.93]     486286
                  (-73.93, -73.88]      15424
                  (-73.88, -73.83]      18957
                  (-73.83, -73.78]        911
(40.8, 40.85]     (-74.03, -73.98]         34
                  (-73.98, -73.93]      49718
                  (-73.93, -73.88]       4283
                  (-73.88, -73.83]       1070
                  (-73.83, -73.78]        218
(40.85, 40.9]     (-74.03, -73.98]         52
                  (-73.98, -73.93]       2295
                  (-73.93, -73.88]       4427
                  (-73.88, -73.83]       1020
                  (-73.83, -73.78]        132

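So, data visualization is definitely not my strong suit. I'm struggling to come up with a way to plot this properly. To give you an idea of what I'm after: I want a grid divided up as in the table above, with each cell of the grid shaded according to the count it holds.

I tried seaborn's heatmap method, but had no luck. Do I need to reshape my data?

1 answer:

Answer 0: (score: 2)

You may find it easier if you use latitude as the data frame's index and longitude as its column names.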

import numpy as np
import pandas as pd
import seaborn as sns

# sample data
dropoff_latitude  = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                     "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]

dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]", 
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]

values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453, 
                   4343, 1027, 2170, 443893, 84331, 9658, 4700, 
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132])
values = values.reshape(6,5)

df = pd.DataFrame(values, index=dropoff_latitude, columns=dropoff_longitude)

print(df)
               (-74.03, -73.98]  (-73.98, -73.93]  (-73.93, -73.88]  \
(40.6, 40.65]              1364              2123               368   
(40.65, 40.7]             18629             22453              4343   
(40.7, 40.75]            443893             84331              9658   
(40.75, 40.8]            249840            486286             15424   
(40.8, 40.85]                34             49718              4283   
(40.85, 40.9]                53              2295              4427   

               (-73.88, -73.83]  (-73.83, -73.78]  
(40.6, 40.65]                20              9564  
(40.65, 40.7]              1027              2170  
(40.7, 40.75]              4700              1756  
(40.75, 40.8]             18957               911  
(40.8, 40.85]              1070               218  
(40.85, 40.9]              1020               132  

Now you can use Seaborn's heatmap():

sns.heatmap(df)

[Image: the resulting heatmap]
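Because the counts range from 20 to 486286, a linear color scale leaves most cells looking almost identical. A minimal sketch of one way around that, assuming it is acceptable to color by log10 of the count while still printing the raw counts in the cells. The names log_counts, fig and ax, the Agg backend and the colorbar label are my own choices for illustration, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# same sample data as above
dropoff_latitude = ["(40.6, 40.65]", "(40.65, 40.7]", "(40.7, 40.75]",
                    "(40.75, 40.8]", "(40.8, 40.85]", "(40.85, 40.9]"]
dropoff_longitude = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]",
                     "(-73.88, -73.83]", "(-73.83, -73.78]"]
values = np.array([1364, 2123, 368, 20, 9564, 18629, 22453,
                   4343, 1027, 2170, 443893, 84331, 9658, 4700,
                   1756, 249840, 486286, 15424, 18957, 911, 34,
                   49718, 4283, 1070, 218, 53, 2295, 4427, 1020, 132]).reshape(6, 5)
df = pd.DataFrame(values, index=dropoff_latitude, columns=dropoff_longitude)

# color by log10(count), but annotate each cell with the raw count
log_counts = np.log10(df)
fig, ax = plt.subplots(figsize=(9, 5))
sns.heatmap(log_counts, annot=df, fmt="d", ax=ax,
            cbar_kws={"label": "log10(count)"})
ax.invert_yaxis()  # lowest latitude bin at the bottom, as on a map
fig.tight_layout()
```

In a script, call plt.show() or fig.savefig(...) afterwards to see the figure.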

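Update (per comments):

It is possible to get from the way your data is currently organized to the layout I recommended. First, we'll reproduce the sample multi-index data frame you provided, using the variables defined above: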

lat_lon = [(lat, lon) for lat in dropoff_latitude for lon in dropoff_longitude]
lat, lon = zip(*lat_lon)

data = {'dropoff_latitude':lat, 
        'dropoff_longitude':lon,
        'values':values.ravel()}  # values was reshaped to (6, 5) above; flatten it back to 1-D
df2 = pd.DataFrame(data).set_index(['dropoff_latitude','dropoff_longitude'])

df2 now has the same layout as the OP's data frame:

                                    values
dropoff_latitude dropoff_longitude        
(40.6, 40.65]    (-74.03, -73.98]     1364
                 (-73.98, -73.93]     2123
                 (-73.93, -73.88]      368
                 (-73.88, -73.83]       20
                 (-73.83, -73.78]     9564
(40.65, 40.7]    (-74.03, -73.98]    18629
                 (-73.98, -73.93]    22453
                 (-73.93, -73.88]     4343
                 (-73.88, -73.83]     1027
                 (-73.83, -73.78]     2170
(40.7, 40.75]    (-74.03, -73.98]   443893
                 (-73.98, -73.93]    84331
                 (-73.93, -73.88]     9658
                 (-73.88, -73.83]     4700
                 (-73.83, -73.78]     1756
(40.75, 40.8]    (-74.03, -73.98]   249840
                 (-73.98, -73.93]   486286
                 (-73.93, -73.88]    15424
                 (-73.88, -73.83]    18957
                 (-73.83, -73.78]      911
(40.8, 40.85]    (-74.03, -73.98]       34
                 (-73.98, -73.93]    49718
                 (-73.93, -73.88]     4283
                 (-73.88, -73.83]     1070
                 (-73.83, -73.78]      218
(40.85, 40.9]    (-74.03, -73.98]       53
                 (-73.98, -73.93]     2295
                 (-73.93, -73.88]     4427
                 (-73.88, -73.83]     1020
                 (-73.83, -73.78]      132

Next, reset the index into columns, and pivot the longitude data from row entries into column names:

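# plot_df is now in the same form as df in my original answer.
# Passing values='values' keeps the columns as plain longitude labels
# rather than a two-level ('values', longitude) column index.
plot_df = (df2.reset_index()
              .pivot(index='dropoff_latitude', columns='dropoff_longitude',
                     values='values'))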

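From here, sns.heatmap(plot_df) produces the desired heatmap. It is the same as the one shown above, except that the x-axis is now sorted, because pivot orders the column labels (as strings here, so the order is lexicographic).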
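As an alternative to reset_index() followed by pivot(), the inner index level can be moved into the columns directly with unstack(). This is only a sketch on a hypothetical cut-down version of the data (two latitude bins, three longitude bins). The names lat_bins, lon_bins, counts and grid are made up for illustration:

```python
import pandas as pd

# hypothetical miniature of the grouped counts: a Series with a two-level index
lat_bins = ["(40.6, 40.65]", "(40.65, 40.7]"]
lon_bins = ["(-74.03, -73.98]", "(-73.98, -73.93]", "(-73.93, -73.88]"]
index = pd.MultiIndex.from_product(
    [lat_bins, lon_bins], names=["dropoff_latitude", "dropoff_longitude"])
counts = pd.Series([1364, 2123, 368, 18629, 22453, 4343],
                   index=index, name="values")

# move the longitude level from the row index into the columns
grid = counts.unstack("dropoff_longitude")

# select the columns explicitly to restore the original west-to-east order
grid = grid[lon_bins]
print(grid)
```

grid can then be passed to sns.heatmap() in the same way as df or plot_df.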