Python: How can I create a scatter plot with a fixed range across multiple intervals?

Time: 2017-05-02 01:22:31

Tags: python pandas matplotlib dataframe scatter-plot

I have the following pandas DataFrame:

import pandas as pd
df = pd.read_table(...)
df

>>> df
       interval  location type  y_axis
0        01      1230    X      50
1        01      1609    X      55
2        01      1903    Y      54
3        01      2574    A      58
4        01      3151    A      57
5        01      3198    B      46
6        01      3312    X      50
...                 .....
         02      42      X      31
         02      214     A      23
         02      598     X      28
....

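There are several intervals, e.g. 01, 02, etc. Within each interval, the data points lie in the range 1 to 15,000. In df, the first data point is at 1230, the next is at 1609, and so on.

Interval 02 likewise ranges from 1 to 15,000.

I would like to create a scatter plot in which every interval is drawn to scale over the range 1 to 15,000, so that the first point is plotted at 1230, the next at 1609, and so on. I would also like vertical lines marking where the intervals begin and end. Each interval is a "region" that contains its own x-axis from 1 to 15,000, so the coordinates along the x-axis are interval 1: 1 to 15,000, interval 2: 1 to 15,000, interval 3: 1 to 15,000, and so on. (It is almost like several separate scatter plots joined together.)

How can this be done? Without the complication of intervals, a scatter plot of this DataFrame could be created with: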

df.plot(kind='scatter', x = "location", y = "y_axis")

Here are the first 50 rows:

d = {"interval" : ["01",                                                                                                                                                                                                              
 "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",                                                                                                                                                                                                          
 "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",                                                                                                                                                                                                          
 "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",                                                                                                                                                                                                          
 "01", "01", "01", "01", "01", "01", "01", "01", "01", "01", "01",                                                                                                                                                                                                          
 "01", "01", "01", "01", "01"], "location" : [1230, 1609,                                                                                                                                                                                                      
 1903, 2574, 3151, 3198, 3312, 3659, 3709,                                                                                                                                                                                                      
 3725, 4172, 4542, 4860, 4900, 5068, 5220,                                                                                                                                                                                                      
 5260, 5339, 5442, 5529, 5773, 6128, 6165,                                                                                                                                                                                                      
 6177, 6269, 6275, 6460, 7167, 7361, 7361,                                                                                                                                                                                                      
 8051, 8222, 8305, 8992, 9104, 9439, 9844,                                                                                                                                                                                                      
 10045, 10764, 10787, 11104, 11478, 11508,                                                                                                                                                                                                          
 11684, 12490, 12590, 12794, 12803, 13823,                                                                                                                                                                                                          
 13982], "type" : ["X", "X", "Y", "A", "A",                                                                                                                                                                                                              
     "B", "X", "X", "X", "B", "B", "A", "A", "A", "B", "B", "X",                                                                                                                                                                                                            
     "B", "Y", "X", "X", "Y", "Y", "C", "A", "X", "X", "Z", "Z",                                                                                                                                                                                                            
     "B", "X", "X", "A", "A", "Y", "X", "A", "X", "X", "Z", "Z",                                                                                                                                                                                                            
     "C", "X", "Y", "Y", "Z", "Z", "Z", "Z", "Z"],  "y_axis" : [50, 55, 
    54, 58, 57, 46, 50, 55, 46, 42, 56, 55, 55, 45, 52, 51, 45, 48, 50,
     49, 53, 55, 45, 40, 49, 37, 52, 58, 52, 4, 58, 52, 49, 58, 50, 55, 
    56, 53, 58, 43, 55, 55, 44, 52, 59, 49, 53, 39, 60, 52]}

3 answers:

Answer 0 (score: 3)

The main challenge here seems to be that you want your x-axis to be both categorical (intervals 01, 02, etc.) and metric (values 1-15000). What you are really describing is several scatter plots drawn with a shared y-axis, as you note in your post. I suggest using subplots and groupby to do this. You can adjust the space between the plots with subplots_adjust(), as I did in this answer.

First, generate some sample data using d from the OP. We also randomly select half of the observations and change them to interval=02, to demonstrate the desired paneling:

import pandas as pd
import numpy as np

df = pd.DataFrame(d)
# shuffle rows
# (taken from this answer: http://stackoverflow.com/a/15772330/2799941)
df = df.reindex(np.random.permutation(df.index))
# randomly select half of the rows for changing to interval 02
interval02 = df.sample(int(df.shape[0]/2.)).index
df.loc[interval02, 'interval'] = "02"

Now use pyplot to specify side-by-side subplots, and remove any padding between the plots.

from matplotlib import pyplot as plt

# n_plots = number of different interval values
n_plots = len(df.interval.unique())
fig, axes = plt.subplots(1, n_plots, figsize=(10,5), sharey=True)
# remove space between plots
fig.subplots_adjust(hspace=0, wspace=0)

Finally, group by interval and plot each group on its own subplot.

[Figure: side-by-side plot]
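The final grouping-and-plotting step can be sketched roughly as follows. This is a minimal, self-contained sketch and not necessarily the answer's exact code: the small DataFrame reuses a few rows shown in the question, and the X_MIN/X_MAX constants and the panel titles are assumptions added for illustration.

```python
import pandas as pd
from matplotlib import pyplot as plt

# A few rows taken from the question's data (the "type" column is omitted).
df = pd.DataFrame({
    "interval": ["01", "01", "01", "02", "02", "02"],
    "location": [1230, 1609, 13982, 42, 214, 598],
    "y_axis": [50, 55, 52, 31, 23, 28],
})

# Assumed fixed x-range of every interval, as described in the question.
X_MIN, X_MAX = 1, 15000

groups = list(df.groupby("interval"))

# One panel per interval, shared y-axis, no gap between panels.
# squeeze=False keeps `axes` two-dimensional even with a single interval.
fig, axes = plt.subplots(1, len(groups), figsize=(10, 5),
                         sharey=True, squeeze=False)
fig.subplots_adjust(hspace=0, wspace=0)

for ax, (name, group) in zip(axes[0], groups):
    group.plot(kind="scatter", x="location", y="y_axis", ax=ax)
    ax.set_xlim(X_MIN, X_MAX)  # same range in every panel
    ax.set_title("interval = {}".format(name))
```

Because the panels touch, the shared panel borders act as the vertical lines between intervals. Calling plt.show() afterwards displays the figure.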

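Answer 1 (score: 2)

It seems you want to draw a separate scatter plot for each category of "interval". This can be done by grouping the DataFrame by the corresponding column.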
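As a hedged sketch of the grouping idea, the example below draws every group on a single axes and shifts each interval by a fixed width. This is a variation chosen for illustration, not necessarily the layout this answer had in mind. The WIDTH constant (15000, taken from the question's stated range), the dashed separator style, and the few sample rows are assumptions.

```python
import pandas as pd
from matplotlib import pyplot as plt

# A few rows taken from the question's data.
df = pd.DataFrame({
    "interval": ["01", "01", "01", "02", "02", "02"],
    "location": [1230, 1609, 13982, 42, 214, 598],
    "y_axis": [50, 55, 52, 31, 23, 28],
})

WIDTH = 15000  # assumed width of every interval on the x-axis

fig, ax = plt.subplots(figsize=(10, 5))
centers, labels = [], []

# groupby sorts the keys, so intervals appear left to right as "01", "02", ...
for i, (name, group) in enumerate(df.groupby("interval")):
    offset = i * WIDTH
    # Shift this interval's locations into its own region of the x-axis.
    ax.scatter(group["location"] + offset, group["y_axis"], label=name)
    # Vertical line at the right edge of this interval's region.
    ax.axvline(offset + WIDTH, color="gray", linestyle="--")
    centers.append(offset + WIDTH / 2)
    labels.append(name)

# Label each region with its interval name instead of raw coordinates.
ax.set_xlim(0, len(labels) * WIDTH)
ax.set_xticks(centers)
ax.set_xticklabels(labels)
ax.set_xlabel("interval")
ax.set_ylabel("y_axis")
```

The trade-off of this variant is that the tick labels show interval names, so the within-interval location is only readable from a point's position inside its region.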


Answer 2 (score: 1)

With Altair, you can easily separate the two intervals into different columns or colors.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

cat = ["01"] *5 + ["02"]*4
x = np.append(np.arange(1,6), np.arange(2.5,4.1,0.5))
y = np.random.randint(12,24, size=len(cat))
df = pd.DataFrame({"cat":cat, "x":x, "y":y})

By column

import altair as alt
# .properties(width=..., height=...) sets the size of each panel
alt.Chart(df).mark_point().encode(x='x', y='y', column='cat').properties(width=200, height=150)


By color

import altair as alt
# .properties(width=..., height=...) sets the size of the chart
alt.Chart(df).mark_point().encode(x='x', y='y', color='cat').properties(width=200, height=150)
