Python equivalent of the R function geom_freqpoly for plotting a frequency polygon

Date: 2019-04-20 20:13:23

Tags: python r pandas matplotlib seaborn

How can I plot a frequency polygon in Python?

For example, I can draw a density plot like this:

import pandas as pd

x = (1.5,1.5,1.5,1.5,1.5,1.5,1.5,
         2.5,2.5,2.5,
         3.5,3.5,3.5,3.5,3.5,3.5,
         4.5,4.5,
         6.5,6.5,6.5,6.5,6.5,6.5,6.5,6.5)

df = pd.DataFrame({'x': x})
#df.head()

df.plot(kind='density')

This gives a smooth kernel density curve.

However, I want a frequency polygon like the one this R code produces:

library(ggplot2)

x = c(1.5,1.5,1.5,1.5,1.5,1.5,1.5,
         2.5,2.5,2.5,
         3.5,3.5,3.5,3.5,3.5,3.5,
         4.5,4.5,
         6.5,6.5,6.5,6.5,6.5,6.5,6.5,6.5)

df = data.frame(x=x)
# head(x)

ggplot(data=df, mapping = aes(x=x)) + 
  geom_freqpoly(binwidth=2)


Update
I tried @Quan Hoang's solution on a problem from Hadley's book "R for Data Science" and got a similar result.


I saved the nycflights13 data obtained from R and put it on GitHub.

Here is my attempt to reproduce the same plot:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

flights = pd.read_csv('https://github.com/bhishanpdl/Datasets/blob/master/nycflights13.csv?raw=true')

# Keep only flights with both a departure and an arrival delay recorded
not_cancelled = flights.dropna(subset=['dep_delay','arr_delay'])
not_cancelled.dep_delay.isnull().sum(), not_cancelled.arr_delay.isnull().sum()

# Mean arrival delay per aircraft
delays = not_cancelled.groupby('tailnum')['arr_delay'].mean().reset_index()

x = delays.arr_delay.values
m = int(x.max())
counts, bins = np.histogram(x, bins=range(-80,m,10))
# Plot the counts at the bin midpoints (the bins are 10 wide, so left edge + 5)
plt.plot((bins[:-1] + bins[1:]) / 2, counts)


1 answer:

Answer 0: (score: 2)

I was able to replicate the R plot with:

import numpy as np
import matplotlib.pyplot as plt
counts, bins = np.histogram(df.x, bins=range(-1,10,2))
plt.plot(bins[:-1]+1, counts)
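
The `+1` works here because the bins are 2 wide, so adding half the width to each left edge gives the bin center. A small sketch of the same idea for any bin width (the helper name `bin_midpoints` is illustrative, not part of NumPy or of this answer):

```python
import numpy as np

def bin_midpoints(edges):
    """Return the center of each bin given its sorted edges."""
    edges = np.asarray(edges, dtype=float)
    # Average each pair of neighbouring edges
    return (edges[:-1] + edges[1:]) / 2

print(bin_midpoints(range(-1, 10, 2)))   # bins of width 2: left edge + 1
print(bin_midpoints(range(-80, 0, 10)))  # bins of width 10: left edge + 5
```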


However, without knowing exactly what you are looking for, it is hard to say what to modify, or how, in the general case.
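
For the general case, one possible approach is to wrap the histogram step in a small helper. The sketch below is not from the answer. The names `freqpoly` and `plot_freqpoly` are hypothetical, and two choices are assumptions: centering the bins on multiples of `binwidth` and padding both ends with an empty bin. They were chosen to reproduce the edges used above and, as far as I can tell, the default behaviour of `geom_freqpoly`.

```python
import numpy as np

def freqpoly(values, binwidth, pad=True):
    """Return (midpoints, counts) for a frequency polygon.

    Assumption: bins are centered on multiples of `binwidth`, so the
    edges sit at odd multiples of binwidth / 2.
    """
    values = np.asarray(values, dtype=float)
    half = binwidth / 2
    # Lowest edge at or below the minimum, highest edge above the maximum
    lo = np.floor((values.min() + half) / binwidth) * binwidth - half
    hi = np.floor((values.max() + half) / binwidth) * binwidth + half
    n_bins = int(round((hi - lo) / binwidth))
    edges = lo + binwidth * np.arange(n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    mids = (edges[:-1] + edges[1:]) / 2
    if pad:
        # Add an empty bin on each side so the polygon returns to zero
        mids = np.concatenate(([mids[0] - binwidth], mids, [mids[-1] + binwidth]))
        counts = np.concatenate(([0], counts, [0]))
    return mids, counts

def plot_freqpoly(values, binwidth, ax=None):
    """Draw the polygon on a matplotlib Axes and return the Axes."""
    import matplotlib.pyplot as plt  # imported here so freqpoly needs only NumPy
    mids, counts = freqpoly(values, binwidth)
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(mids, counts)
    ax.set_ylabel("count")
    return ax

# Sample data from the question
x = (1.5,1.5,1.5,1.5,1.5,1.5,1.5,
     2.5,2.5,2.5,
     3.5,3.5,3.5,3.5,3.5,3.5,
     4.5,4.5,
     6.5,6.5,6.5,6.5,6.5,6.5,6.5,6.5)
mids, counts = freqpoly(x, binwidth=2)
```

For the sample data this yields midpoints 0, 2, 4, 6, 8 with counts 0, 10, 8, 8, 0, the same points as the `range(-1,10,2)` call above. NumPy bins are closed on the left, so values that fall exactly on an edge may be counted differently than in ggplot2. If seaborn 0.11 or later is available, `sns.histplot(df.x, binwidth=2, element='poly', fill=False)` should draw a similar outline without a helper.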