How do I plot a frequency polygon in Python?
For example, I can draw a density plot like this:
import pandas as pd
x = (1.5,1.5,1.5,1.5,1.5,1.5,1.5,
2.5,2.5,2.5,
3.5,3.5,3.5,3.5,3.5,3.5,
4.5,4.5,
6.5,6.5,6.5,6.5,6.5,6.5,6.5,6.5)
df = pd.DataFrame({'x': x})
#df.head()
df.plot(kind='density')
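This gives a smoothed kernel density estimate (KDE) curve.
However, I want a frequency polygon like the one I get in R with ggplot2: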
library(ggplot2)
x = c(1.5,1.5,1.5,1.5,1.5,1.5,1.5,
2.5,2.5,2.5,
3.5,3.5,3.5,3.5,3.5,3.5,
4.5,4.5,
6.5,6.5,6.5,6.5,6.5,6.5,6.5,6.5)
df = data.frame(x=x)
# head(x)
ggplot(data=df, mapping = aes(x=x)) +
geom_freqpoly(binwidth=2)
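One way to approximate this in Python is to bin the data with `np.histogram` and draw a line through the count of each bin at its midpoint. The sketch below is not a port of ggplot2. The helper name `freqpoly_points` and its `start` argument (the left edge of the first bin) are hypothetical, and ggplot2's default bin placement may differ, so the shape can shift slightly. The `pad=True` option adds an empty bin at each end so the line drops back to zero, as `geom_freqpoly` does.

```python
import numpy as np

def freqpoly_points(values, binwidth, start, pad=True):
    """Bin `values` and return (midpoints, counts) for a frequency polygon.

    `start` is the left edge of the first bin (an assumption; ggplot2
    chooses its own default boundaries). With pad=True, one empty bin is
    added on each side so the polygon touches zero.
    """
    values = np.asarray(values, dtype=float)
    if values.min() < start:
        raise ValueError("start must not exceed the smallest value")
    # Number of bins needed to reach the largest value (at least one)
    n_bins = max(1, int(np.ceil((values.max() - start) / binwidth)))
    edges = start + binwidth * np.arange(n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    mids = (edges[:-1] + edges[1:]) / 2
    if pad:
        # Empty bins before the first and after the last bin
        mids = np.concatenate(([mids[0] - binwidth], mids, [mids[-1] + binwidth]))
        counts = np.concatenate(([0], counts, [0]))
    return mids, counts

# Same data as in the question
x = [1.5] * 7 + [2.5] * 3 + [3.5] * 6 + [4.5] * 2 + [6.5] * 8
mids, counts = freqpoly_points(x, binwidth=2, start=0)

if __name__ == "__main__":
    import matplotlib.pyplot as plt
    plt.plot(mids, counts)
    plt.xlabel("x")
    plt.ylabel("count")
    plt.show()
```

With `start=0` the bins are [0, 2), [2, 4), [4, 6) and [6, 8], so the polygon passes through the points (1, 7), (3, 9), (5, 2), (7, 8) plus the zero-count padding points at -1 and 9.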
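Update
I applied @Quan Hoang's solution to an exercise Hadley gives in the book "R for Data Science" and got a similar result.
[Figure: the frequency polygon from the book]
I saved the nycflights13 data I got from R and put it on GitHub.
Here is my attempt to reproduce the same plot: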
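import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
flights = pd.read_csv('https://github.com/bhishanpdl/Datasets/blob/master/nycflights13.csv?raw=true')
# Keep only flights with both delays recorded (i.e. not cancelled)
not_cancelled = flights.dropna(subset=['dep_delay','arr_delay'])
# Sanity check: both counts should be 0
print(not_cancelled.dep_delay.isnull().sum(), not_cancelled.arr_delay.isnull().sum())
# Mean arrival delay per plane, as in the book
delays = not_cancelled.groupby('tailnum')['arr_delay'].mean().reset_index()
x = delays.arr_delay.values
# 10-minute bins from a multiple of 10 at or below the minimum to past the maximum
start = np.floor(x.min() / 10) * 10
counts, bins = np.histogram(x, bins=np.arange(start, x.max() + 10, 10))
# Plot each count at its bin midpoint to get the frequency polygon
plt.plot(bins[:-1] + 5, counts)
plt.show()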
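A pandas-only variant of the same idea uses `pd.cut` to assign each value to a half-open bin and then counts the values per bin. The sketch below is one possible approach, not the book's method. The function name `freqpoly_series` is hypothetical, the bins are aligned to multiples of `binwidth` by assumption, and one empty bin is added on each side so the line returns to zero.

```python
import numpy as np
import pandas as pd

def freqpoly_series(values, binwidth):
    """Return bin counts indexed by bin midpoint, ready for Series.plot().

    Bin edges sit on multiples of `binwidth` (an assumption, not ggplot2's
    exact default), with one empty bin added before and after the data.
    """
    s = pd.Series(values, dtype=float).dropna()
    lo = np.floor(s.min() / binwidth) * binwidth - binwidth
    hi = np.floor(s.max() / binwidth) * binwidth + 2 * binwidth
    edges = np.arange(lo, hi + binwidth / 2, binwidth)
    # right=False gives half-open bins [a, b)
    binned = pd.cut(s, bins=edges, right=False)
    # Categorical value_counts includes empty bins; sort_index restores bin order
    counts = binned.value_counts().sort_index()
    # Replace the interval labels with their midpoints for plotting
    counts.index = [iv.mid for iv in counts.index]
    return counts
```

With the question's data, calling `freqpoly_series(delays.arr_delay, binwidth=10).plot()` and then `plt.show()` should draw a line plot of counts against bin midpoints, which is comparable in spirit to `geom_freqpoly(binwidth = 10)`.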