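I am trying to build a view of the data that gives the total number of births for each month of each year. I need to generate a pandas date for every row. Each row holds a year value and a month value taken from the .csv file. I add a "day" column to the DataFrame and set its value to 1, then use the year, month, and day columns to add a "Date" column.
My problem is the logic for using groupby to get the total births for each month of each year: build a pandas date for each row from its year and month values plus the added day column, use that as the "Date" column, and then plot the total monthly births for each year.
So far my plot does not show any data; only the empty axes frame appears.
Here is my code so far: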
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['figure.figsize'] = (8.0, 3.0)
births = pd.read_csv("births.csv")
births = births.drop(births[births.day == 99].index)
births = births.drop(births[births.births < 500].index)
births['day'].fillna(1, inplace=True)
births['day'] = births['day'].astype('int32')
births.dtypes
births['Date'] = pd.to_datetime(births[['year','month','day']])
print(births)
newbirth = births[births['day'] == 31]
start = pd.to_datetime(births['Date'].iloc[0])
end = pd.to_datetime(newbirth['Date'].iloc[-1])
births1 = births[births['gender'] == 'M']
births2 = births[births['gender'] == 'F']
births1.reset_index(inplace = True)
births2.reset_index(inplace = True)
births.reset_index()
#new dataframe
df = pd.DataFrame(births1['Date'])
df['births'] = births1['births'] + births2['births']
print(df)
#newplot
ax1 = plt.subplot(3,1,1)
ax2 = plt.subplot(3,1,3)
start1 = pd.to_datetime("01-01-1984")
end1 = pd.to_datetime("12-31-1986")
start2 = pd.to_datetime("01-01-1984")
end2 = pd.to_datetime("12-31-1986")
adjusted_268 = pd.Timedelta('268 days')
ax1.plot(df['Date'],df['births'])
ax2.plot(df['Date'],df['births'])
ax1.set_xlim(start1, end1)
ax1.set_ylim(8000,12000)
ax1.set_title("Total Daily Births")
ax1.set_ylabel("Daily Births")
ax1.tick_params(axis='x', rotation=45)
ax2.set_xlim(start2, (end2-adjusted_268))
ax2.set_ylim(8000,12000)
ax2.set_title("Likely Conception Dates")
ax2.set_ylabel("Conception Events")
ax2.tick_params(axis='x', rotation=45)
#newdf needed
#newplot needed
new_df = births.groupby('month').sum()
print(new_df)
ax3 = plt.subplot()
ax3.plot(new_df)
ax3.set_ylim(240000,380000)
ax3.set_title("Total Births Per Month")
ax3.set_ylabel("Births")
ax3.tick_params(axis='x', rotation=45)
This is the DataFrame as printed after reading the .csv file and updating it with pandas:
year month day gender births Date
0 1969 1 1 F 4046 1969-01-01
1 1969 1 1 M 4440 1969-01-01
2 1969 1 2 F 4454 1969-01-02
3 1969 1 2 M 4548 1969-01-02
4 1969 1 3 F 4548 1969-01-03
5 1969 1 3 M 4994 1969-01-03
6 1969 1 4 F 4440 1969-01-04
7 1969 1 4 M 4520 1969-01-04
8 1969 1 5 F 4192 1969-01-05
9 1969 1 5 M 4198 1969-01-05
10 1969 1 6 F 4710 1969-01-06
11 1969 1 6 M 4850 1969-01-06
12 1969 1 7 F 4646 1969-01-07
13 1969 1 7 M 5092 1969-01-07
14 1969 1 8 F 4800 1969-01-08
15 1969 1 8 M 4934 1969-01-08
16 1969 1 9 F 4592 1969-01-09
17 1969 1 9 M 4842 1969-01-09
18 1969 1 10 F 4852 1969-01-10
19 1969 1 10 M 5190 1969-01-10
20 1969 1 11 F 4580 1969-01-11
21 1969 1 11 M 4598 1969-01-11
22 1969 1 12 F 4126 1969-01-12
23 1969 1 12 M 4324 1969-01-12
24 1969 1 13 F 4758 1969-01-13
25 1969 1 13 M 5076 1969-01-13
26 1969 1 14 F 5070 1969-01-14
27 1969 1 14 M 5296 1969-01-14
28 1969 1 15 F 4798 1969-01-15
29 1969 1 15 M 5096 1969-01-15
... ... ... ... ... ... ...
15517 2007 10 1 F 180912 2007-10-01
15518 2007 10 1 M 189157 2007-10-01
15519 2007 11 1 F 173513 2007-11-01
15520 2007 11 1 M 180814 2007-11-01
15521 2007 12 1 F 173787 2007-12-01
15522 2007 12 1 M 181426 2007-12-01
15523 2008 1 1 F 174255 2008-01-01
15524 2008 1 1 M 182789 2008-01-01
15525 2008 2 1 F 165669 2008-02-01
15526 2008 2 1 M 173434 2008-02-01
15527 2008 3 1 F 172053 2008-03-01
15528 2008 3 1 M 179129 2008-03-01
15529 2008 4 1 F 169585 2008-04-01
15530 2008 4 1 M 177399 2008-04-01
15531 2008 5 1 F 173141 2008-05-01
15532 2008 5 1 M 182294 2008-05-01
15533 2008 6 1 F 169958 2008-06-01
15534 2008 6 1 M 179267 2008-06-01
15535 2008 7 1 F 183391 2008-07-01
15536 2008 7 1 M 192714 2008-07-01
15537 2008 8 1 F 182713 2008-08-01
15538 2008 8 1 M 191315 2008-08-01
15539 2008 9 1 F 179696 2008-09-01
15540 2008 9 1 M 188964 2008-09-01
15541 2008 10 1 F 175314 2008-10-01
15542 2008 10 1 M 183219 2008-10-01
15543 2008 11 1 F 158939 2008-11-01
15544 2008 11 1 M 165468 2008-11-01
15545 2008 12 1 F 173215 2008-12-01
15546 2008 12 1 M 181235 2008-12-01
Answer 0 (score: 0)
Try this:
Generate random data:
import numpy as np
import pandas as pd

np.random.seed(2019)
rng = pd.date_range('1969-01-01', '2008-12-01', freq='D')
df = pd.DataFrame({'birth': np.random.randint(1000, size=len(rng))}, index=rng)
Printing df gives:
birth
1969-01-01 72
1969-01-02 370
1969-01-03 159
1969-01-04 805
1969-01-05 728
1969-01-06 190
1969-01-07 758
1969-01-08 24
1969-01-09 797
... ...
2008-11-02 222
2008-11-03 409
2008-11-04 65
2008-11-05 451
2008-11-06 369
2008-11-07 815
2008-11-08 295
2008-11-09 232
Then create the year, month, and day columns:
df['month'] = df.index.month
df['day'] = df.index.day
df['year'] = df.index.year
Then group by year and month and sum the births:
df_grouped = df.groupby(['year', 'month'])['birth'].sum()
The grouped result:
year month
1969 1 14212
2 14982
3 15063
4 12726
5 14131
6 16597
7 14487
8 12517
9 14483
10 17620
11 13883
12 17778
1970 1 15345
2 14524
3 18429
4 12981
5 16855
6 17135
7 18174
8 17031
9 14734
10 15743
11 12761
12 18428
1971 1 15063
2 17224
3 15088
4 17015
5 17032
6 17275
...
2006 7 14718
8 13212
9 17389
10 13850
11 13154
12 14126
2007 1 15762
2 12744
3 17176
4 12777
5 16254
6 16297
7 16038
8 15067
9 13724
10 14386
11 15701
12 17136
2008 1 16047
2 15094
3 14539
4 18662
5 16575
6 13553
7 17107
8 15824
9 15096
10 15139
11 15419
12 250
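The same grouping can be applied to the column layout shown in the question (year, month, day, gender, births). The following is a minimal sketch on a small made-up frame, not the real births.csv; the sample values and the name `monthly` are assumptions chosen for illustration. It sums over all days and both genders within each month, then rebuilds a "Date" column with the day fixed at 1.

```python
import pandas as pd

# Tiny made-up sample in the question's column layout (not real data)
births = pd.DataFrame({
    "year":   [1969, 1969, 1969, 1969, 1969, 1969],
    "month":  [1, 1, 1, 1, 2, 2],
    "day":    [1, 1, 2, 2, 1, 1],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "births": [4046, 4440, 4454, 4548, 4000, 4100],
})

# Total births per (year, month), summed over days and genders.
# Selecting only "births" keeps year, day, and gender out of the sum.
monthly = births.groupby(["year", "month"], as_index=False)["births"].sum()

# One date per month, with the day fixed at 1
monthly["Date"] = pd.to_datetime(monthly[["year", "month"]].assign(day=1))

print(monthly)
```

Selecting the "births" column before calling `sum()` matters here: summing the whole frame would also add up the year and day columns, which is probably not what is wanted.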
Now let's plot:
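import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15, 7))
# Pass ax so the lines are drawn on the figure created above, not on a new one
df_grouped.unstack().plot(ax=ax)
plt.show()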
If you like, you can make the plot look nicer.
[Edit] I think a bar chart would look neater:
df_grouped.unstack().plot(kind='bar')
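If a single time series over a date axis is wanted, as the question describes, the grouped totals can be plotted against a "Date" column instead of being unstacked. This is only a sketch under assumptions: it reuses the random data from this answer, the names `monthly`, `fig`, and `ax` are illustrative, and the output file name is made up. The y-limits are deliberately left to matplotlib. A hard-coded `set_ylim` range that does not contain the plotted values would leave the axes empty, which may be what happens in the question, although that cause is a guess and not confirmed there.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random daily data, as in the answer above (not real birth counts)
np.random.seed(2019)
rng = pd.date_range("1969-01-01", "2008-12-01", freq="D")
df = pd.DataFrame({"birth": np.random.randint(1000, size=len(rng))}, index=rng)
df["year"] = df.index.year
df["month"] = df.index.month

# Monthly totals as a flat frame with year, month, and birth columns
monthly = df.groupby(["year", "month"], as_index=False)["birth"].sum()

# One date per month (day fixed at 1) to use as the x-axis
monthly["Date"] = pd.to_datetime(monthly[["year", "month"]].assign(day=1))

fig, ax = plt.subplots(figsize=(15, 7))
ax.plot(monthly["Date"], monthly["birth"])
ax.set_title("Total Births Per Month")
ax.set_ylabel("Births")
ax.tick_params(axis="x", rotation=45)
# No set_ylim call: matplotlib picks limits that contain the data

fig.savefig("monthly_births.png")  # hypothetical output file name
```

In an interactive session, the backend line and `savefig` can be dropped in favor of `plt.show()`.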