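I am trying to modify a pandas DataFrame so that it has 2 columns: a frequency column and a date column.

Time: 2017-12-09 22:54:00

Tags: python pandas dataframe

Basically, what I'm working with is a DataFrame containing all the parking tickets issued in one year. Each ticket occupies its own row in the unaltered DataFrame. What I want to do is group all the tickets by date, so that I have 2 columns (the date and the number of tickets issued that day). Right now I can achieve this; however, pandas does not treat the date as a column.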

import numpy as np
import matplotlib as mp
import pandas as pd
import matplotlib.pyplot as plt


df1 = pd.read_csv('C:/Users/brett/OneDrive/Data Science Fundamentals/Parking_Tags_Data_2012.csv')

unnecessary_cols = ['tag_number_masked', 'infraction_code',
                    'infraction_description', 'set_fine_amount',
                    'time_of_infraction', 'location1', 'location2',
                    'location3', 'location4', 'province']

# axis must be passed by keyword in recent pandas versions
df1 = df1.drop(unnecessary_cols, axis=1)


df1 = (df1.groupby('date_of_infraction').agg({'date_of_infraction':'count'}))

df1['frequency'] = (df1.groupby('date_of_infraction').agg({'date_of_infraction':'count'}))

print (df1)

df1 = (df1.iloc[121:274])

The output is:

                    date_of_infraction  frequency
date_of_infraction
20120101                          1059        NaN
20120102                          2711        NaN
20120103                          6889        NaN
20120104                          8030        NaN
20120105                          7991        NaN
20120106                          8693        NaN
20120107                          7237        NaN
20120108                          5061        NaN
20120109                          7974        NaN
20120110                          8872        NaN
20120111                          9110        NaN
20120112                          8667        NaN
20120113                          7247        NaN
20120114                          7211        NaN
20120115                          6116        NaN
20120116                          9168        NaN
20120117                          8973        NaN
20120118                          9016        NaN
20120119                          7998        NaN
20120120                          8214        NaN
20120121                          6400        NaN
20120122                          6355        NaN
20120123                          7777        NaN
20120124                          8628        NaN
20120125                          8527        NaN
20120126                          8239        NaN
20120127                          8667        NaN
20120128                          7174        NaN
20120129                          5378        NaN
20120130                          7901        NaN
...                                ...        ...
20121202                          5342        NaN
20121203                          7336        NaN
20121204                          7258        NaN
20121205                          8629        NaN
20121206                          8893        NaN
20121207                          8479        NaN
20121208                          7680        NaN
20121209                          5357        NaN
20121210                          7589        NaN
20121211                          8918        NaN
20121212                          9149        NaN
20121213                          7583        NaN
20121214                          8329        NaN
20121215                          7072        NaN
20121216                          5614        NaN
20121217                          8038        NaN
20121218                          8194        NaN
20121219                          6799        NaN
20121220                          7102        NaN
20121221                          7616        NaN
20121222                          5575        NaN
20121223                          4403        NaN
20121224                          5492        NaN
20121225                           673        NaN
20121226                          1488        NaN
20121227                          4428        NaN
20121228                          5882        NaN
20121229                          3858        NaN
20121230                          3817        NaN
20121231                          4530        NaN

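Basically, I want to shift all the columns one to the right. Right now, pandas only treats the last two columns as actual columns. I hope this makes sense.

2 Answers:

Answer 0 (score: 0)

A single groupby call is enough to get the number of infractions per date. Try this: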

import numpy as np
import pandas as pd

df1 = pd.read_csv('C:/Users/brett/OneDrive/Data Science Fundamentals/Parking_Tags_Data_2012.csv')

unnecessary_cols = ['tag_number_masked', 'infraction_code',
                    'infraction_description', 'set_fine_amount',
                    'time_of_infraction', 'location1', 'location2',
                    'location3', 'location4', 'province']

# axis must be passed by keyword in recent pandas versions
df1 = df1.drop(unnecessary_cols, axis=1)
# reset_index() to move the dates into their own column
counts = df1.groupby('date_of_infraction').count().reset_index()  
print(counts)

Note that any dates with zero tickets will not show up as 0; they will simply be absent from counts.
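If the missing days matter (for plotting, say), one possible way to fill them in is to reindex over a continuous daily range. This is only a sketch on made-up toy data: the `tickets` frame and the `YYYYMMDD` integer format are assumptions based on the output shown in the question, not something confirmed about the real CSV.

```python
import pandas as pd

# Hypothetical toy data: one row per ticket, dates stored as YYYYMMDD integers.
tickets = pd.DataFrame({"date_of_infraction": [20120101, 20120101, 20120103]})

# Count rows per date; the dates become the index of the resulting Series.
per_day = tickets.groupby("date_of_infraction").size()

# Convert the index to real dates so a continuous daily range can be built.
per_day.index = pd.to_datetime(per_day.index.astype(str), format="%Y%m%d")

# Reindex over every calendar day in the span; days with no tickets get 0.
all_days = pd.date_range(per_day.index.min(), per_day.index.max(), freq="D")
counts = (
    per_day.reindex(all_days, fill_value=0)
    .rename_axis("date_of_infraction")
    .reset_index(name="frequency")
)
print(counts)
```

Here 2012-01-02 has no tickets in the toy data, so it appears in `counts` with a frequency of 0 instead of being absent.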

If this doesn't work, it would help to see the first few rows of df1 after the unnecessary columns have been dropped.

Answer 1 (score: 0)

Try using as_index=False.

For example:

import numpy as np
import pandas as pd
data = {"date_of_infraction":["20120101", "20120101", "20120202", "20120202"],
        "foo":np.random.random(4)}
df = pd.DataFrame(data)

df
  date_of_infraction       foo
0           20120101  0.681286
1           20120101  0.826723
2           20120202  0.669367
3           20120202  0.766019

(df.groupby("date_of_infraction", as_index=False) # <-- acts like reset_index()
   .foo.count() 
   .rename(columns={"foo":"frequency"})
)
  date_of_infraction  frequency
0           20120101          2
1           20120202          2
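A variation that may be worth considering: `size()` counts every row in each group, whereas `count()` skips rows where the counted column is NaN. The sketch below uses the same kind of toy data as above (the `foo` values are made up) and names the count column directly through `reset_index(name=...)`, so no rename step is needed.

```python
import pandas as pd

# Hypothetical toy data mirroring the example above.
df = pd.DataFrame({
    "date_of_infraction": ["20120101", "20120101", "20120202", "20120202"],
    "foo": [0.1, 0.2, 0.3, 0.4],
})

# size() returns a Series indexed by date; reset_index(name=...) turns the
# index into a regular column and names the count column "frequency".
freq = df.groupby("date_of_infraction").size().reset_index(name="frequency")
print(freq)
```

Either way, the dates end up as an ordinary column next to the counts, which is the two-column layout the question asks for.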