Scatter plot - index value vs. flight (data range A row & E row)

Time: 2018-08-21 03:33:28

Tags: pandas

I want a scatter plot of the sum of the flight field for each minute. My data is at http://python2018.byethost10.com/flights.csv
My code is as follows:

import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['Noto Serif CJK TC']
matplotlib.rcParams['font.family']='sans-serif'
df = pd.read_csv('flights.csv')
df["time_hour"] = pd.to_datetime(df['time_hour'])
grp = df.groupby(by=[df.time_hour.map(lambda x : (x.hour, x.minute))])
a=grp.sum()
plt.scatter(a.index, a['flight'], c='b', marker='o')
plt.xlabel('index value', fontsize=16)
plt.ylabel('flight', fontsize=16)
plt.title('scatter plot - index value vs. flight (data range A row & E row )', fontsize=20)
plt.show()

This produces the following error:

Traceback (most recent call last):
  File "I:/PycharmProjects/1223/raise1/char3.py", line 10, in <module>
    plt.scatter(a.index, a['flight'], c='b', marker='o')
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 3470, in scatter
    edgecolors=edgecolors, data=data, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\__init__.py", line 1855, in inner
    return func(ax, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 4320, in scatter
    alpha=alpha
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py", line 927, in __init__
    Collection.__init__(self, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py", line 159, in __init__
    offsets = np.asanyarray(offsets, float)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 544, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
ValueError: setting an array element with a sequence.

How can I produce the following result? Thanks. http://python2018.byethost10.com/image.png

1 Answer:

Answer 0: (score: 2)

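The problem is in the aggregation: in your code the grouping key is an (hour, minute) tuple, so the aggregated result has tuples in its index, and matplotlib cannot convert those to numbers for the x axis.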
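The tuple labels can be seen in a small sketch. The three-row DataFrame below is a made-up stand-in for flights.csv (an assumption, not the real data), and the float conversion at the end only approximates what matplotlib does internally with the x values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for flights.csv (values are made up).
df = pd.DataFrame({
    "time_hour": pd.to_datetime([
        "2013-01-01 05:00", "2013-01-02 05:00", "2013-01-01 05:01",
    ]),
    "flight": [10, 20, 5],
})

# Grouping key from the question: every element is an (hour, minute) tuple.
keys = df["time_hour"].map(lambda x: (x.hour, x.minute))
a = df.groupby(keys)["flight"].sum()

# Each index label of the aggregated result is a tuple, not a number.
first_label = a.index[0]
print(type(first_label), first_label)

# Approximation of matplotlib's step: coerce the x values to floats.
# An object array of tuples cannot be coerced, so this fails.
try:
    np.asarray(a.index.to_numpy(), dtype=float)
    converted = True
except (TypeError, ValueError) as exc:
    converted = False
    print("conversion failed:", exc)
```

Any key that is a single scalar per row (a string, a number, a time object) avoids the tuple labels.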

The solution is to use Series.dt.strftime to convert the time_hour column to HH:MM strings:

a = df.groupby(by=[df.time_hour.dt.strftime('%H:%M')]).sum()
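As a self-contained sketch of this line, the example below groups a made-up three-row frame (hypothetical data, not flights.csv) by the HH:MM string. The `numeric_only=True` argument is an addition here, on the assumption that the real file also contains non-numeric columns that should not be summed; the original answer calls `.sum()` without it.

```python
import pandas as pd

# Hypothetical sample standing in for flights.csv (values are made up).
df = pd.DataFrame({
    "time_hour": pd.to_datetime([
        "2013-01-01 05:00", "2013-01-02 05:00", "2013-01-01 05:01",
    ]),
    "flight": [10, 20, 5],
    "carrier": ["UA", "AA", "DL"],  # non-numeric column, excluded from the sum
})

# One string key per row, e.g. "05:00"; rows from different days share a key.
keys = df["time_hour"].dt.strftime("%H:%M")

# numeric_only=True is an assumption added for this sketch.
a = df.groupby(keys).sum(numeric_only=True)
print(a)
```

Because the index labels are now plain strings, they can be passed to plt.scatter as categorical x values.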

All together:

import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['Noto Serif CJK TC']
matplotlib.rcParams['font.family']='sans-serif'

# first column is the index; the second column is parsed as datetimes
df=pd.read_csv('flights.csv', index_col=[0], parse_dates=[1])
a = df.groupby(by=[df.time_hour.dt.strftime('%H:%M')]).sum()
print (a)
             year  sched_dep_time  flight  air_time  distance  hour  minute
time_hour                                                                  
05:00      122793           37856   87445   11282.0     72838   366    1256
05:01      120780           44810   82113   11115.0     71168   435    1310
05:02      122793           52989   99975   11165.0     72068   515    1489
05:03      120780           57653   98323   10366.0     65137   561    1553
05:04      122793           67706  110230   10026.0     63118   661    1606
05:05      122793           75807  126426    9161.0     55371   742    1607
05:06      120780           82010  120753   10804.0     67827   799    2110
05:07      122793           90684  130339    8408.0     52945   890    1684
05:08      120780           93687  114415   10299.0     63271   922    1487
05:09      122793          101571   99526   11525.0     72915  1002    1371
05:10      122793          107252  107961   10383.0     70137  1056    1652
05:11      120780          111351  120261   10949.0     73350  1098    1551
05:12      122793          120575  135930    8661.0     57406  1190    1575
05:13      120780          118272  104763    7784.0     55886  1166    1672
05:14      122793           37289  109300    9838.0     63582   364     889
05:15      122793           42374   67193   11480.0     78183   409    1474
05:16       58377           22321   53424    4271.0     27527   216     721

plt.scatter(a.index, a['flight'], c='b', marker='o')
#rotate labels of x axis
plt.xticks(rotation=90)
plt.xlabel('index value', fontsize=16)
plt.ylabel('flight', fontsize=16)
plt.title('scatter plot - index value vs. flight (data range A row & E row )', fontsize=20)
plt.show()

[Image: graph]

Another solution is to convert the datetimes to times:

import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
matplotlib.rcParams['font.sans-serif'] = 'Noto Serif CJK TC'
matplotlib.rcParams['font.family']='sans-serif'
df=pd.read_csv('flights.csv', index_col=[0], parse_dates=[1])
a = df.groupby(by=[df.time_hour.dt.time]).sum()
print (a)
             year  sched_dep_time  flight  air_time  distance  hour  minute
time_hour                                                                  
05:00:00   122793           37856   87445   11282.0     72838   366    1256
05:01:00   120780           44810   82113   11115.0     71168   435    1310
05:02:00   122793           52989   99975   11165.0     72068   515    1489
05:03:00   120780           57653   98323   10366.0     65137   561    1553
05:04:00   122793           67706  110230   10026.0     63118   661    1606
05:05:00   122793           75807  126426    9161.0     55371   742    1607
05:06:00   120780           82010  120753   10804.0     67827   799    2110
05:07:00   122793           90684  130339    8408.0     52945   890    1684
05:08:00   120780           93687  114415   10299.0     63271   922    1487
05:09:00   122793          101571   99526   11525.0     72915  1002    1371
05:10:00   122793          107252  107961   10383.0     70137  1056    1652
05:11:00   120780          111351  120261   10949.0     73350  1098    1551
05:12:00   122793          120575  135930    8661.0     57406  1190    1575
05:13:00   120780          118272  104763    7784.0     55886  1166    1672
05:14:00   122793           37289  109300    9838.0     63582   364     889
05:15:00   122793           42374   67193   11480.0     78183   409    1474
05:16:00    58377           22321   53424    4271.0     27527   216     721

plt.scatter(a.index, a['flight'], c='b', marker='o')
plt.xticks(rotation=90)
plt.xlabel('index value', fontsize=16)
plt.ylabel('flight', fontsize=16)
plt.title('scatter plot - index value vs. flight (data range A row & E row )', fontsize=20)
plt.show()

[Image: graph2]
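For comparison with the string keys, here is a minimal sketch of the time-based grouping on a made-up three-row frame (hypothetical data, not flights.csv). Whether matplotlib can place `datetime.time` values directly on an axis depends on which unit converters are registered in your environment, so this sketch stops at the aggregation and also derives string labels as a fallback.

```python
import datetime

import pandas as pd

# Hypothetical sample standing in for flights.csv (values are made up).
df = pd.DataFrame({
    "time_hour": pd.to_datetime([
        "2013-01-01 05:00", "2013-01-02 05:00", "2013-01-01 05:01",
    ]),
    "flight": [10, 20, 5],
})

# .dt.time drops the date part, leaving one datetime.time object per row.
keys = df["time_hour"].dt.time
a = df.groupby(keys)["flight"].sum()
print(a)

# Fallback: format the time labels as strings if they need to be plotted
# as categories.
labels = [t.strftime("%H:%M") for t in a.index]
```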