How to speed up a Python Basemap choropleth animation

Time: 2017-01-11 00:27:56

Tags: python animation matplotlib matplotlib-basemap choropleth

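Drawing on ideas from various sources and combining them with my own, I am trying to create an animated map that shades countries according to certain values in my data.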


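  1. Run a database query to get a dataset keyed by country and time
  2. Do some data manipulation with pandas (sums, averages, etc.)
  3. Initialize the Basemap object, then load an external shapefile
  4. Use the animation library to color the countries, with one frame for each distinct "time" in the dataset
  5. Save as gif, mp4, or another format

This works well. The problem is that it is extremely slow. I potentially have over 100k time intervals (across several metrics) that I want to animate, and I am seeing an average of 15 seconds per frame, which gets worse the more frames there are. At this rate it could take weeks of maxing out the CPU and memory on my machine to generate a single animation.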
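
As an illustration of step 2, the aggregation to an arbitrary interval could look roughly like the sketch below. It is not the code used in this question: the helper name `aggregate_by_interval` and the sample rows are hypothetical, and it assumes the `country`, `unixtime` and `metric1` columns described in the code comments further down.

```python
import pandas as pd


def aggregate_by_interval(df, avg_interval):
    """Floor each unixtime to the start of its interval, then average per country and interval.

    Hypothetical helper: avg_interval is the bucket width in seconds.
    """
    out = df.copy()
    out["unixtime"] = (out["unixtime"] // avg_interval) * avg_interval
    return out.groupby(["country", "unixtime"], as_index=False).mean()


# Made-up sample rows, only to show the shape of the data.
raw = pd.DataFrame({
    "country": ["US", "US", "US", "FR"],
    "unixtime": [0, 3600, 86400, 7200],
    "metric1": [1.0, 3.0, 5.0, 2.0],
})

# Daily buckets: the two US rows from the first day collapse into one averaged row.
daily = aggregate_by_interval(raw, 60 * 60 * 24)
```

Each distinct `unixtime` value left in `daily` then corresponds to one frame of the animation.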

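I know matplotlib is not known for being fast (for example: 1, 2), but I have read stories of people generating animations at 5+ fps, and I am wondering what I am doing wrong.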


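Optimizations I have tried so far:

  1. Only recoloring the countries in the animate function. This takes about 3s per frame on average, so while it could be improved, it is not where most of the time goes.
  2. I use the blit option.
  3. I tried a smaller plot size and a less detailed basemap, but the gains were marginal.
  4. A less detailed shapefile might speed up the coloring of the shapes, but as I said above, that would only improve the 3s-per-frame part.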
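
One way to make the "only recolor" step cheaper is to build the polygons once and then change just their face colors on each frame. The sketch below shows that idea with plain matplotlib and no Basemap; the three squares and the codes `AA`, `BB`, `CC` are hypothetical stand-ins for `m.units` and the country codes, and the `recolor` helper is not part of any library.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

# Hypothetical stand-ins for the shapefile polygons: three unit squares.
codes = ["AA", "BB", "CC"]
shapes = {
    code: np.array([[x, 0], [x + 1, 0], [x + 1, 1], [x, 1]], dtype=float)
    for x, code in enumerate(codes)
}

fig, ax = plt.subplots()
# Build every polygon once and add a single collection to the axes.
collection = PatchCollection([Polygon(shapes[c], closed=True) for c in codes])
ax.add_collection(collection)
ax.set_xlim(0, 3)
ax.set_ylim(0, 1)

GRAY = "#dddddd"


def recolor(frame_colors):
    """Update face colors in place; codes missing from frame_colors turn gray."""
    collection.set_facecolor([frame_colors.get(c, GRAY) for c in codes])
    return (collection,)


recolor({"AA": "red"})
recolor({"BB": "blue", "CC": "green"})
```

With this layout the axes hold one collection no matter how many frames are drawn, and no `Polygon` objects are created inside the per-frame function.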


      import pandas as pd
      import numpy as np
      import matplotlib as mpl
      import matplotlib.pyplot as plt
      import matplotlib.animation as animation
      import time
      from math import pi
      from sqlalchemy import create_engine
      from mpl_toolkits.basemap import Basemap
      from matplotlib.patches import Polygon
      from matplotlib.collections import PatchCollection
      from geonamescache import GeonamesCache
      from datetime import datetime
      def get_dataset(avg_interval, startTime, endTime):
          ### SQL query
          # Returns a dataframe with fields [country, unixtime, metric1, metric2, metric3, metric4, metric5]
          # I use unixtime so I can group by any arbitrary interval to get sums and avgs of the metrics (hence the param avg_interval)
          return df
      # Initialize plot figure
      fig=plt.figure(figsize=(11, 6))
      ax = fig.add_subplot(111, facecolor='w', frame_on=False)
      # Initialize map with Robinson projection
      m = Basemap(projection='robin', lon_0=0, resolution='c')
      # Load and read shapefile
      shapefile = 'countries/ne_10m_admin_0_countries'
      m.readshapefile(shapefile, 'units', color='#dddddd', linewidth=0.005)
      # Get valid country code list
      gc = GeonamesCache()
      iso2_codes = list(gc.get_dataset_by_key(gc.get_countries(), 'fips').keys())
      # Get dataset and remove invalid countries
      # This one will get daily aggregates for the first week of the year
      df = get_dataset(60*60*24, '2016-01-01', '2016-01-08')
      df.set_index(["country"], inplace=True)
      df = df[df.index.isin(iso2_codes)].dropna()
      num_colors = 20
      # Get list of distinct times to iterate over in the animation
      period = df["unixtime"].sort_values(ascending=True).unique()
      # Assign bins to each value in the df
      values = df["metric1"]
      cm = plt.get_cmap('afmhot_r')
      scheme= cm(1.*np.arange(num_colors)/num_colors)
      bins = np.linspace(values.min(), values.max(), num_colors)
      df["bin"] = np.digitize(values, bins) - 1
      # Initialize animation return object
      x,y = m([],[])
      point = m.plot(x, y,)[0]
      # Pre-zip country details and shape objects (as a list, so it can be reused on every frame)
      zipped = list(zip(m.units_info, m.units))
      tbegin = time.time()
      # Animate! This is the part that takes a long time. Most of the time taken seems to happen between frames...
      def animate(i):
          # Clear the axis object so it doesn't draw over the old one
          # Dynamic title
          fig.suptitle('Num: {}'.format(datetime.utcfromtimestamp(int(i)).strftime('%Y-%m-%d %H:%M:%S')), fontsize=30, y=.95)
          tstart = time.time()
          # Get current frame dataset
          frame = df[df["unixtime"]==i]
          # Loop through every country
          for info, shape in zipped:
              iso2 = info['ISO_A2']
              if iso2 not in frame.index:
                  # Gray if not in dataset
                  color = '#dddddd'
              else:
                  # Colored if in dataset
                  color = scheme[int(frame.loc[iso2, "bin"])]
              # Get shape info for country, then color on the ax subplot
              patches = [Polygon(np.array(shape), closed=True)]
              pc = PatchCollection(patches)
              pc.set_facecolor(color)
              ax.add_collection(pc)
          tend = time.time()
          #print "{}%: {} of {} took {}s".format(str(ind/tot*100), str(ind), str(tot), str(tend-tstart))
          print("{}: {}s".format(datetime.utcfromtimestamp(int(i)).strftime('%Y-%m-%d %H:%M:%S'), str(tend-tstart)))
          return None
      # Initialize animation object
      output = animation.FuncAnimation(fig, animate, period, interval=150, repeat=False, blit=False)
      filestring = time.strftime("%Y%m%d%H%M%S")
      # Save animation object as mp4
      #output.save(filestring + '.mp4', fps=1, codec='ffmpeg', extra_args=['-vcodec', 'libx264'])
      # Save animation object as gif
      output.save(filestring + '.gif', writer='imagemagick')
      tfinish = time.time()
      print("Total time: {}s".format(str(tfinish-tbegin)))
      print("{}s per frame".format(str((tfinish-tbegin)/len(df["unixtime"].unique()))))



      2016-01-01 00:00:00: 3.87843298912s
      2016-01-01 00:00:00: 4.08691620827s
      2016-01-02 00:00:00: 3.40868711472s
      2016-01-03 00:00:00: 4.21187019348s
      Total time: 29.0233821869s
      9.67446072896s per frame


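Edit 2: I ran some performance tests and found that the average time to generate each additional frame is greater than for the previous one, in proportion to the number of frames, which suggests a quadratic-time process. (Or is it exponential?) Either way, I am confused about why this is not linear. If the dataset has already been generated and the map takes constant time to regenerate, which variable is making each extra frame take longer than the one before it?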
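
One possible explanation for the growth, offered as a guess rather than a diagnosis: if every frame adds new collections to the axes and the old ones are never removed, the number of artists to draw grows with each frame, so the total work grows roughly quadratically. The sketch below only counts artists to show the effect; the function name `artists_after_each_frame` and its parameters are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection


def artists_after_each_frame(n_frames, shapes_per_frame, clear_first):
    """Return how many collections the axes hold after each simulated frame."""
    fig, ax = plt.subplots()
    square = [[0, 0], [1, 0], [1, 1], [0, 1]]
    counts = []
    for _frame in range(n_frames):
        if clear_first:
            ax.clear()  # drops the collections added by earlier frames
        for _shape in range(shapes_per_frame):
            ax.add_collection(PatchCollection([Polygon(square, closed=True)]))
        counts.append(len(ax.collections))
    plt.close(fig)
    return counts


growing = artists_after_each_frame(4, 3, clear_first=False)   # [3, 6, 9, 12]
constant = artists_after_each_frame(4, 3, clear_first=True)   # [3, 3, 3, 3]
```

Checking `len(ax.collections)` inside the real `animate` function would confirm or rule out this explanation.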

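Edit 3: I just realized that I do not understand how the animation function works. The (x, y) and point variables were taken from an example that simply plotted a moving point, so they made sense in that context. For a map... not so much. I tried returning something map-related from the animate function and got better performance. Returning the ax object (return ax,) makes the process run in linear time... but nothing gets written to the gif. Does anyone know what I need to return from the animate function to make this work?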
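
For reference, when `blit=True` matplotlib's `FuncAnimation` expects the frame function to return an iterable of the artists it modified or created; with `blit=False` the return value is not used. The sketch below is a minimal illustration under the assumption that a single pre-built collection is recolored per frame; the one-square "map" and the `frame_colors` list are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

fig, ax = plt.subplots()
# Hypothetical one-polygon "map", created once before the animation starts.
collection = PatchCollection([Polygon([[0, 0], [1, 0], [1, 1], [0, 1]], closed=True)])
ax.add_collection(collection)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

frame_colors = ["red", "green", "blue"]  # made-up per-frame colors


def animate(i):
    """Recolor the existing artist and return it so blitting knows what changed."""
    collection.set_facecolor(frame_colors[i])
    return (collection,)


anim = animation.FuncAnimation(
    fig, animate, frames=len(frame_colors), interval=150, repeat=False, blit=True
)
```

Saving would then go through `anim.save(...)` as in the code above, with whichever writer is installed.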


