GPU acceleration for a large Excel file using numba

Date: 2018-11-21 18:16:39

Tags: python for-loop gpu numba

I have very little experience with GPUs. After searching online, I came up with this:

import pandas as pd
from haversine import haversine
import numpy as np
from pandas import ExcelWriter
import numba as nb

np.set_printoptions(precision=20)

path = 'distance.xlsx'
df = pd.read_excel(path)
df = df.assign(Dist=pd.Series(np.zeros(27055)).values);
#df = df.assign(Facility=pd.Series(np.zeros(27055)).values);
df = df.assign(Facility=pd.Series(np.zeros((27055,),dtype='float,float')).values);
df["Facility_city"] = ""
#idx = np.asarray(df.loc[df["lat1"] != '.'].ix[:,0].index)
#temp1 = 1e10
#j = 0
idx = np.asarray(df[(df['lat1']!='.') & (df['state']== df['state'][0])].index)

@nb.jit(nopython=True)
def f(df):
    temp1 = 1e10
    j = 0
    for i in range(0, len(df)):
        if df['state'][i+1] != df['state'][i]:
            idx = np.asarray(df[(df['lat1']!='.') & (df['state']== df['state'][i+1])].index)
        #while (df.iloc[idx[j]]['state'] == df.iloc[i]['state']):
        while (j!=len(idx)):
            p1 = (df.iloc[idx[j]]['lat1'],df.iloc[idx[j]]['long1'])
            p2 = (df.iloc[i]['lat2'],df.iloc[i]['long2'])
            df.Dist.iloc[i] = min(temp1,haversine(p1, p2, miles=True))
            if df.Dist.iloc[i] < temp1:
                #df.Facility.iloc[i] = idx[j]
                df.Facility.iloc[i] = (p1[0],p1[1])
                df.Facility_city.iloc[i] = df.city.iloc[idx[j]]
                temp1 = df.Dist.iloc[i]
            j+=1
        j = 0
        temp1 = 1e10
    return df

if __name__ == "__main__":
    df = f(df)
    writer = ExcelWriter('Results.xlsx')
    df.to_excel(writer,'Sheet1')
    writer.save()

The for loop runs nearly 30,000 times, so I would like to run it on the GPU, but I'm struggling to adapt the code for that.
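Since each row is compared with every facility in its state, much of the runtime here is likely Python overhead from the per-element `df.iloc` lookups rather than the arithmetic itself, so the nearest-facility search can often be vectorized with plain NumPy before reaching for a GPU. The following is a minimal sketch under stated assumptions: the facility rows (those with `lat1 != '.'`) have already been split out and their coordinates converted to float, the function names `haversine_miles` and `nearest_facility` are hypothetical, and the Earth radius of 3958.8 miles is an assumed mean value, so distances may differ slightly from the `haversine` package.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; inputs in degrees, any broadcastable shapes."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

def nearest_facility(row_state, row_lat, row_lon, fac_state, fac_lat, fac_lon):
    """For each row, return the index (into the facility arrays) of the nearest
    facility in the same state and its distance in miles.
    Rows whose state has no facility get index -1 and distance inf."""
    n = len(row_state)
    best_idx = np.full(n, -1, dtype=np.int64)
    best_dist = np.full(n, np.inf)
    for state in np.unique(row_state):
        rows = np.flatnonzero(row_state == state)
        facs = np.flatnonzero(fac_state == state)
        if facs.size == 0:
            continue  # no facility in this state
        # Distance matrix of shape (len(rows), len(facs)) via broadcasting
        d = haversine_miles(row_lat[rows, None], row_lon[rows, None],
                            fac_lat[None, facs], fac_lon[None, facs])
        k = d.argmin(axis=1)  # nearest facility per row
        best_idx[rows] = facs[k]
        best_dist[rows] = d[np.arange(rows.size), k]
    return best_idx, best_dist
```

To apply this to the question's data, one would build the facility arrays from `df[df['lat1'] != '.']`, convert the coordinate columns with `.astype(float)`, and use the returned indices to fill `Dist`, `Facility`, and `Facility_city`. The per-state distance matrix needs memory proportional to rows × facilities in that state, which is the trade-off for removing the Python-level inner loop.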

When I run this, I get an error:

[Error screenshot: Floydhub]

How can I get around this? From what I read, numba seemed like the best framework for this, but any other framework (e.g. PyCUDA) would be fine too.
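Whatever the exact message, numba's nopython mode cannot compile a function that receives a pandas DataFrame: `df.iloc`, DataFrame boolean indexing, and string columns are not supported there, and calling an ordinary Python function such as `haversine` from nopython code is not supported unless that function is itself numba-compiled. A numba version therefore has to operate on plain NumPy arrays. The sketch below assumes the states have been encoded as integers and the coordinates as float64 degrees; `nearest_facility_numba` is a hypothetical name, the 3958.8-mile radius is an assumption, and the `ImportError` fallback is only there so the sketch still runs (without any speed-up) where numba is not installed.

```python
import math
import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: run as plain Python if numba is unavailable
    def njit(func):
        return func

@njit
def nearest_facility_numba(row_state, row_lat, row_lon, fac_state, fac_lat, fac_lon):
    # Inputs are NumPy arrays: states as integer codes, coordinates in degrees.
    r = 3958.8  # assumed mean Earth radius in miles
    n = row_lat.shape[0]
    m = fac_lat.shape[0]
    best_idx = np.full(n, -1, np.int64)   # -1 means no facility in that state
    best_dist = np.full(n, np.inf)
    for i in range(n):
        lat1 = math.radians(row_lat[i])
        lon1 = math.radians(row_lon[i])
        for j in range(m):
            if fac_state[j] != row_state[i]:
                continue  # only compare facilities in the same state
            lat2 = math.radians(fac_lat[j])
            lon2 = math.radians(fac_lon[j])
            # Haversine formula, written with math functions numba can compile
            a = (math.sin((lat2 - lat1) / 2) ** 2
                 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
            d = 2 * r * math.asin(math.sqrt(a))
            if d < best_dist[i]:
                best_dist[i] = d
                best_idx[i] = j
    return best_idx, best_dist
```

The DataFrame columns would be converted once before the call, for example with `codes = pd.factorize(df['state'])[0]` and the facility subset taken as `codes[mask]`, so rows and facilities share the same integer codes and the compiled loop never touches pandas. numba's `parallel=True` option with `prange` is one way to spread the outer loop over CPU cores; a `numba.cuda` kernel would likewise need its inputs as arrays, so an array-based version like this is a reasonable first step either way.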
