pandas.io.ga not working for me

Time: 2014-11-11 05:22:14

Tags: pandas google-analytics google-analytics-api

I've confirmed with the Hello Analytics tutorial that OAuth2 is working as expected, but I'm having no luck with the pandas.io.ga module. Specifically, I get this error:

In [1]: from pandas.io import ga

In [2]: df = ga.read_ga("pageviews", "pagePath", "2014-07-08")
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1162: FutureWarning: using '-' to provide set differences 
with Indexes is deprecated, use .difference()
"use .difference()",FutureWarning)
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1147: FutureWarning: using '+' to provide set union with 
Indexes is deprecated, use '|' or .union()
"use '|' or .union()",FutureWarning)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-b5343faf9ae6> in <module>()
----> 1 df = ga.read_ga("pageviews", "pagePath", "2014-07-08")

/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in read_ga(metrics, dimensions, start_date, **kwargs)
    105     reader = GAnalytics(**reader_kwds)
    106     return reader.get_data(metrics=metrics, start_date=start_date,
--> 107                            dimensions=dimensions, **kwargs)
    108 
    109 

/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in get_data(self, metrics, start_date, end_date, dimensions, 
segment, filters, start_index, max_results, index_col, parse_dates, keep_date_col, date_parser, na_values, converters, 
sort, dayfirst, account_name, account_id, property_name, property_id, profile_name, profile_id, chunksize)
    293 
    294         if chunksize is None:
--> 295             return _read(start_index, max_results)
    296 
    297         def iterator():

/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _read(start, result_size)
    287                                         dayfirst=dayfirst,
    288                                         na_values=na_values,
--> 289                                         converters=converters, sort=sort)
    290             except HttpError as inst:
    291                 raise ValueError('Google API error %s: %s' % (inst.resp.status,

/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _parse_data(self, rows, col_info, index_col, parse_dates, 
keep_date_col, date_parser, dayfirst, na_values, converters, sort)
    313                                   keep_date_col=keep_date_col,
    314                                   converters=converters,
--> 315                                   header=None, names=col_names))
    316 
    317         if isinstance(sort, bool) and sort:

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    237 
    238     # Create the parser.
--> 239     parser = TextFileReader(filepath_or_buffer, **kwds)
    240 
    241     if (nrows is not None) and (chunksize is not None):

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    551             self.options['has_index_names'] = kwds['has_index_names']
    552 
--> 553         self._make_engine(self.engine)
    554 
    555     def _get_options_with_defaults(self, engine):

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    694             elif engine == 'python-fwf':
    695                 klass = FixedWidthFieldParser
--> 696             self._engine = klass(self.f, **self.options)
    697 
    698     def _failover_to_python(self):

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, **kwds)
   1412         if not self._has_complex_date_col:
   1413             (index_names,
-> 1414              self.orig_names, self.columns) = self._get_index_name(self.columns)
   1415             self._name_processed = True
   1416             if self.index_names is None:

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _get_index_name(self, columns)
   1886             # Case 2
   1887             (index_name, columns_,
-> 1888              self.index_col) = _clean_index_names(columns, self.index_col)
   1889 
   1890         return index_name, orig_names, columns

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _clean_index_names(columns, index_col)
   2171                     break
   2172         else:
-> 2173             name = cp_cols[c]
   2174             columns.remove(name)
   2175             index_names.append(name)

TypeError: list indices must be integers, not Index

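OAuth2 works as expected, and I'm only using these parameters as demo values (the query itself is junk). Basically, I can't figure out where the error comes from and would appreciate any pointers.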

Thanks!

Solution (SORT OF)

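Not sure whether this has anything to do with the data I'm trying to access, but the offending value is the index_col variable in pandas.io.ga.GDataReader.get_data(), which is of type pandas.core.index.Index. It is passed on to pandas.io.parsers._read() inside _parse_data(), and that is where it breaks: the parser ends up using the Index object as a list subscript (cp_cols[c] in the traceback). I don't fully understand why, but that's the breaking point for me.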

As a fix, in case anyone else runs into this, I edited line 270 of ga.py to:

index_col = _clean_index(list(dimensions), parse_dates).tolist()
Everything works now, but I suspect this change may break things in other situations...
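The idea behind the .tolist() patch is that index_col should reach the CSV parser as a plain Python list rather than as a pandas Index. Below is a minimal sketch of that normalization as a standalone helper, so the same idea can be applied without necessarily editing ga.py. `as_plain_list` is a hypothetical name, not part of pandas.io.ga, and it only handles the input types listed in its comments.

```python
def as_plain_list(index_col):
    """Hypothetical helper (not part of pandas): coerce index_col to a plain
    Python list, the same idea as calling .tolist() in the ga.py patch."""
    if index_col is None:
        return None  # leave "no index column" untouched
    if isinstance(index_col, (list, tuple)):
        return list(index_col)
    if hasattr(index_col, "tolist"):
        # pandas Index objects and numpy arrays convert to native lists via
        # .tolist(); numpy scalars return a bare value, so wrap that case too
        result = index_col.tolist()
        return result if isinstance(result, list) else [result]
    # a single column position or column name
    return [index_col]
```

For example, `as_plain_list(pd.Index(['pagePath']))` gives `['pagePath']`, a list the parser can iterate safely, whereas in the traceback above the Index object itself ended up as the list subscript.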

1 Answer:

Answer 0 (score: 1)

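Unfortunately, this module isn't really documented, and the errors don't always make sense. Pass your account_name, property_name and profile_name (profile_name is what the online GA interface calls the View). Then add the dimensions and metrics you are interested in. Also make sure client_secrets.json is in the pandas.io directory. An example: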

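from datetime import datetime
from pandas.io import ga

# Placeholder values (hypothetical): use your own account, property and view names
account_name = 'My Account'
property_name = 'My Property'
profile_name = 'My View'      # the "View" in the GA web interface
start_date = '2014-07-01'
end_date = '2014-07-08'
max_results = 10000

df = ga.read_ga(account_name=account_name,
                property_name=property_name,
                profile_name=profile_name,
                dimensions=['date', 'hour', 'minute'],
                metrics=['pageviews'],
                start_date=start_date,
                end_date=end_date,
                index_col=0,
                # combine the three dimensions into a single 'datetime' column
                parse_dates={'datetime': ['date', 'hour', 'minute']},
                date_parser=lambda x: datetime.strptime(x, '%Y%m%d %H %M'),
                max_results=max_results)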
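The parse_dates/date_parser pair in the example works because, when several columns are combined into one, pandas joins their values with a single space before handing the result to date_parser. GA reports ga:date as YYYYMMDD and ga:hour/ga:minute as two-digit values, so the combined string looks something like "20140708 13 05". A small sketch of that parsing step on its own, using made-up sample values for one row:

```python
from datetime import datetime

# Format matching 'date hour minute' joined by single spaces
GA_DATETIME_FORMAT = "%Y%m%d %H %M"

def parse_ga_datetime(value):
    """Parse one combined 'YYYYMMDD HH MM' string, like the answer's date_parser."""
    return datetime.strptime(value, GA_DATETIME_FORMAT)

# Made-up sample values for the three GA dimensions of a single row
date, hour, minute = "20140708", "13", "05"
combined = " ".join([date, hour, minute])  # space-joined, as for combined date columns
print(combined, "->", parse_ga_datetime(combined))
# prints: 20140708 13 05 -> 2014-07-08 13:05:00
```

Keeping the format string in one named constant makes it easy to adjust if you pick different time dimensions (for example dropping minute and using "%Y%m%d %H").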

Also check out my recent step-by-step blog post on using pandas with GA.
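Since the answer says client_secrets.json has to sit in the pandas.io directory, here is a small sketch for finding that directory on a given installation and checking whether the file is there. It assumes a pandas version that still includes pandas.io.ga; `client_secrets_path` and `report_client_secrets` are hypothetical helper names.

```python
import os

def client_secrets_path(package):
    """Expected location of client_secrets.json: the directory containing the
    given package's __init__.py (for pandas.io, that is .../pandas/io)."""
    package_dir = os.path.dirname(os.path.abspath(package.__file__))
    return os.path.join(package_dir, "client_secrets.json")

def report_client_secrets():
    """Print where pandas.io.ga would look for the file and whether it exists."""
    try:
        import pandas.io  # assumes a pandas version that still ships pandas.io.ga
    except ImportError:
        print("pandas is not installed in this interpreter")
        return
    path = client_secrets_path(pandas.io)
    print(path, "found" if os.path.isfile(path) else "MISSING")
```

Run report_client_secrets() from the same interpreter (or virtualenv) you call read_ga from, since a different Python installation has its own pandas.io directory.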