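So I've confirmed through the Hello Analytics tutorial that OAuth2 is working as expected, but I'm having no luck with the pandas.io.ga module. In particular, I'm getting this error: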
In [1]: from pandas.io import ga
In [2]: df = ga.read_ga("pageviews", "pagePath", "2014-07-08")
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1162: FutureWarning: using '-' to provide set differences
with Indexes is deprecated, use .difference()
"use .difference()",FutureWarning)
/usr/local/lib/python2.7/dist-packages/pandas/core/index.py:1147: FutureWarning: using '+' to provide set union with
Indexes is deprecated, use '|' or .union()
"use '|' or .union()",FutureWarning)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-b5343faf9ae6> in <module>()
----> 1 df = ga.read_ga("pageviews", "pagePath", "2014-07-08")
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in read_ga(metrics, dimensions, start_date, **kwargs)
105 reader = GAnalytics(**reader_kwds)
106 return reader.get_data(metrics=metrics, start_date=start_date,
--> 107 dimensions=dimensions, **kwargs)
108
109
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in get_data(self, metrics, start_date, end_date, dimensions,
segment, filters, start_index, max_results, index_col, parse_dates, keep_date_col, date_parser, na_values, converters,
sort, dayfirst, account_name, account_id, property_name, property_id, profile_name, profile_id, chunksize)
293
294 if chunksize is None:
--> 295 return _read(start_index, max_results)
296
297 def iterator():
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _read(start, result_size)
287 dayfirst=dayfirst,
288 na_values=na_values,
--> 289 converters=converters, sort=sort)
290 except HttpError as inst:
291 raise ValueError('Google API error %s: %s' % (inst.resp.status,
/usr/local/lib/python2.7/dist-packages/pandas/io/ga.pyc in _parse_data(self, rows, col_info, index_col, parse_dates,
keep_date_col, date_parser, dayfirst, na_values, converters, sort)
313 keep_date_col=keep_date_col,
314 converters=converters,
--> 315 header=None, names=col_names))
316
317 if isinstance(sort, bool) and sort:
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
237
238 # Create the parser.
--> 239 parser = TextFileReader(filepath_or_buffer, **kwds)
240
241 if (nrows is not None) and (chunksize is not None):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
551 self.options['has_index_names'] = kwds['has_index_names']
552
--> 553 self._make_engine(self.engine)
554
555 def _get_options_with_defaults(self, engine):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
694 elif engine == 'python-fwf':
695 klass = FixedWidthFieldParser
--> 696 self._engine = klass(self.f, **self.options)
697
698 def _failover_to_python(self):
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in __init__(self, f, **kwds)
1412 if not self._has_complex_date_col:
1413 (index_names,
-> 1414 self.orig_names, self.columns) = self._get_index_name(self.columns)
1415 self._name_processed = True
1416 if self.index_names is None:
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _get_index_name(self, columns)
1886 # Case 2
1887 (index_name, columns_,
-> 1888 self.index_col) = _clean_index_names(columns, self.index_col)
1889
1890 return index_name, orig_names, columns
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _clean_index_names(columns, index_col)
2171 break
2172 else:
-> 2173 name = cp_cols[c]
2174 columns.remove(name)
2175 index_names.append(name)
TypeError: list indices must be integers, not Index
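OAuth2 works as expected. I'm only using these parameters as demo values, so the query itself is junk. Basically, I can't figure out where the error comes from and would appreciate any pointers.
Thanks!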
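Solution (SORT OF)
I'm not sure whether this has anything to do with the data I'm trying to access, but the nasty index TypeError comes from the index_col variable in pandas.io.ga.GDataReader.get_data(), which is of type pandas.core.index.Index. It gets passed to pandas.io.parsers._read() inside _parse_data(). I don't understand why that breaks, but finding it was the breakthrough for me.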
As a fix, in case anyone else runs into this, I edited line 270 of ga.py to:
index_col = _clean_index(list(dimensions), parse_dates).tolist()
Everything works now, but I suspect this may break in other situations...
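One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: starting with pandas 0.15 (the release that also introduced the FutureWarnings above), Index is no longer an ndarray subclass. The parser may therefore treat the whole Index as a single column spec, wrap it in a list, and then try to use it as a positional index into the column list. The sketch below is a hypothetical, heavily simplified model of that normalization step. `clean_index_names` and `FakeIndex` are illustrative names, not pandas code.

```python
class FakeIndex(object):
    """Hypothetical stand-in for pandas.Index: iterable, has .tolist(),
    but is neither a list nor a tuple."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)

    def tolist(self):
        return list(self._values)


def clean_index_names(columns, index_col):
    """Simplified model of how a parser might resolve index_col to column names."""
    # Anything that is not a list/tuple is treated as ONE column spec and wrapped
    if not isinstance(index_col, (list, tuple)):
        index_col = [index_col]
    cp_cols = list(columns)
    names = []
    for c in index_col:
        if isinstance(c, str):
            names.append(c)           # column given by name
        else:
            names.append(cp_cols[c])  # column given by position: c must be an int
    return names


columns = ['pagePath', 'pageviews']
try:
    # The whole FakeIndex ends up used as a list index -> TypeError
    clean_index_names(columns, FakeIndex(['pagePath']))
except TypeError as exc:
    print('TypeError:', exc)

# Converting to a plain list first (the .tolist() fix) resolves the names
print(clean_index_names(columns, FakeIndex(['pagePath']).tolist()))
```

Calling `.tolist()` hands the parser a plain list of column names, which would explain why the one-line edit to ga.py makes the error go away. Whether it holds for every combination of index_col and parse_dates is not verified here.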
Answer (score: 1)
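Unfortunately, this module isn't really documented, and the errors don't always make sense. Include your account_name, property_name and profile_name (profile_name is what the web interface calls the View). Then add the dimensions and metrics you're interested in. Also make sure client_secrets.json is in the pandas.io directory. An example: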
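from datetime import datetime
from pandas.io import ga

# account_name, property_name, profile_name, start_date, end_date and
# max_results are your own values
df = ga.read_ga(account_name=account_name,
                property_name=property_name,
                profile_name=profile_name,
                dimensions=['date', 'hour', 'minute'],
                metrics=['pageviews'],
                start_date=start_date,
                end_date=end_date,
                index_col=0,
                # combine the three dimensions into one 'datetime' column
                parse_dates={'datetime': ['date', 'hour', 'minute']},
                date_parser=lambda x: datetime.strptime(x, '%Y%m%d %H %M'),
                max_results=max_results)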
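The date_parser format string only works if it matches the string handed to it. A minimal sketch, assuming pandas joins the 'date', 'hour' and 'minute' columns with single spaces before calling date_parser, and assuming GA returns date as YYYYMMDD and hour/minute as two-digit values. The sample value and the name `parse_ga_minute` are hypothetical:

```python
from datetime import datetime


def parse_ga_minute(combined):
    # Assumed input shape: '<YYYYMMDD> <HH> <MM>', e.g. '20140708 13 05'
    return datetime.strptime(combined, '%Y%m%d %H %M')


stamp = parse_ga_minute('20140708 13 05')  # hypothetical sample value
print(stamp)
```

If the combined string ever arrives in a different shape, strptime raises a ValueError, which is a quick way to check the assumption against real data.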
Also take a look at my recent step-by-step blog post on using pandas with GA.