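I am working with Twitter data for two different keywords. I want to count the number of tweets per day for each keyword, visualize the result as a line chart, and then overlay a series of important calendar dates (key events) on it.
My goal is to see whether the tweet counts change around specific events. I have already counted and plotted the tweets, but I am having trouble working out how to overlay the key dates.
I tried putting the important dates into a list, but this raised an error. Can anyone give me some guidance or suggest a better way to approach this?
Here is an image that roughly illustrates what I am trying to achieve: https://imgur.com/a/36esk1B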
dates_list = ['2016-06-16','2016-06-23', '2016-06-24',
'2016-07-02', '2016-07-13']
#then convert list into a Series
key_dates = pd.Series(pd.to_datetime(dates_list))
# add columns to identify important events, and mark a 0 or 1.
tweet_trend['Important Events'] = False
tweet_trend.loc[key_dates, 'Important Events'] = True
tweet_trend['values'] = 0
tweet_trend.loc[key_dates, 'values'] = 1
KeyError Traceback (most recent call last)
<ipython-input-88-04dd081adc28> in <module>
10 # add columns to identify important events, and mark a 0 or 1.
11 tweet_trend['Important Events'] = False
---> 12 tweet_trend.loc[key_dates, 'Important Events'] = True
13 tweet_trend['values'] = 0
14 tweet_trend.loc[key_dates, 'values'] = 1
~/venv/lib/python3.6/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
187 else:
188 key = com.apply_if_callable(key, self.obj)
--> 189 indexer = self._get_setitem_indexer(key)
190 self._setitem_with_indexer(indexer, value)
191
~/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _get_setitem_indexer(self, key)
165 if isinstance(key, tuple):
166 try:
--> 167 return self._convert_tuple(key, is_setter=True)
168 except IndexingError:
169 pass
~/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _convert_tuple(self, key, is_setter)
246 if i >= self.obj.ndim:
247 raise IndexingError('Too many indexers')
--> 248 idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
249 keyidx.append(idx)
250 return tuple(keyidx)
~/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _convert_to_indexer(self, obj, axis, is_setter, raise_missing)
1352 kwargs = {'raise_missing': True if is_setter else
1353 raise_missing}
-> 1354 return self._get_listlike_indexer(obj, axis, **kwargs)[1]
1355 else:
1356 try:
~/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1159 self._validate_read_indexer(keyarr, indexer,
1160 o._get_axis_number(axis),
-> 1161 raise_missing=raise_missing)
1162 return keyarr, indexer
1163
~/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1250 if not(self.name == 'loc' and not raise_missing):
1251 not_found = list(set(key) - set(ax))
-> 1252 raise KeyError("{} not in index".format(not_found))
1253
1254 # we skip the warning on Categorical/Interval
KeyError: "[Timestamp('2016-06-16 00:00:00')] not in index"
Answer 0 (score: 0)
You can use Index.isin to test membership, then cast the column to integer to map True/False to 1/0. There is also no need to convert the dates to a Series:
dates_list = ['2016-06-16','2016-06-23', '2016-06-24',
              '2016-07-02', '2016-07-13']
key_dates = pd.to_datetime(dates_list)
tweet_trend['Important Events'] = tweet_trend.index.isin(key_dates)
tweet_trend['values'] = tweet_trend['Important Events'].astype(int)
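The membership test only marks the rows; it does not draw anything. One possible way to overlay the key dates on the line chart is a vertical line per date with matplotlib's `Axes.axvline`. The sketch below is not from the original post: the `tweet_trend` frame, its `keyword_a` / `keyword_b` columns, and the counts in it are made up for illustration, and it assumes the index is a daily `DatetimeIndex`. It also shows that 2016-06-16 has no row in the data, which is what the `KeyError` from `.loc` was reporting.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for tweet_trend: daily counts for two keywords,
# indexed by date. 2016-06-16 is deliberately absent, as in the traceback.
index = pd.date_range("2016-06-20", "2016-07-15", freq="D")
tweet_trend = pd.DataFrame(
    {
        "keyword_a": list(range(len(index))),
        "keyword_b": list(range(len(index), 0, -1)),
    },
    index=index,
)

dates_list = ["2016-06-16", "2016-06-23", "2016-06-24",
              "2016-07-02", "2016-07-13"]
key_dates = pd.to_datetime(dates_list)

# isin does not raise for labels missing from the index,
# unlike .loc with a list of labels.
tweet_trend["Important Events"] = tweet_trend.index.isin(key_dates)
tweet_trend["values"] = tweet_trend["Important Events"].astype(int)

# Key dates that have no row in the data (the ones .loc complained about).
missing = key_dates[~key_dates.isin(tweet_trend.index)]

fig, ax = plt.subplots(figsize=(10, 4))
for column in ["keyword_a", "keyword_b"]:
    ax.plot(tweet_trend.index, tweet_trend[column], label=column)

# One dashed vertical line per key date, drawn over the count lines.
for date in key_dates:
    ax.axvline(date, color="grey", linestyle="--", linewidth=1)

ax.set_xlabel("Date")
ax.set_ylabel("Tweets per day")
ax.legend()
fig.autofmt_xdate()
```

Drawing the lines from `key_dates` rather than from the marked rows means a key date still appears on the chart even when no tweets were recorded that day. If only dates present in the data should be shown, looping over `tweet_trend.index[tweet_trend["Important Events"]]` instead would be a reasonable alternative.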