Question

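I'm adapting the time series visualization code here so that, instead of plotting frequency by date, the frequency is plotted against time of day (ignoring the date).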

So, over a period of say 3 months, I want to count how many tweets were sent at 12:00 AM, how many at 12:01 AM, how many at 12:02 AM, and so on.

I tried the following:

import datetime
import fileinput
import glob
import json

import pandas as pd
import pytz

dates_TweetText = []
for line in fileinput.input(glob.glob("*.txt")):
    tweet = json.loads(line)
    # track when the word is mentioned
    if s in preprocess(tweet['text']):
        thedate = datetime.datetime.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y').replace(tzinfo=pytz.UTC)
        # keep only the hour and minute, dropping the date
        thetime = datetime.time(thedate.hour, thedate.minute)
        dates_TweetText.append(thetime)
# a list of "1" to count the hashtags
ones = [1]*len(dates_TweetText)
idx = pd.DatetimeIndex(dates_TweetText)

It crashes on the last line with the error TypeError: object of type 'datetime.time' has no len()

The date and the time both look fine, e.g. 2015-11-18 19:33:15+00:00 and 19:33:00

Is this because pandas requires a date and a time together, rather than a time on its own? Is there another way to do a 24-hour plot? TIA!

Edit: Following EdChum's suggestion, I tried this:

dates_TweetText = []
for line in fileinput.input(glob.glob("*.txt")):
    print 'file', fileinput.filename(), line
    tweet = json.loads(line)
    # track when the word is mentioned
    if s in preprocess(tweet['text']):
        dates_TweetText.append(tweet['created_at'])

# a list of "1" to count the hashtags
print type(dates_TweetText)
ones = [1]*len(dates_TweetText)
# the index of the series
idx = pd.DatetimeIndex(dates_TweetText)

That works, but the 'resample' line below crashes with TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex when I try to use idx.time (without .time it runs fine, but only gives a plot by date).

# the actual series (at series of 1s for the moment)
TweetText = pd.Series(ones, index=idx.time)
# Resampling / bucketing
return TweetText.resample('1Min', how='sum').fillna(0)
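
For reference, the per-minute counts described above can be sketched without resample by grouping on the minute of the day. This is only a rough sketch, not a confirmed fix: the created_at strings below are made-up sample values, and the names minute_of_day and counts are introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample of tweet['created_at'] strings (made up for illustration).
created_at = [
    "Wed Nov 18 19:33:15 +0000 2015",
    "Thu Nov 19 19:33:40 +0000 2015",
    "Fri Nov 20 00:01:05 +0000 2015",
]

# Parse with the same format string used in the question.
idx = pd.to_datetime(created_at, format="%a %b %d %H:%M:%S +0000 %Y")

# Minute of day (0..1439); this discards the date entirely.
minute_of_day = idx.hour * 60 + idx.minute

# Count tweets per minute of day, then fill minutes with no tweets with 0.
counts = (
    pd.Series(1, index=minute_of_day)
    .groupby(level=0)
    .sum()
    .reindex(range(24 * 60), fill_value=0)
)
```

Here counts has one entry per minute of a 24-hour day, so tweets from different dates that share the same hour and minute land in the same bucket. An integer index is used because resample only accepts a DatetimeIndex, TimedeltaIndex or PeriodIndex, which is what the second error message says.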

pandas 24-hour time series

0 answers: