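I've been working on a project and need to fill a "vector" of months to build a histogram giving an overview of the number of tweets per month. To fill the months vector, I wrote the following code: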
    numTweets = list(tweets_cleaned_panda.iloc[:,1])
    months = [0, 0, 0, 0, 0, 0, 0]
    for i in range(0,len(numTweets)+1):
        if tweets_cleaned_panda['created_at'].str.contains("Mar") or tweets_cleaned_panda['created_at'].str.contains("Apr"):
            months[0] = months[0] + 1
        elif tweets_cleaned_panda['created_at'].str.contains("May"):
            months[1] += 1
        elif tweets_cleaned_panda['created_at'].str.contains("Jun"):
            months[2] += 1
        elif tweets_cleaned_panda['created_at'].str.contains("Jul"):
            months[3] += 1
        elif tweets_cleaned_panda['created_at'].str.contains("Aug"):
            months[4] += 1
        elif tweets_cleaned_panda['created_at'].str.contains("Sept"):
            months[5] += 1
        else:
            months[6] += 1
    print(months)
I tried appending .any() to the end of the contains() calls, but then only months[0] gets filled.
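A short sketch of why that happens, assuming created_at holds Twitter-style strings (the sample rows below are hypothetical). str.contains() returns one boolean per row, so a bare `if` on it is ambiguous and raises ValueError. .any() collapses the whole column to a single True/False that does not depend on the loop variable, so every iteration takes the first branch. Testing each individual string instead gives a per-row count:

```python
import pandas as pd

# Hypothetical stand-in for tweets_cleaned_panda['created_at']
created_at = pd.Series([
    "Wed Mar 04 10:00:00 +0000 2015",
    "Fri May 15 12:30:00 +0000 2015",
    "Sun Jun 21 08:15:00 +0000 2015",
])

mask = created_at.str.contains("Mar")  # one boolean per row
print(mask.tolist())  # [True, False, False]

# `if mask:` calls bool() on a multi-row Series, which raises ValueError.
try:
    bool(mask)
except ValueError as exc:
    print("ValueError:", exc)

# .any() reduces the column to one boolean; it is the same on every loop
# iteration (True here, because at least one row contains "Mar").
print(bool(mask.any()))  # True

# Per-row version of the loop: test each string, not the whole column.
months = [0] * 7
for value in created_at:
    if "Mar" in value or "Apr" in value:
        months[0] += 1
    elif "May" in value:
        months[1] += 1
    elif "Jun" in value:
        months[2] += 1
    elif "Jul" in value:
        months[3] += 1
    elif "Aug" in value:
        months[4] += 1
    elif "Sep" in value:  # assumes three-letter month abbreviations ("Sep", not "Sept")
        months[5] += 1
    else:
        months[6] += 1
print(months)  # [1, 1, 1, 0, 0, 0, 0]
```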
I also wrote the following code:
    for i in range(0,len(numTweets)+1):
        np.where(tweets_cleaned_panda['created_at'].str.contains("Mar"),
            months[0] = months[0] + 1,
            np.where(tweets_cleaned_panda['created_at'].str.contains("Apr"),
            months[0] = months[0] + 1,
            np.where(tweets_cleaned_panda['created_at'].str.contains("May"),
            months[1] = months[1] + 1,
            np.where(tweets_cleaned_panda['created_at'].str.contains("Jun"),
            months[2] = months[2] + 1,
            np.where(tweets_cleaned_panda['created_at'].str.contains("Jul"),
            months[3] = months[3] + 1,
            np.where(tweets_cleaned_panda['created_at'].str.contains("Aug"),
            months[4] = months[4] + 1,
            np.where(tweets_cleaned_panda['created_at'].str.contains("Sept"),
            months[5] = months[5] + 1,
            np.where(tweets_cleaned_panda['created_at'].str.contains("Oct"),
            months[6] = months[6] + 1))))))))
But this gives the following error:

      File "", line 10
        months[0] = months[0] + 1,
    SyntaxError: keyword can't be an expression
Can anyone help?
Answer:
pandas works really well with datetime data. The pd.to_datetime function can convert timestamps in this UTC format:
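    pd.to_datetime("Wed Aug 27 13:08:45 +0000 2008")
    Out: Timestamp('2008-08-27 13:08:45')

(Newer pandas releases keep the +0000 offset, so the result is a timezone-aware Timestamp; the date and time fields are the same.)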
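A minimal sketch of the same conversion with an explicit format string. The format "%a %b %d %H:%M:%S %z %Y" is an assumption that every created_at value follows the layout shown above; spelling it out avoids format guessing and makes a malformed row raise an error instead of being parsed some other way.

```python
import datetime
import pandas as pd

raw = "Wed Aug 27 13:08:45 +0000 2008"

# Assumed layout: %a weekday, %b month abbreviation, %z UTC offset such as +0000
ts = pd.to_datetime(raw, format="%a %b %d %H:%M:%S %z %Y")

print(ts)
print(ts.year, ts.month, ts.day, ts.hour)  # 2008 8 27 13
print(ts.utcoffset())                      # 0:00:00, i.e. the time is in UTC
```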
If you first convert the column:

    df['created_at'] = pd.to_datetime(df['created_at'])
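Converting the whole column works the same way as the single value. The sketch below uses a small hypothetical DataFrame in place of tweets_cleaned_panda and, as an assumption, passes the same explicit format string; after conversion the column has a datetime64 dtype instead of holding plain strings.

```python
import pandas as pd

# Hypothetical stand-in for tweets_cleaned_panda
df = pd.DataFrame({"created_at": [
    "Wed Mar 04 10:00:00 +0000 2015",
    "Sat Apr 11 18:45:10 +0000 2015",
    "Fri May 15 12:30:00 +0000 2015",
]})

# Assumes every row uses the Twitter-style layout shown earlier
df["created_at"] = pd.to_datetime(df["created_at"], format="%a %b %d %H:%M:%S %z %Y")

print(df["created_at"].dtype)  # a datetime64 dtype (timezone-aware, UTC)
```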
then you can use the .dt accessor to extract the month:

    df['month'] = df['created_at'].dt.month
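A small sketch of what the .dt accessor returns, again on hypothetical sample rows. .dt.month gives month numbers (1 to 12); .dt.month_name() gives the full English month names, which can be handier as histogram labels.

```python
import pandas as pd

# Hypothetical sample tweets in the Twitter-style created_at layout
df = pd.DataFrame({"created_at": [
    "Wed Mar 04 10:00:00 +0000 2015",
    "Sat Apr 11 18:45:10 +0000 2015",
    "Fri May 15 12:30:00 +0000 2015",
]})
df["created_at"] = pd.to_datetime(df["created_at"], format="%a %b %d %H:%M:%S %z %Y")

df["month"] = df["created_at"].dt.month                # month numbers: 3, 4, 5
df["month_name"] = df["created_at"].dt.month_name()    # "March", "April", "May"
print(df[["month", "month_name"]])
```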
To get a frequency distribution from that, just call value_counts:

    df['month'].value_counts()
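One detail worth knowing: value_counts() orders the result by frequency, largest first. For a per-month overview you usually want calendar order, which sort_index() provides. A sketch on hypothetical dates:

```python
import pandas as pd

# Hypothetical timestamps standing in for the converted created_at column
created_at = pd.Series(pd.to_datetime([
    "2015-03-04", "2015-03-20", "2015-04-11",
    "2015-05-15", "2015-05-16", "2015-05-30",
]))
month = created_at.dt.month

by_count = month.value_counts()               # most frequent month first
by_month = month.value_counts().sort_index()  # ordered by month number
print(by_month)
```

If matplotlib is installed, by_month.plot(kind="bar") draws these counts as a bar chart, which is the histogram the question asks for.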
Note: replace df with the name of your DataFrame (tweets_cleaned_panda).
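Putting the steps together, here is a sketch that reproduces the seven bins from the question's loop (March and April combined, May through September separately, everything else in a final bin) without any explicit loop. The sample rows, the bin labels, and the explicit format string are assumptions for illustration; Series.map() leaves months without a label as NaN, which fillna() then turns into the catch-all bin.

```python
import pandas as pd

# Hypothetical stand-in for tweets_cleaned_panda
tweets = pd.DataFrame({"created_at": [
    "Wed Mar 04 10:00:00 +0000 2015",
    "Sat Apr 11 18:45:10 +0000 2015",
    "Fri May 15 12:30:00 +0000 2015",
    "Sun Jun 21 08:15:00 +0000 2015",
    "Mon Jun 22 09:00:00 +0000 2015",
    "Tue Sep 01 07:00:00 +0000 2015",
    "Sat Oct 10 20:00:00 +0000 2015",
]})

tweets["created_at"] = pd.to_datetime(tweets["created_at"], format="%a %b %d %H:%M:%S %z %Y")
month = tweets["created_at"].dt.month

# Hypothetical bin labels mirroring months[0]..months[6] from the question
bin_labels = {3: "Mar-Apr", 4: "Mar-Apr", 5: "May", 6: "Jun", 7: "Jul", 8: "Aug", 9: "Sep"}
order = ["Mar-Apr", "May", "Jun", "Jul", "Aug", "Sep", "Other"]

bins = month.map(bin_labels).fillna("Other")
counts = bins.value_counts().reindex(order, fill_value=0)  # empty bins become 0
months = counts.tolist()
print(months)  # [2, 1, 2, 0, 0, 1, 1]
```

reindex() fixes the bin order and fills months with no tweets (July and August here) with 0, so the result lines up with the original months list.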