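I have a list of dates and letters. I need to count how many times each letter occurs within a week. I tried grouping by letter and resampling with '1w', but I get a strange DataFrame with a MultiIndex. How can I do all of this and end up with a DataFrame that has just three columns: the score, the new resampled date, and the count?
PS: What I am looking for is weekly bins, with the number of occurrences of each letter in that week. Something like this: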
datetime alphabet count
2016-12-27 22:57:45.407246 a 1
2016-12-30 22:57:45.407246 a 2
2017-01-02 22:57:45.407246 a 0
2016-12-27 22:57:45.407246 b 0
2016-12-30 22:57:45.407246 b 1
2017-01-02 22:57:45.407246 b 4
2016-12-27 22:57:45.407246 c 7
2016-12-30 22:57:45.407246 c 0
2017-01-02 22:57:45.407246 c 0
Here is the code:
import random
import pandas as pd
import datetime
def randchar(a, b):
    # Return a random character between a and b (inclusive)
    return chr(random.randint(ord(a), ord(b)))
# Create a datetime variable for today
base = datetime.datetime.today()
# Create a list variable that creates 365 days of rows of datetime values
date_list = [base - datetime.timedelta(days=x) for x in range(0, 365)]
score_list =[randchar('a', 'h') for i in range(365)]
df = pd.DataFrame()
# Create a column from the datetime variable
df['datetime'] = date_list
# Convert that column into a datetime datatype
df['datetime'] = pd.to_datetime(df['datetime'])
# Set the datetime column as the index
df.index = df['datetime']
# Create a column from the list of random letters
df['score'] = score_list
df_s = tt = df.groupby('score').resample('1w').count()
Answer 0 (score: 1)
You can apply a groupby + count with two grouping keys: a pd.Grouper with a frequency of one week, and the score column. Finally, unstack the result.
df = df.groupby([pd.Grouper(freq='1w'), 'score']).count().unstack(fill_value=0)
df.head()
datetime
score a b c d e f g h
datetime
2016-12-25 0 0 1 1 0 1 0 1
2017-01-01 1 0 0 1 3 0 2 0
2017-01-08 0 3 1 1 1 0 0 1
2017-01-15 1 2 0 2 0 0 1 1
2017-01-22 0 1 2 1 1 2 0 0
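If you want the long, three-column layout from the question rather than the wide table, one option is to melt the unstacked result back into rows. This is a minimal sketch, not part of the original answer: it assumes a small hand-made sample in place of the random data, uses size() instead of count() (my substitution, so that there is a single count per group), and takes the column name count from the question's desired output.

```python
import pandas as pd

# Hypothetical hand-made sample standing in for the question's random data.
dates = pd.to_datetime([
    "2017-01-02", "2017-01-03", "2017-01-04",  # fall in the week ending 2017-01-08
    "2017-01-09", "2017-01-10",                # fall in the week ending 2017-01-15
])
df = pd.DataFrame({"score": ["a", "a", "b", "b", "b"]}, index=dates)
df.index.name = "datetime"

# Wide table: one row per week, one column per letter, 0 where a letter is absent.
wide = (
    df.groupby([pd.Grouper(freq="W"), "score"])
      .size()
      .unstack(fill_value=0)
)

# Long table: one row per (week, letter) pair, zero counts included.
weekly = wide.reset_index().melt(
    id_vars="datetime", var_name="score", value_name="count"
)
print(weekly)
```

Going through unstack(fill_value=0) first is what keeps the zero-count rows; grouping alone would only return the (week, letter) pairs that actually occur.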