Question

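I have a list of dates and letters. I want to find how many times each letter occurs within a week. I tried grouping by letter and resampling with '1w', but I get a strange DataFrame with a MultiIndex. How can I do all of this and end up with a three-column DataFrame containing the score, the new resampled date, and the count?

PS: What I am looking for is to take each week and count the occurrences of each letter in that week. Something like this: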

datetime                        alphabet   count
2016-12-27 22:57:45.407246         a      1
2016-12-30 22:57:45.407246         a      2
2017-01-02 22:57:45.407246         a      0
2016-12-27 22:57:45.407246         b      0
2016-12-30 22:57:45.407246         b      1
2017-01-02 22:57:45.407246         b      4
2016-12-27 22:57:45.407246         c      7
2016-12-30 22:57:45.407246         c      0
2017-01-02 22:57:45.407246         c      0

Here is the code:

import random
import pandas as pd
import datetime



def randchar(a, b):
    return chr(random.randint(ord(a), ord(b)))

# Create a datetime variable for today
base = datetime.datetime.today()
# Create a list variable that creates 365 days of rows of datetime values
date_list = [base - datetime.timedelta(days=x) for x in range(0, 365)]

score_list =[randchar('a', 'h') for i in range(365)]

df = pd.DataFrame()

# Create a column from the datetime variable
df['datetime'] = date_list
# Convert that column into a datetime datatype
df['datetime'] = pd.to_datetime(df['datetime'])
# Set the datetime column as the index
df.index = df['datetime']
# Create a column from the list of letter scores
df['score'] = score_list

df_s = tt = df.groupby('score').resample('1w').count()

Answer 1

You can apply groupby + count with two predicates:

pd.Grouper with a frequency of one week
the score column

Finally, unstack the result.

df = df.groupby([pd.Grouper(freq='1w'), 'score']).count().unstack(fill_value=0)
df.head() 

           datetime                     
score             a  b  c  d  e  f  g  h
datetime                                
2016-12-25        0  0  1  1  0  1  0  1
2017-01-01        1  0  0  1  3  0  2  0
2017-01-08        0  3  1  1  1  0  0  1
2017-01-15        1  2  0  2  0  0  1  1
2017-01-22        0  1  2  1  1  2  0  0
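
The wide table above has one column per letter, while the question asks for three columns (date, letter, count). A minimal sketch of one way to get that long layout, assuming the same kind of data as in the question: stack the unstacked result back and reset the index. The fixed seed, the fixed base date, and the names `weekly` and `alphabet` are illustrative assumptions, not part of the original answer, and `size()` is used here in place of `count()` so that no extra column is needed.

```python
import datetime
import random

import pandas as pd

random.seed(0)  # assumption: fixed seed so the sketch is reproducible

# Assumption: a fixed base date instead of datetime.today()
base = datetime.datetime(2017, 12, 27)
dates = [base - datetime.timedelta(days=x) for x in range(365)]
scores = [chr(random.randint(ord('a'), ord('h'))) for _ in range(365)]

df = pd.DataFrame({'score': scores},
                  index=pd.DatetimeIndex(dates, name='datetime'))

weekly = (
    df.groupby([pd.Grouper(freq='W'), 'score'])
      .size()                    # rows per (week, letter) pair
      .unstack(fill_value=0)     # wide table; missing pairs become 0
      .stack()                   # back to long form, zeros kept
      .rename('count')
      .reset_index()             # turn the MultiIndex into plain columns
      .rename(columns={'score': 'alphabet'})
      .sort_values(['alphabet', 'datetime'], ignore_index=True)
)

print(weekly.head())
```

The unstack/stack round trip is what keeps the zero-count weeks; calling `reset_index()` directly on the grouped result would list only the (week, letter) pairs that actually occur.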
