Best way to create a categorical column of month names when not every month is present

Time: 2017-09-05 04:55:05

Tags: python pandas

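I have a dataframe with a 'Date' column. I want to convert it to a categorical column whose categories are all twelve months, Jan through Dec. However, many of the months are not represented in my column.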

Consider the dataframe df

import pandas as pd

df = pd.DataFrame(dict(Date=pd.date_range('2011-03-31', periods=4, freq='Q')))

df

        Date
0 2011-03-31
1 2011-06-30
2 2011-09-30
3 2011-12-31

I tried

df.Date.dt.strftime('%b').astype('category')

0    Mar
1    Jun
2    Sep
3    Dec
Name: Date, dtype: category
Categories (4, object): [Dec, Jun, Mar, Sep]

As you can see, only the four months present in my column show up as categories. How do I get

0    Mar
1    Jun
2    Sep
3    Dec
Name: Date, dtype: category
Categories (12, object): [Jan, Feb, Mar, Apr, ..., Sep, Oct, Nov, Dec]

3 answers:

Answer 0 (score: 1)

It seems to me that you need the categories parameter:

cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
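# Newer pandas versions no longer accept categories= in astype; pass a CategoricalDtype instead
print(df.Date.dt.strftime('%b').astype(pd.CategoricalDtype(categories=cats)))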

0    Mar
1    Jun
2    Sep
3    Dec
Name: Date, dtype: category
Categories (12, object): [Jan, Feb, Mar, Apr, ..., Sep, Oct, Nov, Dec]
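
One practical effect of declaring all twelve categories is that months with no rows still appear in summaries. The following is a minimal sketch of that behaviour, assuming an English locale for `%b`; the names `dates`, `month_dtype`, `months` and `counts` are made up for illustration, and `pd.CategoricalDtype` is used so the snippet runs on current pandas.

```python
import pandas as pd

cats = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
        'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Same four quarter-end dates as in the question, written out explicitly
dates = pd.Series(pd.to_datetime(
    ['2011-03-31', '2011-06-30', '2011-09-30', '2011-12-31']))

# A dtype that knows about all twelve months, observed or not
month_dtype = pd.CategoricalDtype(categories=cats)
months = dates.dt.strftime('%b').astype(month_dtype)

# sort=False keeps the category order instead of sorting by count
counts = months.value_counts(sort=False)
print(counts)
```

Here `counts` has twelve entries, with a count of zero for every month that does not occur in the data.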

Answer 1 (score: 1)

You can set the categories manually.

months = ['Jan', 'Feb', 'Mar', 'Apr', 
          'May', 'Jun', 'Jul', 'Aug', 
          'Sep', 'Oct', 'Nov', 'Dec']
df['Months'] = df.Date.dt.strftime('%b').astype('category')
df['Months'] = df['Months'].cat.set_categories(months)
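
If the months should also sort in calendar order rather than alphabetically, `set_categories` accepts `ordered=True`. This is a small sketch under that assumption, again relying on an English locale for `%b`; the shuffled input dates and the name `sorted_months` are only there to make the effect visible.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr',
          'May', 'Jun', 'Jul', 'Aug',
          'Sep', 'Oct', 'Nov', 'Dec']

# Dates deliberately out of order so that sorting has something to do
df = pd.DataFrame({'Date': pd.to_datetime(
    ['2011-12-31', '2011-03-31', '2011-09-30', '2011-06-30'])})

df['Months'] = df['Date'].dt.strftime('%b').astype('category')
# ordered=True makes comparisons and sorting follow the list order
df['Months'] = df['Months'].cat.set_categories(months, ordered=True)

sorted_months = df['Months'].sort_values().tolist()
print(sorted_months)
```

With an ordered categorical, `sort_values`, `min` and `max` follow the position in `months` instead of the alphabetical order of the labels.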

Answer 2 (score: 1)

You can use pd.Categorical and set the categories manually with its categories parameter:
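# 'MS' (month start) yields one date per month and avoids the deprecated 'M' alias in recent pandas
cat = pd.date_range('2011-01-01', periods=12, freq='MS').strftime('%b')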
out = pd.Categorical(df.Date.dt.strftime('%b'), categories=cat)
out
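
Note that `%b` produces abbreviations in the current locale, so the approaches above assume an English locale. One possible alternative, not taken from any of the answers, is to build the categorical from the month numbers with `pd.Categorical.from_codes`, which never calls `strftime`. The names `dates` and `month_names` below are illustrative.

```python
import pandas as pd

month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

dates = pd.Series(pd.to_datetime(
    ['2011-03-31', '2011-06-30', '2011-09-30', '2011-12-31']))

# Month numbers run 1..12; category codes are 0-based positions in month_names
codes = (dates.dt.month - 1).to_numpy()
out = pd.Categorical.from_codes(codes, categories=month_names)
print(out)
```

Because the labels come from the explicit `month_names` list, the result does not depend on the locale of the machine running the code.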