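I have a DataFrame with a 'Date' column. I want to convert it into a categorical column whose categories are all the months from Jan to Dec. However, many of those months do not appear in my column.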
Consider the DataFrame df:
import pandas as pd
df = pd.DataFrame(dict(Date=pd.date_range('2011-03-31', periods=4, freq='Q')))
df
Date
0 2011-03-31
1 2011-06-30
2 2011-09-30
3 2011-12-31
I tried:
df.Date.dt.strftime('%b').astype('category')
0 Mar
1 Jun
2 Sep
3 Dec
Name: Date, dtype: category
Categories (4, object): [Dec, Jun, Mar, Sep]
As you can see, only the four months present in my column become categories. How do I get this instead?
0 Mar
1 Jun
2 Sep
3 Dec
Name: Date, dtype: category
Categories (12, object): [Jan, Feb, Mar, Apr, ..., Sep, Oct, Nov, Dec]
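A minimal sketch of why this happens: when no categories are given, `astype('category')` infers them from the values that actually occur and sorts them, so months absent from the data never become categories. The frame is rebuilt here from explicit dates (an assumption that matches the output shown above), and the expected output assumes an English locale for `%b`.

```python
import pandas as pd

# Rebuild the question's example frame from explicit quarter-end dates.
dates = pd.to_datetime(['2011-03-31', '2011-06-30', '2011-09-30', '2011-12-31'])
df = pd.DataFrame({'Date': dates})

# Without explicit categories, pandas infers them from the observed values only,
# sorted lexically, which is why Dec comes first.
inferred = df['Date'].dt.strftime('%b').astype('category')
print(list(inferred.cat.categories))  # ['Dec', 'Jun', 'Mar', 'Sep']
```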
Answer 0 (score: 1)
I think you need to pass the categories explicitly:
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
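# Older pandas accepted astype('category', categories=cats); current versions require a CategoricalDtype
print(df.Date.dt.strftime('%b').astype(pd.CategoricalDtype(categories=cats)))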
0 Mar
1 Jun
2 Sep
3 Dec
Name: Date, dtype: category
Categories (12, object): [Jan, Feb, Mar, Apr, ..., Sep, Oct, Nov, Dec]
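Declaring all 12 categories pays off in later aggregations. The sketch below is illustrative only: it adds `ordered=True` (not part of the answer) so that sorting and min/max follow calendar order, and it shows that `value_counts` on a categorical reports every category, including months with a count of zero. The example frame is rebuilt from explicit dates.

```python
import pandas as pd

cats = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
        'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame({'Date': pd.to_datetime(['2011-03-31', '2011-06-30',
                                           '2011-09-30', '2011-12-31'])})

# ordered=True is an addition here: comparisons follow calendar order, not alphabetical.
month_type = pd.CategoricalDtype(categories=cats, ordered=True)
months = df['Date'].dt.strftime('%b').astype(month_type)

# value_counts on a categorical lists every category; sort=False keeps category order.
counts = months.value_counts(sort=False)
print(counts)
```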
Answer 1 (score: 1)
You can set the categories manually:
months = ['Jan', 'Feb', 'Mar', 'Apr',
'May', 'Jun', 'Jul', 'Aug',
'Sep', 'Oct', 'Nov', 'Dec']
df['Months'] = df.Date.dt.strftime('%b').astype('category')
df['Months'] = df['Months'].cat.set_categories(months)
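One caveat worth knowing with this approach: `set_categories` silently turns any value that is missing from the new category list into NaN, so the list must cover every abbreviation that `%b` produces. The sketch below demonstrates this by deliberately dropping 'Dec'; the rebuilt frame and the `partial` variable are illustrative only.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame({'Date': pd.to_datetime(['2011-03-31', '2011-06-30',
                                           '2011-09-30', '2011-12-31'])})

df['Months'] = df['Date'].dt.strftime('%b').astype('category')
df['Months'] = df['Months'].cat.set_categories(months)
print(len(df['Months'].cat.categories))  # 12

# 'Dec' is deliberately left out: the Dec row becomes NaN instead of raising an error.
partial = df['Months'].cat.set_categories(months[:-1])
print(partial.isna().sum())  # 1
```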
Answer 2 (score: 1)
You can use pd.Categorical and set the categories manually with its categories parameter:
# 'MS' (month start) avoids the 'M' alias, which pandas 2.2 deprecated in favour of 'ME'
cat = pd.date_range('2011-01-01', periods=12, freq='MS').strftime('%b')
out = pd.Categorical(df.Date.dt.strftime('%b'), categories=cat)
out
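As an alternative to generating a date range, the month list can come from the standard library. This is a swapped-in technique, not part of the answer: `calendar.month_abbr` holds the abbreviations for the current locale (index 0 is an empty string), and `strftime('%b')` also follows the locale, so the two should agree. `ordered=True` is likewise an addition.

```python
import calendar

import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2011-03-31', '2011-06-30',
                                           '2011-09-30', '2011-12-31'])})

# calendar.month_abbr[0] is '', so skip it to get the 12 month abbreviations.
cat = list(calendar.month_abbr)[1:]
out = pd.Categorical(df['Date'].dt.strftime('%b'), categories=cat, ordered=True)
print(out)
```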