Question

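I have been given a dataset of dates in which May 2019, for example, is formatted as 52019. After loading it into a Pandas DataFrame, I need to split this date format into a month column and a year column, but I can't figure out how to do this for an int64 dtype, or how to handle two-digit months. So I want to take something like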

ID    Date
1    22019
2    32019
3    52019
5    102019

and turn it into

ID    Month    Year
1     2        2019
2     3        2019
3     5        2019
5     10       2019

How can I do this?
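For anyone who wants to reproduce this, here is a minimal sketch of the sample data as a DataFrame. The construction itself is an assumption; only the values come from the table above.

```python
import pandas as pd

# Sample values copied from the question: month digits followed by a 4-digit year.
df = pd.DataFrame({
    "ID": [1, 2, 3, 5],
    "Date": [22019, 32019, 52019, 102019],
})

# Date is stored as an integer column, not as strings or datetimes.
print(df.dtypes)
```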

Answer 1

`divmod`

import numpy as np

# Dividing by 10000 gives (quotient, remainder) = (month, year)
df['Month'], df['Year'] = np.divmod(df.Date, 10000)

df

   ID    Date  Month  Year
0   1   22019      2  2019
1   2   32019      3  2019
2   3   52019      5  2019
3   5  102019     10  2019

Using assign, without modifying the original DataFrame

df.assign(**dict(zip(['Month', 'Year'], np.divmod(df.Date, 10000))))

   ID    Date  Month  Year
0   1   22019      2  2019
1   2   32019      3  2019
2   3   52019      5  2019
3   5  102019     10  2019
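Why this works: the year always occupies the last four digits, so integer division by 10000 leaves the month and the remainder is the year, regardless of whether the month has one or two digits. A small self-contained sketch (the DataFrame construction is assumed from the question's sample data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 5],
                   "Date": [22019, 32019, 52019, 102019]})

# Plain Python: divmod(a, b) returns (a // b, a % b).
print(divmod(102019, 10000))  # (10, 2019)

# Vectorised over the whole column. to_numpy() is used here only to make it
# explicit that np.divmod returns a pair of arrays (quotients, remainders).
month, year = np.divmod(df["Date"].to_numpy(), 10000)
df["Month"], df["Year"] = month, year
print(df)
```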

Answer 2

Use:

s=pd.to_datetime(df.pop('Date'),format='%m%Y') #convert to datetime and pop deletes the col
df['Month'],df['Year']=s.dt.month,s.dt.year #extract month and year
print(df)

   ID  Month  Year
0   1      2  2019
1   2      3  2019
2   3      5  2019
3   5     10  2019
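If you would rather keep the Date column, a possible variant is to skip pop and build the new columns with assign. This is a sketch, not the answer's exact code: it converts the integers to strings before parsing, which is an extra step assumed here to keep the format matching explicit.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 5],
                   "Date": [22019, 32019, 52019, 102019]})

# Convert to str first, then parse as "month followed by 4-digit year".
s = pd.to_datetime(df["Date"].astype(str), format="%m%Y")

# assign returns a new DataFrame; df itself keeps only ID and Date.
out = df.assign(Month=s.dt.month, Year=s.dt.year)
print(out)
```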

Answer 3

str.extract takes care of the tricky part of working out whether the month has 1 or 2 digits.

(df['Date'].astype(str)
           .str.extract(r'^(?P<Month>\d{1,2})(?P<Year>\d{4})$')
           .astype(int))

   Month  Year
0      2  2019
1      3  2019
2      5  2019
3     10  2019
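The extract call returns a new DataFrame holding only Month and Year. To attach those columns to the original frame, one option is join, which aligns on the index. A minimal sketch, with the DataFrame construction assumed from the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 5],
                   "Date": [22019, 32019, 52019, 102019]})

# The named groups become the column names of the extracted frame.
parts = (df["Date"].astype(str)
                   .str.extract(r"^(?P<Month>\d{1,2})(?P<Year>\d{4})$")
                   .astype(int))

# join aligns on the shared index, so each row keeps its own month and year.
result = df.join(parts)
print(result)
```

Note that astype(int) will fail if any value does not match the pattern, because unmatched rows come back as NaN.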

If your numbers are guaranteed to have only 5 or 6 digits, you can also use string slicing (if not, use the str.extract approach above)

u = df['Date'].astype(str)
df['Month'], df['Year'] = u.str[:-4], u.str[-4:]
df

   ID    Date Month  Year
0   1   22019     2  2019
1   2   32019     3  2019
2   3   52019     5  2019
3   5  102019    10  2019
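One caveat with slicing: the resulting Month and Year columns hold strings, not integers. If numeric columns are needed, a cast can be added, as in this sketch (the DataFrame construction is assumed from the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 5],
                   "Date": [22019, 32019, 52019, 102019]})

u = df["Date"].astype(str)

# Everything before the last four characters is the month; the rest is the year.
df["Month"] = u.str[:-4].astype(int)
df["Year"] = u.str[-4:].astype(int)
print(df.dtypes)
```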

Answer 4

Using // and %

df['Month'], df['Year'] = df.Date//10000,df.Date%10000
df
   ID    Date  Month  Year
0   1   22019      2  2019
1   2   32019      3  2019
2   3   52019      5  2019
3   5  102019     10  2019
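The arithmetic split never fails, even on values that are not valid dates, so a quick range check on the month can be worthwhile. A sketch under the assumption that malformed values might occur; the last entry below is made up for illustration and is not in the question's data:

```python
import pandas as pd

# The final value (132019) is a hypothetical malformed entry with month 13.
dates = pd.Series([22019, 32019, 52019, 102019, 132019], name="Date")

month = dates // 10000
year = dates % 10000

# between is inclusive on both ends by default, so this keeps months 1..12.
valid = month.between(1, 12)
print(dates[~valid])
```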

Split an int64 Pandas column into two
