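I'm trying to create a new column in pandas based on an existing column. The existing column contains a year and quarter, e.g. "201901", or is blank. If the original column has a valid entry, the new column should contain a full timestamp; otherwise it should be None.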
For example:
Input:
201901
201902
None
201901
Desired output:
datetime(2019,01,01)
datetime(2019,03,01)
None
datetime(2019,01,01)
My attempt:
df['stamp'] = np.where(df['quarter'].astype(str).str.len() == 8,\
datetime( df['quarter'].astype(str).str[0:4].astype(int), \
df['quarter'].astype(str).str[4:6].astype(int)*3,1), \
None)
Result:
ValueError: invalid literal for int() with base 10: ''
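It looks to me as though the code for the True branch of the condition is being evaluated for all rows. Note: the condition itself is correct; it properly identifies the valid entries.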
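A minimal sketch of why this happens: Python evaluates every argument before np.where is called, so the conversion runs on all rows regardless of the mask. The two-row Series and the length check of 6 below are made up for illustration and are not taken from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one valid entry and one blank.
quarter = pd.Series(["201901", ""])
mask = quarter.str.len() == 6

error = None
try:
    # The second argument is computed for every row before np.where runs,
    # so astype(int) sees the empty string even where mask is False.
    np.where(mask, quarter.str[0:4].astype(int), -1)
except ValueError as exc:
    error = str(exc)

# Converting only the rows selected by the mask avoids the problem.
years = quarter[mask].str[0:4].astype(int)
```

Restricting the conversion to the masked rows first, and only then combining the results, keeps the invalid entries away from astype(int).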
Answer 0 (score: 1)
pandas.to_datetime will parse quarters automatically, but the value needs to be in the format 2019Q3: the year followed by Q and the quarter.
Since you have a column of integers with a None, it's difficult to know if the underlying values are truly integers or if they have been cast to float, which could mess up the string slicing unless the trailing .0 is replaced first.
import pandas as pd
# Read from the source column 'quarter'; drop .replace if the values are truly integers
s = df['quarter'].astype(str).replace(r'\.0$', '', regex=True)
# Build strings like '2019Q1'; anything unparseable becomes NaT
pd.to_datetime(s.str[0:4] + 'Q' + s.str[-1], errors='coerce')
#0 2019-01-01
#1 2019-04-01
#2 NaT
#3 2019-01-01
#Name: quarter, dtype: datetime64[ns]
You get some garbage like 'nanQn' or 'NoneQe' for the missing rows, but since it's going to become NaT anyway, it's probably not a big deal.
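If you would rather avoid the quarter-string parsing, here is a minimal sketch that does the arithmetic directly. It assumes the values look like YYYYQQ (e.g. 201901), that blanks are None or empty strings, and that the timestamp should be the first day of the quarter. The sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data matching the question's input.
df = pd.DataFrame({"quarter": ["201901", "201902", None, "201901"]})

# Blanks and None become NaN instead of raising.
code = pd.to_numeric(df["quarter"], errors="coerce")
valid = code.notna()

# Split YYYYQQ into year and quarter for the valid rows only.
year = (code[valid] // 100).astype(int)
qtr = (code[valid] % 100).astype(int)
month = (qtr - 1) * 3 + 1  # assumption: first month of the quarter

stamps = pd.to_datetime(
    year.astype(str) + "-" + month.astype(str).str.zfill(2) + "-01",
    format="%Y-%m-%d",
)

# Reindexing back to the full index leaves NaT in the missing rows.
df["stamp"] = stamps.reindex(df.index)
```

Because only the valid rows are converted, no garbage strings are created along the way.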
Answer 1 (score: 0)
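Could you do something like...
df['quarter'] = df['quarter'].replace('', np.nan)
and then carry out the rest of the calculation? (Note that replace with inplace=True returns None, so its result should not be assigned to a column.)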
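A rough sketch of what "the rest of the calculation" could look like after the replacement. The helper quarter_to_timestamp is a hypothetical name, the sample frame is made up, and mapping a quarter to its first month is an assumption.

```python
import numpy as np
import pandas as pd

def quarter_to_timestamp(value):
    """Hypothetical helper: turn a YYYYQQ value into the first day of that quarter."""
    if pd.isna(value):
        return pd.NaT
    text = str(value)
    year, qtr = int(text[:4]), int(text[4:6])
    return pd.Timestamp(year=year, month=(qtr - 1) * 3 + 1, day=1)

# Hypothetical sample data with an empty string as the blank entry.
df = pd.DataFrame({"quarter": ["201901", "201902", "", "201901"]})

# Mark blanks as missing, then convert row by row.
df["quarter"] = df["quarter"].replace("", np.nan)
df["stamp"] = df["quarter"].map(quarter_to_timestamp)
```

This is row-wise rather than vectorized, so it trades some speed on large frames for readability.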