Question

I currently have a column in a pandas dataframe containing strings that represent year ranges in "yyyy-yyyy" format. For example, "2004-2005" is a single string value in this column.

I'd like to know whether there is a way to convert these strings into something like a datetime format.

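The purpose is to compute the difference, in years, between the values in this column and those in another similar column. For example, something like the following: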

col 1        col2        Answer(Total years)
2004-2005    2006-2007    3

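Note: one approach I've considered is to build a dictionary that maps each year to a unique integer value and then compute the difference between those integers.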

However, I'd like to know whether there is a simpler way.

Answer 1

It looks like you're subtracting the first year in column 1 from the last year in column 2. In that case I'd use str.extract (and convert the result to a number):

In [11]: pd.to_numeric(df['col 1'].str.extract(r'(\d{4})', expand=False))
Out[11]:
0    2004
Name: col 1, dtype: int64

In [12]: pd.to_numeric(df['col2'].str.extract(r'-(\d{4})', expand=False)) - pd.to_numeric(df['col 1'].str.extract(r'(\d{4})', expand=False))
Out[12]:
0    3
dtype: int64
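
As a self-contained sketch of the same idea, the snippet below pulls both years out of each column with named capture groups and stores the difference in a new column. The sample rows, the `pattern` variable, and the `total_years` column name are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
df = pd.DataFrame({
    "col 1": ["2004-2005", "2010-2011"],
    "col2": ["2006-2007", "2015-2016"],
})

pattern = r"(?P<start>\d{4})-(?P<end>\d{4})"

# With named groups, str.extract returns a DataFrame
# with one column per group ("start" and "end").
first = df["col 1"].str.extract(pattern).astype(int)
second = df["col2"].str.extract(pattern).astype(int)

# Last year of col2 minus first year of col 1.
df["total_years"] = second["end"] - first["start"]
print(df)
```

Keeping the start and end years as separate integer columns also makes it easy to switch to a different definition of the difference later.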

Answer 2

What do you mean by "something like a datetime object"? Datetimes aren't designed to represent date ranges.

If you want to create a pair of datetime objects, you can do the following:

import datetime

[datetime.datetime.strptime(x, '%Y') for x in '2005-2006'.split('-')]
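
To connect this to the year difference asked about in the question, here is a minimal sketch using only the standard library. The helper name `parse_year_range` is hypothetical, and the sketch assumes every value is a well-formed "yyyy-yyyy" string.

```python
from datetime import datetime


def parse_year_range(text):
    """Return (start, end) datetimes for a 'yyyy-yyyy' string."""
    start, end = (datetime.strptime(part, "%Y") for part in text.split("-"))
    return start, end


# strptime with only '%Y' fills in 1 January, midnight.
start, _ = parse_year_range("2004-2005")
_, end = parse_year_range("2006-2007")

# Difference between the last year of the second range
# and the first year of the first range.
span = end.year - start.year
print(start, end, span)
```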

Alternatively, you could try the pandas date_range function, if that is closer to what you want.

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.date_range.html
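
A minimal sketch of what that might look like for a single value, assuming a reasonably recent pandas version where the "YS" (year start) frequency alias is available:

```python
import pandas as pd

start_year, end_year = "2004-2005".split("-")

# "YS" yields one timestamp per 1 January between the two years, inclusive.
years = pd.date_range(start=start_year, end=end_year, freq="YS")
print(years)
```

This produces a DatetimeIndex with one entry per year in the range rather than a single datetime value.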

Answer 3

If you want to find the difference between the lowest year and the highest year, here's one way to do it:

col1 = "2004-2005"
col2 = "2006-2007"
col1 = col1.split("-")  # make a list of the years in col1: ['2004', '2005']
col2 = col2.split("-")  # make a list of the years in col2: ['2006', '2007']
biglist = col1 + col2  # concatenate the two lists
biglist.sort()  # sort from lowest to highest year (works for four-digit year strings)
Answer = int(biglist[-1]) - int(biglist[0])  # difference between the highest and lowest year
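
The same steps could be wrapped in a small reusable function. This is only a sketch: the name `year_span` is hypothetical, and it compares the years as integers instead of sorting the strings.

```python
def year_span(*ranges):
    """Difference between the highest and lowest year in 'yyyy-yyyy' strings."""
    years = [int(year) for text in ranges for year in text.split("-")]
    return max(years) - min(years)


print(year_span("2004-2005", "2006-2007"))  # 3
```

With a dataframe, a function like this could be applied row by row, for example with `df.apply(lambda row: year_span(row['col 1'], row['col2']), axis=1)`.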

Year range to datetime format
