Question

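I have a Twitter dataset that I am trying to analyze with pandas, but I can't figure out how to convert values such as "2 days", "24 hours", "2 months", or "5 years" to datetime format.

I used the following code: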

for i in df_merge['last_tweet']:
    # each value looks like "<number> <unit>", e.g. "2 months"
    n = int(i.split(" ")[0])
    d = i.split(" ")[1]
    if d in ["years", "year"]:
        n_days = n * 365
    elif d in ["months", "month"]:
        n_days = n * 30

Answer 1

You may want to write a helper function...

import numpy as np
import pandas as pd

def ym2nptimedelta(delta):
    delta_cfg = {
        'month': 'M',
        'months': 'M',
        'year': 'Y',
        'years': 'Y'
    }
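    n, item = delta.lower().split()
    # numpy treats 'M' and 'Y' as average-length units (30.436875 and 365.2425 days);
    # return the equivalent number of seconds, since recent pandas releases may
    # reject 'M'/'Y' timedeltas in Timestamp arithmetic
    avg_seconds = {'M': 2629746, 'Y': 31556952}
    return np.timedelta64(int(n) * avg_seconds[delta_cfg[item]], 's')

print(pd.Timestamp.today() - pd.Timedelta('2 days'))
print(pd.Timestamp.today() - pd.Timedelta('24 hours'))
print(pd.Timestamp.now() - ym2nptimedelta('2 years'))
print(pd.Timestamp.now() - ym2nptimedelta('5 years'))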

Output:

2016-03-08 20:39:34.315969
2016-03-09 20:39:34.315969
2014-03-11 09:01:10.316969
2011-03-11 15:33:34.317969
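
The year results above land at a different time of day because numpy's 'Y' and 'M' units stand for an average year and month (365.2425 and 30.436875 days). If calendar-exact shifts are wanted instead, pd.DateOffset is one option. A minimal sketch, not part of the original answer; the helper name ym2dateoffset and the fixed reference time are made up for illustration:

```python
import pandas as pd

def ym2dateoffset(delta):
    # hypothetical helper: map "N months" / "N years" to a calendar-aware offset
    unit_cfg = {
        'month': 'months',
        'months': 'months',
        'year': 'years',
        'years': 'years',
    }
    n, item = delta.lower().split()
    return pd.DateOffset(**{unit_cfg[item]: int(n)})

# fixed reference time (an assumption) so the example is reproducible
ref = pd.Timestamp('2016-03-10 20:39:34')

print(ref - ym2dateoffset('2 years'))   # same day and time, two calendar years earlier
print(ref - ym2dateoffset('2 months'))  # same day and time, two calendar months earlier
```

Which behaviour is right depends on what "2 years ago" should mean for the data; the average-length version and the calendar version can differ by several hours.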

UPDATE1 (this helper function will handle all accepted numpy time-delta units):

import numpy as np
import pandas as pd

def deltastr2date(delta):
    delta_cfg = {
        'year': 'Y',
        'years': 'Y',
        'month': 'M',
        'months': 'M',
        'week': 'W',
        'weeks': 'W',
        'day': 'D',
        'days': 'D',
        'hour': 'h',
        'hours': 'h',
        'min': 'm',
        'minute': 'm',
        'minutes': 'm',
        'sec': 's',
        'second': 's',
        'seconds': 's',
    }
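    n, item = delta.lower().split()
    unit = delta_cfg[item]
    # numpy treats 'M' and 'Y' as average-length units (30.436875 and 365.2425 days);
    # spell them out in seconds, since recent pandas releases may reject
    # 'M'/'Y' timedeltas in Timestamp arithmetic
    avg_seconds = {'M': 2629746, 'Y': 31556952}
    if unit in avg_seconds:
        step = np.timedelta64(int(n) * avg_seconds[unit], 's')
    else:
        step = np.timedelta64(int(n), unit)
    return pd.Timestamp.now() - step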

print(deltastr2date('2 days'))
print(deltastr2date('24 hours'))
print(deltastr2date('2 years'))
print(deltastr2date('5 years'))
print(deltastr2date('1 week'))
print(deltastr2date('10 hours'))
print(deltastr2date('45 minutes'))

Output:

2016-03-08 20:50:01.701853
2016-03-09 20:50:01.702853
2014-03-11 09:11:37.702853
2011-03-11 15:44:01.703853
2016-03-03 20:50:01.704854
2016-03-10 10:50:01.705854
2016-03-10 20:05:01.705854
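
For units with a fixed length (days, hours, minutes, seconds), pandas can parse strings like these directly, so a lookup table may not be needed for them. A minimal sketch, assuming the strings look like the ones in the question; the names ref and s and the fixed reference time are made up for illustration:

```python
import pandas as pd

# fixed reference time (an assumption) so the result is reproducible
ref = pd.Timestamp('2016-03-10 20:50:01')

s = pd.Series(['2 days', '24 hours', '45 minutes'])

# pd.to_timedelta parses fixed-length units for the whole Series in one call
result = ref - pd.to_timedelta(s)
print(result)
```

Months and years have no fixed length, so they still need something like the helper function above.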

UPDATE2 (showing how to apply the helper function to a DataFrame column):

import numpy as np
import pandas as pd

def deltastr2date(delta):
    delta_cfg = {
        'year': 'Y',
        'years': 'Y',
        'month': 'M',
        'months': 'M',
        'week': 'W',
        'weeks': 'W',
        'day': 'D',
        'days': 'D',
        'hour': 'h',
        'hours': 'h',
        'min': 'm',
        'minute': 'm',
        'minutes': 'm',
        'sec': 's',
        'second': 's',
        'seconds': 's',
    }
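    n, item = delta.lower().split()
    unit = delta_cfg[item]
    # numpy treats 'M' and 'Y' as average-length units (30.436875 and 365.2425 days);
    # spell them out in seconds, since recent pandas releases may reject
    # 'M'/'Y' timedeltas in Timestamp arithmetic
    avg_seconds = {'M': 2629746, 'Y': 31556952}
    if unit in avg_seconds:
        step = np.timedelta64(int(n) * avg_seconds[unit], 's')
    else:
        step = np.timedelta64(int(n), unit)
    return pd.Timestamp.now() - step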

N = 20

dt_units = ['seconds','minutes','hours','days','weeks','months','years']

# generate random list of deltas
deltas = ['{0[0]} {0[1]}'.format(tup) for tup in zip(np.random.randint(1,5,N), np.random.choice(dt_units, N))]

df = pd.DataFrame({'delta': pd.Series(deltas)})

# add new column 
df['last_tweet_dt'] = df['delta'].apply(deltastr2date)
print(df)

Output:

        delta              last_tweet_dt
0     3 hours 2016-03-10 20:32:49.252525
1      4 days 2016-03-06 23:32:49.252525
2   3 seconds 2016-03-10 23:32:46.253525
3     1 weeks 2016-03-03 23:32:49.253525
4   1 minutes 2016-03-10 23:31:49.253525
5   2 minutes 2016-03-10 23:30:49.253525
6      4 days 2016-03-06 23:32:49.254525
7     1 years 2015-03-11 17:43:37.254525
8   2 seconds 2016-03-10 23:32:47.254525
9   3 minutes 2016-03-10 23:29:49.254525
10    1 hours 2016-03-10 22:32:49.255525
11  2 seconds 2016-03-10 23:32:47.255525
12  3 minutes 2016-03-10 23:29:49.255525
13   3 months 2015-12-10 16:05:31.255525
14    4 weeks 2016-02-11 23:32:49.256526
15   3 months 2015-12-10 16:05:31.256526
16    4 hours 2016-03-10 19:32:49.256526
17    1 years 2015-03-11 17:43:37.256526
18    2 years 2014-03-11 11:54:25.257526
19  1 minutes 2016-03-10 23:31:49.257526
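
In the output above the fractional seconds drift from row to row because the helper reads the current time once per row. If all rows should share one reference time, it can be captured once and passed in. A minimal sketch under that assumption; deltastr2date_at is a hypothetical variant, restricted to fixed-length units to stay short:

```python
import numpy as np
import pandas as pd

def deltastr2date_at(delta, now):
    # hypothetical variant of the helper that takes the reference time as an argument
    delta_cfg = {
        'week': 'W', 'weeks': 'W',
        'day': 'D', 'days': 'D',
        'hour': 'h', 'hours': 'h',
        'minute': 'm', 'minutes': 'm',
        'second': 's', 'seconds': 's',
    }
    n, item = delta.lower().split()
    return now - np.timedelta64(int(n), delta_cfg[item])

df = pd.DataFrame({'delta': ['3 hours', '4 days', '1 weeks']})

# captured once; a fixed value is used here for reproducibility,
# in practice this would be pd.Timestamp.now()
now = pd.Timestamp('2016-03-10 23:32:49')

df['last_tweet_dt'] = df['delta'].apply(lambda d: deltastr2date_at(d, now))
print(df)
```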

Convert strings to datetime values using pandas

1 answer: