Splitting a dataset by day with pandas

Time: 2018-10-15 20:13:42

Tags: python pandas dataset

I have a dataset that looks like this:

"2018-05-30 21:26:43",20.61129150,-100.40933971
"2018-05-30 21:26:43",20.61127415,-100.41146822
"2018-06-02 21:56:12",21.15633228,-100.93766080
"2018-06-05 22:57:40",20.59734201,-100.38091286
"2018-06-05 22:57:40",20.59875096,-100.37821426
"2018-06-06 20:56:22",20.61278120,-100.38446619
"2018-06-06 20:56:22",20.59865452,-100.37827264
"2018-06-06 21:57:15",20.59862012,-100.37817348
"2018-06-06 21:57:15",20.59864713,-100.37821263
"2018-06-06 21:57:15",20.59862915,-100.37825902
"2018-06-07 15:54:29",20.61280757,-100.39768857
"2018-06-07 15:54:29",20.61276216,-100.39769379

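I want to split my data into groups by day so that I can compute distances and get the average distance travelled per day.

I am currently splitting it on the date column, like this: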

import pandas as pd

col_names = ['date', 'latitude', 'longitude']
df = pd.read_csv('marco.csv', names=col_names, sep=',', skiprows=1)

# merge
m = df.reset_index().merge(df.reset_index(), on='date')

But I want to split it by day instead, so that I get the index

2018-05-30, 2018-06-05, 2018-06-06, 2018-06-07

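How would I go about this?

1 Answer:

Answer 0: (score: 1)

As Yuca said, a group by should do the trick. I would create a new column named "day" that holds only the date part of the timestamp, sort by date, group by "day", and then compute the distance travelled within each group.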

import pandas as pd

a = pd.DataFrame(
    [["2018-05-30 21:26:43",20.61129150,-100.40933971],
    ["2018-05-30 21:26:43",20.61127415,-100.41146822],
    ["2018-06-02 21:56:12",21.15633228,-100.93766080],
    ["2018-06-05 22:57:40",20.59734201,-100.38091286]], 
    columns=['date', 'lat', 'lng'])

a['date'] = pd.to_datetime(a['date'])


a['day'] = a['date'].dt.date

# Sort by timestamp first so rows within each day are in travel order
b = a.sort_values('date').groupby('day')

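# Loop over the groups and do whatever calculation you need.
# Each item is a (key, sub-DataFrame) pair, so unpack it directly.
for day, group_df in b:
    # Placeholder calculation: replace with the distance computation you need
    print(group_df['lat'].sum())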
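
The loop above only sums latitudes as a placeholder. As a rough sketch of the "distance travelled in each group" step, the block below adds up the distance between consecutive points within each day and then averages the daily totals. Several things here are assumptions rather than part of the original answer: the haversine (great-circle) formula with a mean Earth radius of 6371 km is just one reasonable choice of distance, `haversine_km` and `daily_distance_km` are hypothetical helper names, and rows that share a timestamp are assumed to be in travel order already.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres; inputs are in degrees."""
    earth_radius_km = 6371.0  # assumed mean Earth radius (approximation)
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * earth_radius_km * np.arcsin(np.sqrt(h))


def daily_distance_km(df):
    """Sum the distances between consecutive rows within each calendar day."""
    # Stable sort keeps the original order of rows with identical timestamps.
    df = df.sort_values('date', kind='stable').copy()
    df['day'] = df['date'].dt.date
    # Previous point within the same day; the first row of each day gets NaN.
    prev = df.groupby('day')[['lat', 'lng']].shift()
    df['step_km'] = haversine_km(prev['lat'], prev['lng'], df['lat'], df['lng'])
    # sum() skips NaN, so a day with a single point totals 0.0.
    return df.groupby('day')['step_km'].sum()


a = pd.DataFrame(
    [["2018-05-30 21:26:43", 20.61129150, -100.40933971],
     ["2018-05-30 21:26:43", 20.61127415, -100.41146822],
     ["2018-06-02 21:56:12", 21.15633228, -100.93766080],
     ["2018-06-05 22:57:40", 20.59734201, -100.38091286]],
    columns=['date', 'lat', 'lng'])
a['date'] = pd.to_datetime(a['date'])

per_day = daily_distance_km(a)
print(per_day)         # one total per day, indexed by day
print(per_day.mean())  # average distance travelled per day
```

The index of `per_day` is the list of days the question asks for. Days with only one recorded point contribute a distance of 0.0 to the average; whether they should be counted or dropped depends on what "average per day" is meant to measure.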