ID ArCityArCountry DptCityDptCountry DateDpt DateAr
1922 ParisFrance NewYorkUnitedState 2008-03-10 2001-02-02
1002 LosAngelesUnitedState California UnitedState 2008-03-10 2008-12-01
1901 ParisFrance LagosNigeria 2001-03-05 2001-02-02
1922 ParisFrance NewYorkUnitedState 2011-02-03 2008-12-01
1002 ParisFrance CaliforniaUnitedState 2003-03-04 2002-03-04
1099 ParisFrance BeijingChina 2011-02-03 2009-02-04
1901 LosAngelesUnitedState ParisFrance 2001-03-05 2001-02-02
I want to group the rows by ArCityArCountry (i.e. ParisFrance, LosAngelesUnitedState), then by DptCityDptCountry (same idea), and then take the dates into account (i.e. DateAr and DateDpt).
For example:
ParisFrance
[it should list the ID, DateDpt and DateAr of every row related to ParisFrance, without writing ParisFrance again on each row]
LosAngelesUnitedState
[likewise, it should list the ID, DateDpt and DateAr of every row related to LosAngelesUnitedState, without repeating LosAngelesUnitedState]
import csv

import pandas as pd

# Read the raw file and merge each city column with its country column.
with open("testfile.csv", newline="") as src:
    rows = [[row[0], row[1] + row[2], row[3] + row[4], row[5], row[6]]
            for row in csv.reader(src)]
print(rows)

# Write the merged rows to a new file, then load that file with pandas.
with open("data.csv", "w", newline="") as dst:
    csv.writer(dst).writerows(rows)

df = pd.read_csv("data.csv")
df["DateDpt"] = pd.to_datetime(df["DateDpt"], format="%Y-%m-%d")
df["DateAr"] = pd.to_datetime(df["DateAr"], format="%Y-%m-%d")
print(df)

# Keep rows whose arrival date is not after the departure date, sort them,
# and take the first row of each (departure, arrival) pair.
# The original passed four ascending flags for three sort keys; the extra
# flag is dropped here.
result = (
    df[df.DateAr <= df.DateDpt]
    .sort_values(["ID", "DateAr", "DateDpt"], ascending=[True, True, True])
    .groupby(["DptCityDptCountry", "ArCityArCountry"])
    .first()
    .reset_index()
)
print(result)
Expected output:
ParisFrance
[1922, NewYorkUnitedState, 2008-03-10, 2001-02-02], [1901, LagosNigeria, 2001-03-05, 2001-02-02], [1922, NewYorkUnitedState, 2011-02-03, 2008-12-01]
LosAngelesUnitedState
[1901, ParisFrance, 2001-03-05, 2001-02-02]
Answer 0 (score: 0)
It sounds like you are looking for something like this:
df['DateAr'] = pd.to_datetime(df['DateAr'])
df['DateDpt'] = pd.to_datetime(df['DateDpt'])
dept_cities = df.groupby('ArCityArCountry')
for city, departures in dept_cities:
    print(city)
    print([list(r) for r in departures.loc[:, ['ID', 'DptCityDptCountry', 'DateDpt', 'DateAr']].to_records()])
which gets you close to the format you indicated; the print() call can of course be tweaked further.
LosAngelesUnitedState
[[1, 1002, 'California UnitedState', numpy.datetime64('2008-03-09T18:00:00.000000000-0600'), numpy.datetime64('2008-11-30T18:00:00.000000000-0600')], [6, 1901, 'ParisFrance', numpy.datetime64('2001-03-04T18:00:00.000000000-0600'), numpy.datetime64('2001-02-01T18:00:00.000000000-0600')]]
ParisFrance
[[0, 1922, 'NewYorkUnitedState', numpy.datetime64('2008-03-09T18:00:00.000000000-0600'), numpy.datetime64('2001-02-01T18:00:00.000000000-0600')], [2, 1901, 'LagosNigeria', numpy.datetime64('2001-03-04T18:00:00.000000000-0600'), numpy.datetime64('2001-02-01T18:00:00.000000000-0600')], [3, 1922, 'NewYorkUnitedState', numpy.datetime64('2011-02-02T18:00:00.000000000-0600'), numpy.datetime64('2008-11-30T18:00:00.000000000-0600')], [4, 1002, 'CaliforniaUnitedState', numpy.datetime64('2003-03-03T18:00:00.000000000-0600'), numpy.datetime64('2002-03-03T18:00:00.000000000-0600')], [5, 1099, 'BeijingChina', numpy.datetime64('2011-02-02T18:00:00.000000000-0600'), numpy.datetime64('2009-02-03T18:00:00.000000000-0600')]]
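One possible tweak, as a minimal sketch rather than a definitive solution: the output above carries the row index as the first element of each list (to_records() includes it by default) and prints raw numpy.datetime64 values, which older NumPy versions likely rendered in the local time zone, hence the dates appearing a day early. The sketch below drops the index and formats the dates as plain strings. The helper name rows_by_arrival_city and the inline CSV_TEXT are assumptions made for illustration; the sample rows are copied from the question, with the stray space in "California UnitedState" removed. No date filtering is applied, so every row for a city is listed.

```python
import io

import pandas as pd

# Sample rows copied from the question (the stray space in
# "California UnitedState" is removed here for simplicity).
CSV_TEXT = """ID,ArCityArCountry,DptCityDptCountry,DateDpt,DateAr
1922,ParisFrance,NewYorkUnitedState,2008-03-10,2001-02-02
1002,LosAngelesUnitedState,CaliforniaUnitedState,2008-03-10,2008-12-01
1901,ParisFrance,LagosNigeria,2001-03-05,2001-02-02
1922,ParisFrance,NewYorkUnitedState,2011-02-03,2008-12-01
1002,ParisFrance,CaliforniaUnitedState,2003-03-04,2002-03-04
1099,ParisFrance,BeijingChina,2011-02-03,2009-02-04
1901,LosAngelesUnitedState,ParisFrance,2001-03-05,2001-02-02
"""


def rows_by_arrival_city(df):
    """Hypothetical helper: map each arrival city to its rows as plain lists."""
    out = df.copy()
    # Render dates as plain strings so no numpy.datetime64 objects are printed.
    for col in ("DateDpt", "DateAr"):
        out[col] = pd.to_datetime(out[col]).dt.strftime("%Y-%m-%d")
    cols = ["ID", "DptCityDptCountry", "DateDpt", "DateAr"]
    # .values.tolist() leaves out the index that to_records() would include.
    return {city: grp[cols].values.tolist()
            for city, grp in out.groupby("ArCityArCountry")}


grouped = rows_by_arrival_city(pd.read_csv(io.StringIO(CSV_TEXT)))
for city, rows in grouped.items():
    print(city)
    print(rows)
```

Returning a dict keeps the grouping separate from the printing, so the same result can be filtered (for example on DateAr <= DateDpt) or written out in another format before display.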