Suppose I have two lists that look like this:
L1=['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male', 'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']
L2=['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male', 'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female', 'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male', 'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male', 'James, Chelsea, 2008, 3, Female']
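What I want to do is compare the date of each person in a family (same last name) with the date of the "John" in that family. The dates range from a full year, month and day, to year and month only, to year only. I want to find the difference between John's date and each of his family members' dates, down to the most specific unit available (if one date has all three parts and the other has only month and year, then only the difference in months and years can be found). This is what I have tried so far. It doesn't work: it doesn't pair up the right names and dates (it gives each John only one sibling), and the way it calculates the time between dates is confusing and wrong: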
import re

for line in L1:
    type=line.split(',')
    if len(type)>=1:
        family=type[0]
    if len(type)==6:
        yearA=type[2]
        monthA=type[3]
        dayA=type[4]
        sex=type[5]
        print '%s, John Published in %s, %s, %s, %s' %(family, yearA, monthA, dayA, sex)
    elif len(type)==5:
        yearA=type[2]
        monthA=type[3]
        sex=type[4]
        print '%s, John Published in %s, %s, %s' %(family, yearA, monthA, sex)
    elif len(type)==4:
        yearA=type[2]
        sex=type[3]
        print '%s, John Published in %s, %s' %(family, yearA, sex)
    for line in L2:
        if re.search(family, line):
            word=line.split(',')
            name=word[1]
            if len(word)==6:
                yearB=word[2]
                monthB=word[3]
                dayB=word[4]
                sex=word[5]
            elif len(word)==5:
                yearB=word[2]
                monthB=word[3]
                sex=word[4]
            elif len(word)==4:
                yearB=word[2]
                sex=word[3]
    if dayA and dayB:
        yeardiff= int(yearA)-int(yearB)
        monthdiff=int(monthA)-int(monthB)
        daydiff=int(dayA)-int(dayB)
        print'%s, %s Published %s year(s), %s month(s), %s day(s) before/after John, %s' %(family, name, yeardiff, monthdiff, daydiff, sex)
    elif not dayA and not dayB and monthA and monthB:
        yeardiff= int(yearA)-int(yearB)
        monthdiff=int(monthA)-int(monthB)
        print'%s, %s Published %s year(s), %s month(s), before/after John, %s' %(family, name, yeardiff, monthdiff, sex)
    elif not monthA and not monthB and yearA and yearB:
        yeardiff= int(yearA)-int(yearB)
        print'%s, %s Published %s year(s), before/after John, %s' %(family, name, yeardiff, sex)
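I'd like to end up with something like this and, if possible, have the program tell whether each sibling came before or after John, and print months and days only if both of the dates being compared have them: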
Smith, John Published in 2008, 12, 10, Male
Smith, Joy Published _ year(s) _month(s) _day(s) before/after John, Female
Smith, Kevin Published _ year(s) _month(s) _day(s) before/after John, Male
Smith, Matt Published _ year(s) _month(s) _day(s) before/after John, Male
Smith, Carol Published _ year(s) _month(s) _day(s) before/after John, Female
Smith, Sue Published _ year(s) _month(s) _day(s) before/after John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
Johnson, Alex Published _ year(s) _month(s) _day(s) before/after John, Male
Johnson, Emma Published _ year(s) _month(s) _day(s) before/after John, Female
James, John Published in 2008, 3, Male
James, Peter Published _ year(s) _month(s) _day(s) before/after John, Male
James, Chelsea Published _ year(s) _month(s) _day(s) before/after John, Female
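Answer 0 (score: 6)
As Joe Kington said, the dateutil module is very useful here. In particular, it can tell you the difference between two dates in terms of years, months and days. (Doing the calculation yourself would involve accounting for leap years and so on. Using a well-tested module is better than reinventing this wheel.)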
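To make the leap-year point concrete, here is a small sketch using only the standard library's datetime module (the two spans are illustrative dates, not taken from the question's data). Both spans are exactly one calendar year long, yet they contain different numbers of days, so dividing a day count by a fixed 365 misreports one of them:

```python
from datetime import date

# Two spans that are each exactly one calendar year long...
leap_span = date(2008, 3, 1) - date(2007, 3, 1)    # crosses 29 Feb 2008
plain_span = date(2009, 3, 1) - date(2008, 3, 1)   # contains no leap day

# ...yet they hold a different number of days.
print(leap_span.days)    # 366
print(plain_span.days)   # 365

# A fixed 365-day year turns the first span into "1 year, 1 day".
print(divmod(leap_span.days, 365))   # (1, 1)
```

A calendar-aware difference avoids this kind of drift, which is the reason for leaning on a tested library.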
This problem lends itself to classes.
Let's make a Person class to keep track of a person's name, gender and publication date:
class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender
The publication date may be missing some data, so let's create a special class to handle dates with missing parts:
class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        # asdate() is defined in the full script further down
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        # report months/days only when both dates actually have them
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)
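The datetime module defines datetime.datetime objects and uses datetime.timedelta objects to represent the difference between two datetime.datetime objects. Similarly, let's define a VagueDateDelta to represent the difference between two VagueDates: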
class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)
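Now that we've built ourselves some handy tools, solving the problem is not hard.
The first step is to parse the list of strings and convert them into Person objects: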
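def parse_person(text):
    # list comprehensions (rather than map) so indexing also works on Python 3
    data=[field.strip() for field in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    # everything between the names and the gender is the 1-3 date parts
    ymd=[int(part) for part in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)
johns=[parse_person(text) for text in L1]
peeps=[parse_person(text) for text in L2]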
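The parsing step relies on the date occupying whatever fields sit between the first name and the gender, and on star-unpacking to leave the missing parts at their defaults. The sketch below isolates that idea without the classes; `split_entry` and `fill_date` are hypothetical helper names used only for illustration:

```python
def split_entry(text):
    """Return (lastname, firstname, gender, date_parts) for one entry."""
    fields = [field.strip() for field in text.split(',')]
    # Everything between the two names and the gender is the date.
    date_parts = tuple(int(part) for part in fields[2:-1])
    return fields[0], fields[1], fields[-1], date_parts


def fill_date(year=None, month=None, day=None):
    """Mimic Person's keyword defaults: absent parts stay None."""
    return (year, month, day)


full = split_entry('Smith, John, 2008, 12, 10, Male')
partial = split_entry('Bates, John, 2006, 1, Male')

print(full)                     # ('Smith', 'John', 'Male', (2008, 12, 10))
print(partial)                  # ('Bates', 'John', 'Male', (2006, 1))
print(fill_date(*partial[3]))   # (2006, 1, None)
```

Because the slice can hold one, two or three numbers, the same call handles every level of date precision in the sample data.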
Next, we reorganize peeps into a dict of family members, keyed by last name:
family=collections.defaultdict(list)
for person in peeps:
    family[person.lastname].append(person)
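One detail of this grouping is worth seeing in isolation: a family with no relatives (such as Bates in the sample data) simply yields an empty list, so a loop over it does nothing. The sketch below uses simplified "last, first" strings as stand-ins for the Person objects:

```python
import collections

# Simplified stand-ins for the parsed people: just "last, first".
entries = ['Smith, Joy', 'Smith, Kevin', 'Johnson, Alex']

family = collections.defaultdict(list)
for entry in entries:
    lastname, firstname = [part.strip() for part in entry.split(',')]
    family[lastname].append(firstname)

print(family['Smith'])            # ['Joy', 'Kevin']

# .get() looks up a missing family without creating a key...
print(family.get('Bates', []))    # []
print('Bates' in family)          # False

# ...whereas indexing a defaultdict inserts an empty list as a side effect.
print(family['Bates'])            # []
print('Bates' in family)          # True
```

The side effect is harmless here, but it matters if the dict's keys are later used to enumerate the families that actually have members.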
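Finally, you just loop over johns and each john's family members, compare the publication dates, and report the results.
The full script might look like this: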
import datetime as dt
import dateutil.relativedelta as relativedelta
import pprint
import collections

class VagueDateDelta(object):
    def __init__(self,years=None,months=None,days=None):
        self.years=years
        self.months=months
        self.days=days
    def __str__(self):
        if self.days is not None and self.months is not None:
            return '{s.years} years, {s.months} months, {s.days} days'.format(s=self)
        elif self.months is not None:
            return '{s.years} years, {s.months} months'.format(s=self)
        else:
            return '{s.years} years'.format(s=self)

class VagueDate(object):
    def __init__(self,year=None,month=None,day=None):
        self.year=year
        self.month=month
        self.day=day
    def __sub__(self,other):
        d1=self.asdate()
        d2=other.asdate()
        rd=relativedelta.relativedelta(d1,d2)
        years=rd.years
        # report months/days only when both dates actually have them
        months=rd.months if self.month and other.month else None
        days=rd.days if self.day and other.day else None
        return VagueDateDelta(years,months,days)
    def asdate(self):
        # You've got to make some kind of arbitrary decision when comparing
        # vague dates. Here I make the arbitrary decision that missing info
        # will be treated like 1s for the purpose of calculating differences.
        return dt.date(self.year,self.month or 1,self.day or 1)
    def __str__(self):
        if self.day is not None and self.month is not None:
            return '{s.year}, {s.month}, {s.day}'.format(s=self)
        elif self.month is not None:
            return '{s.year}, {s.month}'.format(s=self)
        else:
            return '{s.year}'.format(s=self)

class Person(object):
    def __init__(self,lastname,firstname,gender=None,year=None,month=None,day=None):
        self.lastname=lastname
        self.firstname=firstname
        self.ymd=VagueDate(year,month,day)
        self.gender=gender
    def age_diff(self,other):
        return self.ymd-other.ymd
    def __str__(self):
        fmt='{s.lastname}, {s.firstname} ({s.gender}) ({d.year},{d.month},{d.day})'
        return fmt.format(s=self,d=self.ymd)
    __repr__=__str__
    def __lt__(self,other):
        d1=self.ymd.asdate()
        d2=other.ymd.asdate()
        return d1<d2

def parse_person(text):
    # list comprehensions (rather than map) so indexing also works on Python 3
    data=[field.strip() for field in text.split(',')]
    lastname=data[0]
    firstname=data[1]
    gender=data[-1]
    ymd=[int(part) for part in data[2:-1]]
    return Person(lastname,firstname,gender,*ymd)

def main():
    L1=['Smith, John, 2008, 12, 10, Male', 'Bates, John, 2006, 1, Male',
        'Johnson, John, 2009, 1, 28, Male', 'James, John, 2008, 3, Male']
    L2=['Smith, Joy, 2008, 12, 10, Female', 'Smith, Kevin, 2008, 12, 10, Male',
        'Smith, Matt, 2008, 12, 10, Male', 'Smith, Carol, 2000, 12, 11, Female',
        'Smith, Sue, 2000, 12, 11, Female', 'Johnson, Alex, 2008, 3, Male',
        'Johnson, Emma, 2008, 3, Female', 'James, Peter, 2008, 3, Male',
        'James, Chelsea, 2008, 3, Female']
    johns=[parse_person(text) for text in L1]
    peeps=[parse_person(text) for text in L2]
    print(pprint.pformat(johns))
    print('')
    print(pprint.pformat(peeps))
    print('')
    # group the relatives by last name
    family=collections.defaultdict(list)
    for person in peeps:
        family[person.lastname].append(person)
    # print(family)
    pub_fmt='{j.lastname}, {j.firstname} Published in {j.ymd}, {j.gender}'
    rel_fmt=' {r.lastname}, {r.firstname} Published {d} {ba} John, {r.gender}'
    for john in johns:
        print(pub_fmt.format(j=john))
        for relative in family[john.lastname]:
            diff=john.ymd-relative.ymd
            ba='before' if relative<john else 'after'
            print(rel_fmt.format(
                r=relative,
                d=diff,
                ba=ba,
                ))

if __name__=='__main__':
    main()
which yields
[Smith, John (Male) (2008,12,10),
Bates, John (Male) (2006,1,None),
Johnson, John (Male) (2009,1,28),
James, John (Male) (2008,3,None)]
[Smith, Joy (Female) (2008,12,10),
Smith, Kevin (Male) (2008,12,10),
Smith, Matt (Male) (2008,12,10),
Smith, Carol (Female) (2000,12,11),
Smith, Sue (Female) (2000,12,11),
Johnson, Alex (Male) (2008,3,None),
Johnson, Emma (Female) (2008,3,None),
James, Peter (Male) (2008,3,None),
James, Chelsea (Female) (2008,3,None)]
Smith, John Published in 2008, 12, 10, Male
Smith, Joy Published 0 years, 0 months, 0 days after John, Female
Smith, Kevin Published 0 years, 0 months, 0 days after John, Male
Smith, Matt Published 0 years, 0 months, 0 days after John, Male
Smith, Carol Published 7 years, 11 months, 29 days before John, Female
Smith, Sue Published 7 years, 11 months, 29 days before John, Female
Bates, John Published in 2006, 1, Male
Johnson, John Published in 2009, 1, 28, Male
Johnson, Alex Published 0 years, 10 months before John, Male
Johnson, Emma Published 0 years, 10 months before John, Female
James, John Published in 2008, 3, Male
James, Peter Published 0 years, 0 months after John, Male
James, Chelsea Published 0 years, 0 months after John, Female
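In the output above, relatives whose date equals John's are reported as "after", because `relative<john` is False for equal dates. One possible refinement, sketched here on plain tuples rather than on the classes above, is to compare the two dates only on the parts they both have and to report a tie separately. The helper names and the "at the same time as" wording are assumptions for illustration:

```python
def common_precision(a, b):
    """Truncate two (year[, month[, day]]) tuples to the parts both have."""
    n = min(len(a), len(b))
    return a[:n], b[:n]


def relation(relative, john):
    """Say whether relative comes before, after, or level with john."""
    r, j = common_precision(relative, john)
    if r < j:
        return 'before'
    if r > j:
        return 'after'
    return 'at the same time as'


print(relation((2008, 12, 10), (2008, 12, 10)))   # at the same time as
print(relation((2000, 12, 11), (2008, 12, 10)))   # before
print(relation((2008, 3), (2009, 1, 28)))         # before
print(relation((2009, 2), (2009, 1, 28)))         # after
```

Tuples compare element by element, so truncating both to the shorter length gives an ordering at the most specific precision the two dates share.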
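Answer 1 (score: 2)
As mentioned in the comments (on @Matt's answer), you need at least "year, month, day" to use datetime.date and datetime.timedelta. Judging from the sample data above, some entries may be missing the "day", which makes things trickier.
If you don't mind using default values for missing months/days (say, 1 January), you can fairly quickly convert these dates into datetime.date instances.
As a quick example: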
from datetime import date

johns = []
for s in L1:
    # NOTE: not the most robust parsing method.
    v = [x.strip() for x in s.split(",")]
    data = {
        "gender": v[-1],
        "last_name": v[0],
        "first_name": v[1],
    }
    # build keyword args for datetime.date()
    v = v[2:-1] # remove parsed data
    kwargs = { "year": int(v.pop(0)), "month": 1, "day":1 }
    try:
        kwargs["month"] = int(v.pop(0))
        kwargs["day"] = int(v.pop(0))
    except IndexError:
        # month and/or day missing: keep the defaults
        pass
    data["date"] = date(**kwargs)
    johns.append(data)
This gives you a list of dicts containing the name, gender and date. You can do the same for L2, then calculate date differences by subtracting one date from another, which yields a timedelta object:
>>> a = date(2008, 12,12)
>>> b = date(2010, 1, 13)
>>> delta = b - a
>>> print(delta.days)
397
>>> print("%d years, %d days" % divmod(delta.days, 365))
1 years, 32 days
I purposely left out months, since it isn't as simple as equating 30 days to a month. Arguably, assuming 365 days per year is just as inaccurate once leap years are taken into account.
If you need to display deltas in terms of years, months and days, running divmod on the days returned by timedelta may not be accurate, because it does not account for leap years or the differing lengths of months. You would have to compare the year, month and day of each date manually.
Here's my stab at such a function. (Only lightly tested, so use with caution.)
from datetime import timedelta

def my_time_delta(d1,d2):
    """
    Returns time delta as the following tuple:
    ("before|after|same", "years", "months", "days")
    """
    if d1 == d2:
        return ("same",0,0,0)
    # d1 before or after d2?
    if d1 > d2:
        ba = "after"
        d1,d2 = d2,d1 # swap so d2 > d1
    else:
        ba = "before"
    years = d2.year - d1.year
    months = d2.month - d1.month
    days = d2.day - d1.day
    # adjust for -ve days/months
    if days < 0:
        # borrow the month that ends just before d2's month, clamping
        # d1's day to that month's length (e.g. 31 Jan -> 28 Feb)
        prev_len = (d2 - timedelta(days=d2.day)).day
        days = d2.day + prev_len - min(d1.day, prev_len)
        months = months - 1
    if months < 0:
        months = months + 12
        years = years - 1
    return (ba, years, months, days)
Example usage:
>>> my_time_delta(date(2003,12,1), date(2003,11,2))
('after', 0, 0, 29)
>>> my_time_delta(date(2003,12,1), date(2004,11,2))
('before', 0, 11, 1)
>>> my_time_delta(date(2003,2,1), date(1992,3,10))
('after', 10, 10, 22)
>>> p,y,m,d = my_time_delta(date(2003,2,1), date(1992,3,10))
>>> print("%d years, %d months, %d days %s" % (y,m,d,p))
10 years, 10 months, 22 days after
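Answer 2 (score: 0)
There are probably existing modules for this sort of thing, but I would start by converting the dates into a common unit of time (for your examples, days since 1 January 19XX). You can then easily compare them, subtract them and so on, and convert the results back into whatever form you see fit for display. If a difference in days meets your needs, this should be fairly easy.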
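A minimal sketch of that idea with the standard library: `date.toordinal()` turns a date into a day count, so comparison and subtraction become plain integer operations. Note that it counts from 1 January of year 1 rather than from a 19XX epoch, and that the partial dates in the question would first need default months/days filled in. The two dates below are Smith John's and Smith Carol's from the sample data:

```python
from datetime import date

john = date(2008, 12, 10)
carol = date(2000, 12, 11)

# Convert both dates to a common unit: days since 1 January of year 1.
john_days = john.toordinal()
carol_days = carol.toordinal()

# Comparison and subtraction are now ordinary integer operations.
print(carol_days < john_days)   # True: Carol comes first
gap = john_days - carol_days
print(gap)                      # 2921

# The day count converts back to a date for display.
print(date.fromordinal(carol_days + gap))   # 2008-12-10
```

The gap is exact in days; turning it into years and months still requires calendar-aware logic.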