Hello, I am writing a Python script to compute the monthly and daily visit counts for web pages. Input file:
ArticleName Date Hour Count/Visit
Aa 20130601 10000 1
Aa 20130601 10000 1
Ew 20130601 10000 1
H 20130601 10000 2
H 20130602 10000 1
R 20130601 20000 2
R 20130602 10000 1
Ra 20130601 0 1
Ra 20130601 10000 2
Ra 20130602 10000 1
Ram 20130601 0 2
Ram 20130601 10000 3
Ram 20130602 10000 4
Re 20130601 20000 1
Re 20130602 10000 3
Rz 20130602 10000 1
I need to calculate the total monthly and daily page views for each page.
Output:
ArticleName Date DailyView MonthlyView
Aa 20130601 2 2
Ew 20130601 1 1
H 20130601 2 2
H 20130602 1 3
R 20130601 2 2
R 20130602 1 4
Ra 20130601 5 5
Ra 20130602 1 6
Ram 20130601 5 5
Ram 20130602 4 9
Re 20130601 1 1
Re 20130602 3 4
Rz 20130602 1 1
My script:
#!/usr/bin/python
import sys

last_date = 20130601
last_hour = 0
last_count = 0
last_article = None
monthly_count = 0
daily_count = 0

for line in sys.stdin:
    article, date, hour, count = line.split()
    count = int(count)
    date = int(date)
    hour = int(hour)

    #Articles match and date match
    if last_article == article and last_date == date:
        daily_count = count+last_count
        monthly_count = count+last_count
        # print '%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count)

    #Article match but date doesn't match
    if last_article == article and last_date != date:
        monthly_count = count
        daily_count=count
        print '%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count)

    #Article doesn't match
    if last_article != article:
        last_article = article
        last_count = count
        monthly_count = count
        daily_count=count
        last_date = date
        print '%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count)
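I can get most of the output, but it is wrong in two cases:
1. When ArticleName and ArticleDate are the same, the counts for that ArticleName are not summed. For example, for the Ra rows this script outputs:
Ra 20130601 1 1
Ra 20130601 3 3
Ra 20130602 1 1
So in the end Ra should print 1 + 3 + 1 = 5 as the final monthly total rather than 1.
The line
Ra 20130601 1 1
should not be printed at all.
Does anyone know how to fix this?
Please let me know if you need more information.

Answer 0 (score: 1)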
Try the following:
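import itertools
import operator
import sys

# Split each input line into fields; assumes the input has no header line
lines = (line.split() for line in sys.stdin)
prev_name, prev_month = '', '99999999'
month_view = 0
# Group consecutive rows that share the same (name, date)
for (name, date), grp in itertools.groupby(lines, key=operator.itemgetter(0, 1)):
    view = sum(int(row[-1]) for row in grp)
    if prev_name == name and date.startswith(prev_month):
        # Same article, same month: keep accumulating
        month_view += view
    else:
        # New article or new month: restart the monthly total
        prev_name = name
        prev_month = date[:6]
        month_view = view
    print('{}\t{}\t{}\t{}'.format(name, date, view, month_view))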
This uses itertools.groupby and operator.itemgetter.
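The output differs from the expected output in the question (for the R and Ra rows, the values below are what the sample input actually sums to):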
Aa 20130601 2 2
Ew 20130601 1 1
H 20130601 2 2
H 20130602 1 3
R 20130601 2 2
R 20130602 1 3
Ra 20130601 3 3
Ra 20130602 1 4
Ram 20130601 5 5
Ram 20130602 4 9
Re 20130601 1 1
Re 20130602 3 4
Rz 20130602 1 1
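
The same grouping idea can be sketched as a reusable Python 3 function. This is only an illustration, not the answerer's code: the function name `daily_and_monthly_views` and the `sample` list are made up here, and it assumes the lines carry no header row and are already sorted by article name and date.

```python
import itertools
import operator


def daily_and_monthly_views(lines):
    """Yield (name, date, daily_views, month_to_date_views) tuples.

    Assumes each line is 'ArticleName Date Hour Count', with no header
    line, and that lines are already sorted by article name and date.
    """
    rows = (line.split() for line in lines if line.strip())
    prev_name, prev_month = None, None
    month_views = 0
    by_name_and_date = operator.itemgetter(0, 1)
    for (name, date), group in itertools.groupby(rows, key=by_name_and_date):
        # Daily total: sum the Count column over all hours of that day
        views = sum(int(row[-1]) for row in group)
        if name == prev_name and date[:6] == prev_month:
            month_views += views      # same article and month: accumulate
        else:
            prev_name, prev_month = name, date[:6]
            month_views = views       # new article or month: restart
        yield name, date, views, month_views


# Hypothetical sample taken from the Ra and Rz rows of the question
sample = [
    "Ra 20130601 0 1",
    "Ra 20130601 10000 2",
    "Ra 20130602 10000 1",
    "Rz 20130602 10000 1",
]
for record in daily_and_monthly_views(sample):
    print("\t".join(str(field) for field in record))
```

Passing `sys.stdin` instead of `sample` would process a real file piped into the script, under the same assumptions.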

Answer 1 (score: 1)
A better way to achieve this is to use the map/reduce-style functions found in itertools: http://docs.python.org/2/howto/functional.html
import itertools
from itertools import groupby
from itertools import dropwhile
import sys
import datetime

# Convert list of words found in one line into
# a tuple consisting of a name, date/time and number of visits
def get_record(w):
    name = w[0]
    date = datetime.datetime.strptime((w[1] + ('%0*d' % (6, int(w[2])))), "%Y%m%d%H%M%S")
    visits = int(w[3])
    return (name, date, visits)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year and month on which the records will
# be grouped.
def get_key_by_month((name, date, visits)):
    return (name, date.year, date.month)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year, month and day on which the records will
# be grouped.
def get_key_by_day((name, date, visits)):
    return (name, date.year, date.month, date.day)

# Get a list containing lines, each line containing
# a list of words, skipping the first line
words = (line.split() for line in sys.stdin)
words = dropwhile(lambda x: x[0]<1, enumerate(words))
words = map(lambda x: x[1], words)

# Convert to tuples containing name, date/time and count
records = list(get_record(w) for w in words)

# Group by name, month
groups = groupby(records, get_key_by_month)

# Sum visits in each group
print('Visits per month')
for (name, year, month), g in groups:
    visits = sum(map(lambda (name,date,visits): visits, g))
    print name, year, month, visits

# Group by name, day
groups = groupby(records, get_key_by_day)

# Sum visits in each group
print ('\nVisits per day')
for (name, year, month, day), g in groups:
    visits = sum(map(lambda (name,date,visits): visits, g))
    print name, year, month, day, visits
Python 3 version of the code above:
import itertools
from itertools import groupby
from itertools import dropwhile
import sys
import datetime

# Convert list of words found in one line into
# a tuple consisting of a name, date/time and number of visits
def get_record(w):
    name = w[0]
    date = datetime.datetime.strptime((w[1] + ('%0*d' % (6, int(w[2])))), "%Y%m%d%H%M%S")
    visits = int(w[3])
    return (name, date, visits)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year and month on which the records will
# be grouped.
def get_key_by_month(rec):
    return (rec[0], rec[1].year, rec[1].month)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year, month and day on which the records will
# be grouped.
def get_key_by_day(rec):
    return (rec[0], rec[1].year, rec[1].month, rec[1].day)

# Get a list containing lines, each line containing
# a list of words, skipping the first line
words = (line.split() for line in sys.stdin)
words = dropwhile(lambda x: x[0]<1, enumerate(words))
words = map(lambda x: x[1], words)

# Convert to tuples containing name, date/time and count
records = list(get_record(w) for w in words)

# Group by name, month
groups = groupby(records, get_key_by_month)

# Sum visits in each group
print('Visits per month')
for (name, year, month), g in groups:
    visits = sum(map(lambda rec: rec[2], g))
    print(name, year, month, visits)

# Group by name, day
groups = groupby(records, get_key_by_day)

# Sum visits in each group
print('\nVisits per day')
for (name, year, month, day), g in groups:
    visits = sum(map(lambda rec: rec[2], g))
    print(name, year, month, day, visits)
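
One caveat worth illustrating: itertools.groupby only merges consecutive items with equal keys, so the code above relies on the input already being sorted by name and date, as it is in the question. A minimal sketch for unsorted input, using a hypothetical `records` list and a made-up helper `visits_per_day`, could sort first:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, deliberately unsorted records: (name, date, visits)
records = [
    ("Ra", "20130602", 1),
    ("Aa", "20130601", 1),
    ("Ra", "20130601", 2),
    ("Aa", "20130601", 1),
]


def visits_per_day(recs):
    """Sum visits per (name, date), sorting first so groupby sees
    every record of a group as one consecutive run."""
    key = itemgetter(0, 1)
    ordered = sorted(recs, key=key)
    return [
        (name, date, sum(rec[2] for rec in grp))
        for (name, date), grp in groupby(ordered, key=key)
    ]


print(visits_per_day(records))
```

Without the `sorted` call, the two Aa rows and the two Ra rows would each land in separate groups, because they are not adjacent in the input.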
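
Answer 2 (score: 0)
A simple way to do this is to build a nested dictionary keyed by page name, whose values are dictionaries mapping each date to its view count. Iterate over the input to build the dictionary, then walk through each page's dictionary and add up the views for each month.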
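
A minimal Python 3 sketch of that nested-dictionary idea follows. The helper names `build_views` and `monthly_totals` and the `sample` list are assumptions made for illustration; the header check (skipping rows whose last field is not a number) is also an assumption about the input.

```python
from collections import defaultdict


def build_views(lines):
    """Return {page: {date: views}} from 'ArticleName Date Hour Count' lines."""
    views = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[3].isdigit():
            continue  # skip blank lines and the header row
        name, date, _hour, count = fields
        views[name][date] += int(count)
    return views


def monthly_totals(views):
    """Return {page: {YYYYMM: views}} by summing the daily totals."""
    totals = defaultdict(lambda: defaultdict(int))
    for name, per_day in views.items():
        for date, count in per_day.items():
            totals[name][date[:6]] += count
    return totals


# Hypothetical sample based on the Ram and Rz rows of the question
sample = [
    "ArticleName Date Hour Count/Visit",
    "Ram 20130601 0 2",
    "Ram 20130601 10000 3",
    "Ram 20130602 10000 4",
    "Rz 20130602 10000 1",
]
daily = build_views(sample)
monthly = monthly_totals(daily)
for name in sorted(daily):
    for date in sorted(daily[name]):
        print(name, date, daily[name][date], monthly[name][date[:6]])
```

Unlike the groupby-based answers, this does not need sorted input, at the cost of holding all totals in memory. Note that it prints the full monthly total on every row rather than a running month-to-date total.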