Question

Hello, I am writing a Python script to generate monthly and daily visit counts for web pages. Input file:

ArticleName Date        Hour    Count/Visit
Aa   20130601    10000   1
Aa   20130601    10000   1
Ew   20130601    10000   1
H    20130601    10000   2
H    20130602    10000   1
R    20130601    20000   2
R    20130602    10000   1
Ra   20130601    0   1
Ra   20130601    10000   2
Ra   20130602    10000   1
Ram  20130601    0   2
Ram  20130601    10000   3
Ram  20130602    10000   4
Re   20130601    20000   1
Re   20130602    10000   3
Rz   20130602    10000   1

I need to compute the total monthly and daily page views for each page.

Output:

ArticleName Date     DailyView MonthlyView
Aa   20130601 2 2
Ew   20130601 1 1
H    20130601 2 2
H    20130602 1 3
R    20130601 2 2
R    20130602 1 4
Ra   20130601 5 5
Ra   20130602 1 6
Ram  20130601 5 5
Ram  20130602 4 9
Re   20130601 1 1
Re   20130602 3 4
Rz   20130602 1 1

My script:

#!/usr/bin/python

import sys

last_date = 20130601
last_hour = 0
last_count = 0
last_article = None
monthly_count = 0
daily_count = 0

for line in sys.stdin:
  article, date, hour, count = line.split()
  count = int(count)
  date = int(date)
  hour = int(hour)

  #Articles match and date match
  if last_article == article and last_date == date:
      daily_count = count+last_count
      monthly_count = count+last_count
      # print '%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count)
  #Article match but date doesn't match 
  if last_article == article and last_date != date:
          monthly_count = count
          daily_count=count
          print '%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count)


  #Article doesn't match
  if last_article != article:
          last_article = article
          last_count = count
          monthly_count = count
          daily_count=count
          last_date = date
          print '%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count)

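I get most of the output, but it is wrong in two cases:

1. When the article name and the date are both the same, the counts are not summed. For example, for Ra the script prints:

Ra 20130601 1 1
Ra 20130601 3 3
Ra 20130602 1 1

So the final monthly total for Ra should be 1 + 3 + 1 = 5, not 1.

2. Because the third condition prints every article that differs from the previous one, an article that appears twice with the same name and date gets an extra line. For example, Ra 20130601 1 1 should not be printed.

Does anyone know how to fix this? Let me know if you need more information.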

Answer 1

Try the following:

import itertools
import operator
import sys

lines = (line.split() for line in sys.stdin)
prev_name, prev_month = '', '99999999'
month_view = 0
for (name,date), grp in itertools.groupby(lines, key=operator.itemgetter(0,1)):
    view = sum(int(row[-1]) for row in grp)
    if prev_name == name and date.startswith(prev_month):
        month_view += view
    else:
        prev_name = name
        prev_month = date[:6]
        month_view = view
    print '{}\t{}\t{}\t{}'.format(name, date, view, month_view)

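This uses itertools.groupby and operator.itemgetter.

The output differs from the expected output in the question for R and Ra. Summing the input rows gives a monthly total of 2 + 1 = 3 for R, and 1 + 2 = 3 for Ra on 20130601: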

Aa      20130601        2       2
Ew      20130601        1       1
H       20130601        2       2
H       20130602        1       3
R       20130601        2       2
R       20130602        1       3
Ra      20130601        3       3
Ra      20130602        1       4
Ram     20130601        5       5
Ram     20130602        4       9
Re      20130601        1       1
Re      20130602        3       4
Rz      20130602        1       1
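
For readers on Python 3, the same groupby idea can be sketched as below. This is only an illustrative port, not the answerer's code: the function name daily_and_monthly_views and the SAMPLE rows are made up for the example, and it assumes the rows are already sorted by article name and then date, with no header row.

```python
import itertools
import operator


def daily_and_monthly_views(lines):
    """Yield (name, date, daily_views, running_monthly_views) per article/day.

    Assumes whitespace-separated rows sorted by article name and then date,
    with no header row. Blank lines are ignored.
    """
    rows = (line.split() for line in lines if line.strip())
    prev_name, prev_month = None, None
    month_view = 0
    # Consecutive rows with the same (name, date) form one group.
    for (name, date), grp in itertools.groupby(rows, key=operator.itemgetter(0, 1)):
        view = sum(int(row[-1]) for row in grp)
        if name == prev_name and date[:6] == prev_month:
            # Same article, same YYYYMM prefix: extend the running total.
            month_view += view
        else:
            # New article or new month: restart the running total.
            prev_name, prev_month = name, date[:6]
            month_view = view
        yield name, date, view, month_view


# Hypothetical sample rows taken from the question's input.
SAMPLE = [
    "Ra   20130601    0   1",
    "Ra   20130601    10000   2",
    "Ra   20130602    10000   1",
]

for record in daily_and_monthly_views(SAMPLE):
    print("\t".join(str(field) for field in record))
```

To read from standard input instead, pass sys.stdin (after import sys) in place of SAMPLE.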

Answer 2

A better way to achieve this is to use the map/reduce-style functions found in itertools: http://docs.python.org/2/howto/functional.html

import itertools
from itertools import groupby
from itertools import dropwhile
import sys
import datetime

# Convert list of words found in one line into
# a tuple consisting of a name, date/time and number of visits
def get_record(w):
    name = w[0]
    date = datetime.datetime.strptime((w[1] + ('%0*d' % (6, int(w[2])))), "%Y%m%d%H%M%S")
    visits = int(w[3])
    return (name, date, visits)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year and month on which the records will
# be grouped.
def get_key_by_month((name, date, visits)):
    return (name, date.year, date.month)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year, month and day on which the records will
# be grouped.
def get_key_by_day((name, date, visits)):
    return (name, date.year, date.month, date.day)

# Get a list containing lines, each line containing
# a list of words, skipping the first line
words = (line.split() for line in sys.stdin)
words = dropwhile(lambda x: x[0]<1, enumerate(words))
words = map(lambda x: x[1], words)

# Convert to tuples containing name, date/time and count
records = list(get_record(w) for w in words)

# Group by name, month
groups = groupby(records, get_key_by_month)

# Sum visits in each group
print('Visits per month')
for (name, year, month), g in groups:
    visits = sum(map(lambda (name,date,visits): visits, g))
    print name, year, month, visits

# Group by name, day
groups = groupby(records, get_key_by_day)

# Sum visits in each group
print ('\nVisits per day')
for (name, year, month, day), g in groups:
    visits = sum(map(lambda (name,date,visits): visits, g))
    print name, year, month, day, visits

A Python 3 version of the code above:

import itertools
from itertools import groupby
from itertools import dropwhile
import sys
import datetime

# Convert list of words found in one line into
# a tuple consisting of a name, date/time and number of visits
def get_record(w):
    name = w[0]
    date = datetime.datetime.strptime((w[1] + ('%0*d' % (6, int(w[2])))), "%Y%m%d%H%M%S")
    visits = int(w[3])
    return (name, date, visits)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year and month on which the records will
# be grouped.
def get_key_by_month(rec):
    return (rec[0], rec[1].year, rec[1].month)

# Takes a tuple representing a single record and returns a tuple
# consisting of a name, year, month and day on which the records will
# be grouped.
def get_key_by_day(rec):
    return (rec[0], rec[1].year, rec[1].month, rec[1].day)

# Get a list containing lines, each line containing
# a list of words, skipping the first line
words = (line.split() for line in sys.stdin)
words = dropwhile(lambda x: x[0]<1, enumerate(words))
words = map(lambda x: x[1], words)

# Convert to tuples containing name, date/time and count
records = list(get_record(w) for w in words)

# Group by name, month
groups = groupby(records, get_key_by_month)

# Sum visits in each group
print('Visits per month')
for (name, year, month), g in groups:
    visits = sum(map(lambda rec: rec[2], g))
    print(name, year, month, visits)

# Group by name, day
groups = groupby(records, get_key_by_day)

# Sum visits in each group
print ('\nVisits per day')
for (name, year, month, day), g in groups:
    visits = sum(map(lambda rec: rec[2], g))
    print(name, year, month, day, visits)

Answer 3

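A simple way to do this is to build a nested dictionary keyed by page name, whose values are dictionaries mapping each date to its view count. Iterate over the input to build the dictionary, then walk the pages in the dictionary and compute the view count for each month.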
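
The nested-dictionary idea could look roughly like the sketch below. The function names count_views and report are hypothetical, and the header check (skipping any row whose second column is not all digits) is an assumption about the input file, not something stated in the answer.

```python
from collections import defaultdict


def count_views(lines):
    """Build {article: {date: views}} from whitespace-separated rows."""
    views = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.split()
        # Skip blank lines and a header row (assumed: non-numeric date column).
        if len(fields) != 4 or not fields[1].isdigit():
            continue
        article, date, _hour, count = fields
        views[article][date] += int(count)
    return views


def report(views):
    """Return rows of (article, date, daily_views, running_monthly_views)."""
    rows = []
    for article in sorted(views):
        monthly = defaultdict(int)  # running total per YYYYMM prefix
        for date in sorted(views[article]):
            month = date[:6]
            monthly[month] += views[article][date]
            rows.append((article, date, views[article][date], monthly[month]))
    return rows


# Hypothetical sample rows taken from the question's input.
SAMPLE = [
    "ArticleName Date        Hour    Count/Visit",
    "H    20130601    10000   2",
    "H    20130602    10000   1",
]

for row in report(count_views(SAMPLE)):
    print("\t".join(str(field) for field in row))
```

Unlike the groupby-based answers, this does not need the input to be sorted, at the cost of holding every article/date count in memory before printing.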

Python script does not work as required