Efficient algorithm for determining the less frequent values in a list

Date: 2014-05-10 23:58:00

Tags: python algorithm list frequency

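I am building a quiz application that pulls questions at random from a pool of questions. However, there is a requirement that the pool be limited to questions the user has not already seen. If the user has seen all of the questions, the algorithm should "reset" and only show questions the user has seen once. That is, always show the user questions they have never seen or, if they have seen them all, always show the questions they have seen less often before the ones they have seen more often.

The list (L) is built so that the following holds: any value (I) in the list may appear once or be repeated several times. Let J be another value in the list, different from I. Then 0 <= abs(frequency(I) - frequency(J)) <= 1 is always true.

To put it another way: if a value is repeated 5 times in the list, and 5 is the maximum number of times any value is repeated, then every value in the list is repeated either 4 or 5 times. The algorithm should return all values with frequency == 4 before it returns any with frequency == 5.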
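
As a minimal sketch of the ordering described above, assuming Python 3.7+ and treating L as a plain list of hashable values (the names `least_frequent_first` and `satisfies_invariant` are illustrative and not part of the original question):

```python
from collections import Counter

def least_frequent_first(values):
    """Return the distinct values of `values`, least frequent first."""
    counts = Counter(values)
    # sorted() is stable, so values with equal counts keep first-seen order.
    return sorted(counts, key=lambda value: counts[value])

def satisfies_invariant(values):
    """Check that no two frequencies in `values` differ by more than 1."""
    counts = Counter(values)
    if not counts:
        return True
    return max(counts.values()) - min(counts.values()) <= 1
```

Note that L on its own only contains values that have been seen at least once; values with frequency 0 have to come from somewhere else.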

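Sorry this is so verbose; I'm struggling to define the problem succinctly. Please feel free to leave comments with questions and I will clarify further if needed.

Thanks in advance for any help you can provide.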

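Clarification

Thank you for the answers proposed so far. I don't think any of them is quite there yet. Let me explain further.

I am not interacting with the user and asking them questions. I am assigning question IDs to an exam record so that, when the user begins an exam, the list of questions they have access to is already determined. Therefore, I have two data structures to work with:

  • The list of possible question IDs the user has access to
  • The list of all question IDs this user has ever been assigned. This is the list L described above.

So, unless I'm mistaken, the algorithm/solution to this problem needs to use list and/or set based operations on the two lists above.

The result will be a list of question IDs that I can associate with the exam record and then insert into the database.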
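
One possible shape for such a function, given as a sketch rather than a definitive solution: it takes the two lists described above plus the number of IDs wanted. The name `pick_question_ids` and its parameters are assumptions made for illustration.

```python
import random
from collections import Counter

def pick_question_ids(possible, assigned, needed, rng=random):
    """Pick up to `needed` ids from `possible`, least-assigned first."""
    counts = Counter(assigned)  # ids never assigned count as 0
    # Shuffle a copy first so ties between equally-seen ids are broken
    # randomly; the stable sort then orders ids by assignment count.
    candidates = list(possible)
    rng.shuffle(candidates)
    candidates.sort(key=lambda question_id: counts[question_id])
    return candidates[:needed]
```

If `needed` exceeds `len(possible)`, this sketch simply returns every possible ID once; whether repeats are acceptable within one exam is a separate decision.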

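5 Answers:

Answer 0: (score: 7)

Rewritten with the database parts filled in as pseudocode.

If I understand the problem correctly, I would treat the questions (or their IDs as proxies) like a physical deck of cards: for each user, shuffle the deck and deal one question at a time; if they want more than len(deck) questions, just start over: shuffle the deck into a new order and do it again. When a question is seen for the nth time, every other question will have been seen n or n-1 times.

To keep track of which questions the user has had available, we put the unused question IDs back in the database and increment a "how many times through" counter whenever we need a new deal.

Something like: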

from random import shuffle

# NOTE: dbconn, userID and the get_*/increment_*/store_*/use_* helpers are
# pseudocode placeholders for your own database layer.

def deal(dbconn, userID):
    question_IDs = get_all_questions(dbconn) # all questions
    shuffle(question_IDs)
    increment_deal_count(dbconn, userID) # how often this student has gotten questions
    return question_IDs


count_deals = get_stored_deals(dbconn, userID) # specific to this user
if count_deals:
    question_IDs = get_stored_questions(dbconn, userID) # questions stored for this user
else: # If 0 or missing, this is the first time for this student
    question_IDs = deal(dbconn, userID)
    count_deals = 1 # keep the local counter in step with the stored one


while need_another_question(): # based on exam requirements
    try:
        question_ID = question_IDs.pop()
    except IndexError:
        question_IDs = deal(dbconn, userID)
        count_deals += 1
        question_ID = question_IDs.pop() # Trouble if db is ever empty.

    use_question(question_ID) # query db with the ID, then put question in print, CMS, whatever

# When we leave that while loop, we have used at least some of the questions
# question_IDs lists the *unused* ones for this deal
# and we know how many times we've dealt.

store_in_db(dbconn, userID, question_IDs)
# If you want to know how many times a question has been available, it's
# count_deals - (ID in question_IDs)
# because True evaluates to 1 if you try to subtract it from an integer.
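
For experimenting without a database, the same deck idea can be sketched in memory. The `QuestionDeck` class below is a hypothetical stand-in for the per-user state that the pseudocode stores in the database; it is not the answer's actual implementation.

```python
import random

class QuestionDeck:
    """In-memory stand-in for the per-user state kept in the database."""

    def __init__(self, all_ids, rng=random):
        self.all_ids = list(all_ids)
        self.rng = rng
        self.remaining = []   # unused ids of the current deal
        self.deal_count = 0   # how many times the deck has been dealt

    def _deal(self):
        # Start a new pass: reshuffle every id and count the deal.
        self.remaining = list(self.all_ids)
        self.rng.shuffle(self.remaining)
        self.deal_count += 1

    def draw(self):
        if not self.all_ids:
            raise ValueError("no questions available")
        if not self.remaining:
            self._deal()
        return self.remaining.pop()

    def times_seen(self, question_id):
        # Ids still waiting in `remaining` have been seen one time fewer.
        return self.deal_count - (question_id in self.remaining)
```

Here `times_seen` applies the same "deal count minus membership" formula as the closing comment of the pseudocode.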

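Answer 1: (score: 5)

Why not keep two lists, one for questions not yet picked and one for questions that have been picked? Initially the not-yet-picked list is full; you pick elements from it, remove them, and add them to the picked list. Once the not-yet-picked list is empty, repeat the same process, this time using the full picked list as the not-yet-picked list and vice versa.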
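
A minimal sketch of the two-list idea, assuming both lists are ordinary Python lists that the caller keeps around between picks (the function name `pick` is illustrative):

```python
import random

def pick(unpicked, picked, rng=random):
    """Move one random question from `unpicked` to `picked` and return it."""
    if not unpicked:
        # Everything has been picked: swap roles by refilling `unpicked`.
        unpicked.extend(picked)
        picked.clear()
    if not unpicked:
        raise ValueError("no questions available")
    index = rng.randrange(len(unpicked))
    question = unpicked.pop(index)
    picked.append(question)
    return question
```

For example, starting with `unpicked = [1, 2, 3]` and `picked = []`, three calls return each question once in random order, and the fourth call starts a fresh round.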

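Answer 2: (score: 3)

To implement your algorithm, you only need to shuffle the list and go through it, then repeat once you are done.

There is no need to copy the list or move items between two lists; just use a control flow like the following, for example: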

import random

def ask_questions(list_of_questions):
    if not list_of_questions:
        return  # nothing to ask; also avoids looping forever on an empty list
    while True:
        random.shuffle(list_of_questions)
        for question in list_of_questions:
            print(question)
            # Python 3; on Python 2 use raw_input instead of input
            cont = input('Another question? ')
            if not cont:
                return

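Answer 3: (score: 0)

Let's define a "pivot" that splits the list into two sections. The pivot partitions the array so that every number before the pivot has been picked one more time than the numbers after it (or, more generally, every number before the pivot is ineligible for picking, while every number from the pivot onward is eligible).

You simply pick a random item from the numbers at or after the pivot, swap it with the number at the pivot, and then increment the pivot. When the pivot reaches the end of the list, reset it back to the start.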
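
The pivot scheme can be sketched as follows. The `PivotPicker` class is a hypothetical illustration, and the reset is performed lazily at the start of the next pick, which has the same effect as resetting as soon as the pivot reaches the end.

```python
import random

class PivotPicker:
    """Pick items so that nothing repeats until every item has been picked."""

    def __init__(self, items, rng=random):
        self.items = list(items)
        self.pivot = 0  # items[:pivot] were already picked this round
        self.rng = rng

    def pick(self):
        if not self.items:
            raise ValueError("no items available")
        if self.pivot == len(self.items):
            self.pivot = 0  # every item was picked; start a new round
        # Choose among the still-eligible items and swap the choice to the pivot.
        i = self.rng.randrange(self.pivot, len(self.items))
        self.items[self.pivot], self.items[i] = self.items[i], self.items[self.pivot]
        chosen = self.items[self.pivot]
        self.pivot += 1
        return chosen
```

Each pick is O(1) and the list never grows or shrinks, which is the efficiency advantage this answer refers to.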

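Alternatively, you can use two lists, which is much easier to implement but slightly less efficient because the lists have to grow and shrink. Most of the time, ease of implementation trumps the inefficiency, so the two lists would usually be my first choice.

Answer 4: (score: 0)

This is what I came up with: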

from collections import Counter
import random

# the number of question ids I need returned to
# assign to the exam
needed = 3

# the "pool" of possible question ids the user has access to
possible = [1,2,3,4,5]

# examples of lists of question ids I might see that represent
# questions a user has already answered
answered1 = []
answered2 = [1,3]
answered3 = [5,4,3,2]
answered4 = [5,4,3,2,1,1,2]
answered5 = [5,4,3,2,1,1,2,3,4,5,1]
answered6 = [5,4,3,2,1]

def getdiff(answered):
    diff = set(possible) - set(answered)
    still_needed = needed - len(diff)
    if still_needed > 0:
        not_already_selected = list(set(possible) - diff)
        random.shuffle(not_already_selected)
        diff = list(diff) + not_already_selected[0:still_needed]
        random.shuffle(diff)
        return diff
    diff = list(diff)
    random.shuffle(diff)
    if still_needed == 0:
        return diff
    return diff[0:needed]

def workit(answered):
    """ based on frequency, reduce the list down to only
        those questions we want to consider "answered"
    """
    if len(possible) > len(answered):
        return getdiff(answered)
    counted = Counter(answered)
    max_count = max(counted.values())
    # the key here is to think of "answered" questions as
    # only those that have been seen with max frequency
    new_answered = []
    for value, count in counted.items():  # Python 3; iteritems() on Python 2
        if count == max_count:
            new_answered.append(value)
    return getdiff(new_answered)

print(1, workit(answered1))
print(2, workit(answered2))
print(3, workit(answered3))
print(4, workit(answered4))
print(5, workit(answered5))
print(6, workit(answered6))

"""
>>> 
1 [2, 4, 3]
2 [2, 5, 4]
3 [5, 2, 1]
4 [5, 3, 4]
5 [2, 4, 3]
6 [2, 3, 5]
>>> ================================ RESTART ================================
>>> 
1 [3, 1, 4]
2 [5, 2, 4]
3 [2, 4, 1]
4 [5, 4, 3]
5 [4, 5, 3]
6 [1, 5, 3]
>>> ================================ RESTART ================================
>>> 
1 [1, 2, 3]
2 [4, 2, 5]
3 [4, 1, 5]
4 [5, 4, 3]
5 [2, 5, 4]
6 [2, 1, 4]
"""