Programming test for a machine learning interview

Date: 2018-02-01 15:50:28

Tags: python performance machine-learning

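I am currently working on a small programming test that was given to me ahead of an actual interview.

I have had to remove the details of the problem itself, but you can easily find them through the links below.

So I tried several intuitive approaches, with more or less success. While researching, I found an example on GitHub (https://github.com/miracode/Machine-Works) that works with some kind of nodes. I decided to implement it in my own script to test it. It turned out to be much faster than mine, but it still cannot handle the whole input set: a 25 MB txt file with 54 different cases, some of which contain more than 10,000 machines per test case. The other GitHub solutions I found use this same approach (and only this one).

So I can understand that my own script crashes my PC before it finishes the large input test, but it is quite surprising that a solution taken from GitHub cannot compute the test input either.

My computer has 16 GB of RAM, and I have never seen it crash like that, even when processing much larger datasets.

Here is a copy of my implementation of that solution: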

"""Third version of the project.

Implements a decision object, inspired by a script found on GitHub.
"""
import time

from load_input2 import load

PATH = 'input2.txt'


class TestCase(object):
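    def __init__(self, C, D, machines=()):
        # An immutable default avoids the shared mutable-default pitfall
        self.budget = C
        self.days = D
        # Sort the machines by the day on which they are offered for sale
        self.machines = sorted([Machine(i[0], i[1], i[2], i[3])
                                for i in machines], key=lambda x: x.day)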

    def run(self):
        choice = Decision()
        (choice.machine, choice.budget, choice.day) = (None, self.budget, 0)

        choices = [choice, ]

        for machine in self.machines:

            next_choice = []
            for choice in choices:
                choice.to_buy, choice.not_buy = Decision(), Decision()
                choice.to_buy.day, choice.not_buy.day = machine.day, machine.day
                potential_budget = choice.budget + choice.machine.p_sell + choice.machine.daily_profit * \
                    (machine.day - choice.day -
                     1) if choice.machine else choice.budget

                if machine.p_buy <= potential_budget:

                    choice.to_buy.budget = potential_budget - machine.p_buy
                    choice.to_buy.machine = machine
                    next_choice.append(choice.to_buy)

                choice.not_buy.machine = choice.machine

                try:
                    choice.not_buy.budget = choice.budget + \
                        choice.machine.daily_profit * \
                        (machine.day - choice.day)
                except AttributeError:
                    choice.not_buy.budget = choice.budget
                next_choice.append(choice.not_buy)

            choices = next_choice


        results = []
        for choice in choices:
            try:
                results.append(choice.budget +
                               choice.machine.daily_profit *
                               (self.days -
                                choice.day) +
                               choice.machine.p_sell)
            except AttributeError:
                results.append(choice.budget)
        return(max(results))


class Machine(object):
    def __init__(self, day, p_buy, p_sell, daily_profit):
        self.p_buy, self.p_sell = p_buy, p_sell
        self.day, self.daily_profit = day, daily_profit


class Decision(object):
    def __init__(self):
        self.to_buy, self.not_buy = None, None
        self.machine, self.budget = None, None
        self.day = None


def main():
    start = time.time()
    testcases = load(PATH)
    for (case_data, data) in testcases:
        # case_data is (case number, N, C, D); data is the list of machines
        dolls = TestCase(case_data[2], case_data[3], data).run()
        print(
            "Case {}: {}".format(case_data[0], dolls))
    # Elapsed time is "now minus start", otherwise the value is negative
    print("Done in", time.time() - start)


if __name__ == '__main__':
    main()
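
A note on why this version exhausts memory: in `run`, every `Decision` in `choices` always produces a `not_buy` child and, whenever the machine is affordable, a `to_buy` child as well, and nothing is ever discarded. The sketch below only counts the worst case of that growth. `max_decisions` is a hypothetical helper, not part of the script above, and it assumes every machine is affordable in every branch.

```python
def max_decisions(n_machines):
    """Worst-case length of `choices` after processing n_machines machines."""
    choices = 1  # the initial Decision that owns no machine
    for _ in range(n_machines):
        # each Decision is replaced by a not_buy child and a to_buy child
        choices *= 2
    return choices


for n in (10, 30, 10000):
    digits = len(str(max_decisions(n)))
    print(n, "machines ->", digits, "decimal digits of Decision objects")
```

With more than 10,000 machines in a test case, the list doubles far beyond what any amount of RAM can hold, so the crash does not depend on the 16 GB limit.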

load_input2.py:

def load(path):
    """Parse the input into a list of ((case_no, N, C, D), machines) pairs."""
    with open(path) as fil:
        inp = fil.read().split('\n')  # Read the whole input file
    testcases = {}
    case = None
    count = 1
    for line in inp:  # Split each line and group machines by test case
        split = [int(i) for i in line.split()]
        if len(split) == 3:
            # A three-number line is the header of a new test case
            case = tuple([count] + split)
            testcases[case] = []
            count += 1
        elif split and case is not None:
            # Any other non-empty line describes one machine
            testcases[case].append(split)
    # Sort by case number so that results are printed in input order
    return sorted(testcases.items(), key=lambda x: x[0][0])

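If you have any suggestions or hints, I would welcome them!

I really don't want a ready-to-paste solution: this is an interview question, and I want it to honestly reflect my abilities, even though I do count searching this amazing community among those abilities ;)

Thanks for your attention!

Edit: the full input test set is available here: https://gitlab.com/InfoCode/Coding_Problems/raw/master/MachineWork/input.txt

Edit: here is the original script I used. It is certainly not optimal, but I believe it does far less computation on the really large test cases. The process is different and is explained at the beginning.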

"""First version of the project.

Uses a day-by-day approach to estimate the best behavior.
On each day, the algorithm:
- looks at every machine that can be bought on that day and picks the one
  that is most profitable in the long run;
- during the whole payback period (the time the machine needs to become
  cost-effective), checks that buying it won't interfere with some more
  profitable machine;
- buys the machine and moves on to the next day.
This algorithm allows faster execution for inputs with large sets of
machines for sale.

Known issue: I have not yet found how to prevent it from choosing machine 2
in case (6, 10, 20), which leads to a decrease of 1 dollar in profits.
"""

PATH = 'input2.txt'

# Defining the TestCase class which is used for iterating through the days


class TestCase(object):
    def __init__(self, C, D, machines=()):
        self.budget = C
        self.days = D
        self.machines = [Machine(self, i[0], i[1], i[2], i[3])
                         for i in machines]
        self.choices = []

    # Main function for running the iteration through the days
    def run_case(self):
        for i in range(1, self.days + 1):
            best = self.best_machine_on_day(i)
            if (best is not None and self.should_buy(best[0], i)):
                self.choices.append(best)
        if len(self.choices) > 0:
            self.choices[-1][0].buy_sell(self, self.days + 1, sell=True)
        return(self.budget)

    # Function to define the best machine on a specific day
    def best_machine_on_day(self, n):
        results = []
        for machine in self.machines:
            if n == machine.day:
                results.append(machine.day_based_potential(self, n))
        if len(results) == 0:
            return(None)
        elif len(results) == 1:
            return(results[0])
        else:
            return(max(results, key=lambda x: x[2] * (self.days - n) - x[1]))

    # Decide whether a machine should be bought, with a short look at the
    # days ahead
    def should_buy(self, machine, n):
        potential_budget = self.budget + self.choices[-1][0].p_sell + self.choices[-1][0].daily_profit * (
            n - self.choices[-1][0].day - 1) if len(self.choices) > 0 else self.budget
        # Integer division keeps the result usable in range() on Python 3
        day_to_cover_cost = (
            machine.cost // machine.daily_profit) if machine.cost % machine.daily_profit != 0 else machine.cost // machine.daily_profit - 1
        for day in range(day_to_cover_cost):
            next_day = self.best_machine_on_day(n + day + 1)
            if next_day is not None:
                day_to_buy = next_day[0].day
                if (
                    machine.earnings_from_day(
                        self, day_to_buy) < next_day[0].earnings_from_day(
                        self, day_to_buy) or machine.cost >= machine.daily_profit * (
                        next_day[0].day - machine.day)) and next_day[0].p_buy <= potential_budget:
                    return(False)
        if (potential_budget >= machine.p_buy and machine.earnings_from_day(
                self, n) >= machine.p_buy):
            if len(self.choices) > 0:
                self.choices[-1][0].buy_sell(self, n, sell=True)
            machine.buy_sell(self, n)
            return(True)
        else:
            return(False)

# Defining the machine object


class Machine(object):
    def __init__(self, case, day, p_buy, p_sell, daily_profit):
        self.cost = p_buy - p_sell
        self.p_buy, self.p_sell = p_buy, p_sell
        self.day = day
        self.daily_profit = daily_profit

    # To compute the earnings from a starting day n to the end
    def earnings_from_day(self, case, n):
        if self.day <= n <= case.days:
            return((case.days - n) * self.daily_profit - self.cost)
        else:
            return(0)
    # Represent itself method

    def day_based_potential(self, case, n):
        return((self, self.cost, self.daily_profit))
    # Actions on Budget

    def buy_sell(self, case, n, sell=False):
        if sell:
            case.budget += self.p_sell + self.daily_profit * (n - self.day - 1)
        else:
            case.budget -= self.p_buy


from load_input2 import load  # the loader shown above


def main():
    testcases = load(PATH)
    # load() returns a sorted list of ((case number, N, C, D), machines)
    for case_data, data in testcases:
        dolls = TestCase(case_data[2], case_data[3], data).run_case()
        print(
            "Case {}: {}".format(case_data[0], dolls))


if __name__ == '__main__':
    main()

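1 answer:

Answer 0: (score: 0)

Update: solution

I found that this problem originates from the 2011 ACM-ICPC World Finals (ACM International Collegiate Programming Contest; https://icpc.baylor.edu/worldfinals/problems, Problem F). They also provide the correct test results.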

http://www.csc.kth.se/~austrin/icpc/finals2011solutions.pdf

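In my approach, I use a two-step method:

  1. Preprocessing is applied to all available machines within one test case. Using an upper-bound heuristic over all existing machines, it overestimates how affordable each machine is. Machines that could never be afforded are removed from the set.

  2. The search itself follows a back-to-front recursive scheme. It first determines the most desirable machine (the one that yields the highest profit from the day it becomes available until the end of the period) and then follows a DFS (depth-first search) to find a path of affordable machines back to the initial budget. Since the machines are re-evaluated at every step, a solution can be regarded as the best one as soon as it is found.

  3. Once I get correct results on all test cases, I can post my solution here.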
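
The answer does not say which upper bound its preprocessing uses, so the following is only one possible sketch of step 1, not the answerer's code. It assumes that every machine resells for less than its purchase price, so cash can only grow through daily profits, and it bounds the cash on a given day by the starting budget plus the best daily profit seen so far on every earlier day. `drop_unaffordable` and its tuple layout `(day, p_buy, p_sell, daily_profit)` are hypothetical names chosen to match the question's `Machine` class.

```python
def drop_unaffordable(budget, machines):
    """Remove machines whose price exceeds an optimistic cash bound.

    Assumption: p_sell < p_buy for every machine, so buying and selling never
    adds cash by itself and all gains come from daily profits.
    """
    kept = []
    best_rate = 0       # best daily profit among kept machines from earlier days
    pending = []        # daily profits of machines kept on the current day
    current_day = None
    for day, p_buy, p_sell, daily_profit in sorted(machines):
        if day != current_day:
            # machines offered on an earlier day may be producing by now
            best_rate = max([best_rate] + pending)
            pending = []
            current_day = day
        # optimistic cash on `day`: every earlier day earned at the best rate
        cash_bound = budget + best_rate * (day - 1)
        if p_buy <= cash_bound:
            kept.append((day, p_buy, p_sell, daily_profit))
            pending.append(daily_profit)
    return kept


machines = [(1, 20, 5, 100), (1, 5, 2, 3), (3, 16, 1, 1), (3, 17, 1, 1)]
print(drop_unaffordable(10, machines))
```

The bound is deliberately loose: it can keep machines that are not actually affordable, but under the stated assumption it never removes one that is, which is what a pruning step needs.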

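    Original answer

    As for your task: it seems to be intractable, i.e. it cannot be computed exhaustively. You will probably need a heuristic that steers the search through look-ahead planning (with a look-ahead window of n days) in order to approach the solution efficiently.

    As for reading the whole file, how about using a generator while keeping the file handle open? Something like this: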

    def as_int_list(line):
        return [int(i) for i in line.strip().split()]
    
    
    def read_test_case(fh):
        # Header line: number of machines, budget, number of days
        n, c, d = tuple(as_int_list(fh.readline()))
        m = []
        while len(m) < n:
            m.append(as_int_list(fh.readline()))
        yield (n, c, d, m)
    
    
    if __name__ == '__main__':
        localfile = 'testcases.txt'
    
        no = 0
        with open(localfile, 'r') as fh:
            while no < 5:
                case = next(read_test_case(fh))
                print(case)
                no += 1
    

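    Note that I limited the number of test cases read to 5, but you could keep reading until the end of the file is reached (EOFError / StopIteration). I have not tested this on the whole file yet, but you will surely figure it out.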
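
A minimal Python 3 sketch of reading every test case lazily instead of only the first five. `iter_test_cases` is a hypothetical helper, and treating a `0 0 0` header as a terminator is an assumption about the input format; the loop also stops when `readline()` returns an empty string at end of file.

```python
import io


def as_int_list(line):
    return [int(i) for i in line.split()]


def iter_test_cases(fh):
    """Yield (n, c, d, machines) tuples until the input is exhausted."""
    while True:
        line = fh.readline()
        if not line:
            return                      # end of file
        header = as_int_list(line)
        if not header:
            continue                    # skip blank lines between cases
        n, c, d = header
        if (n, c, d) == (0, 0, 0):
            return                      # assumed terminator line
        # the next n lines each describe one machine
        machines = [as_int_list(fh.readline()) for _ in range(n)]
        yield n, c, d, machines


# In-memory stand-in for the real input file
sample = io.StringIO("2 10 20\n1 5 2 3\n3 16 1 1\n\n1 7 9\n2 4 1 1\n0 0 0\n")
for case in iter_test_cases(sample):
    print(case)
```

With a real file, the same generator can be used as `for case in iter_test_cases(fh)` inside the `with open(...)` block, so only one test case is held in memory at a time.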