Question

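I have N inputs and the corresponding outputs of a function for those input values.

What I want is to estimate the mathematical function from the given outputs and inputs: is it closer to the form f(N), f(N^2), log(N), N*log(N), or 2^N?

Essentially, I am trying to estimate big O. Here n is the amount of input data and the output is the computation time. So I would at least like to know which of the functions listed above my function is closest to.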

Answer 1

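You can use the least squares method to find the function with the smallest distance to your data.

Suppose you have a list of sample observations of some unknown function, in the form of ordered pairs (x,y) or (x,f(x)). You can use least squares to measure the distance between this unknown function and some known function g.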

distance = 0
for x, y in sample_pairs:
    distance += (y - g(x)) ** 2

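The smaller this distance, the closer your unknown function is to the known function g.

Now, if you want to find the function (from a predetermined list of functions) closest to your unknown function, you just compute the distance from each candidate to the unknown function. Whichever candidate has the smallest distance is the most similar to your unknown function.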
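Written as a formula, the selection rule looks like the following. The symbols m (number of samples), (x_i, y_i) (the sample pairs), and g_1, ..., g_r (the candidate functions) are names introduced here for illustration and are not part of the original answer.

```latex
D(g) = \sum_{i=1}^{m} \bigl( y_i - g(x_i) \bigr)^2 ,
\qquad
g^{*} = \operatorname*{arg\,min}_{g \in \{ g_1, \dots, g_r \}} D(g)
```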

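Note that this method is an approximation and is not 100% accurate, but you can improve its accuracy by providing larger and more comprehensive sample data.

Here is an example Python implementation: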

import math

functions = []

functions.append( lambda n: n )                # y = n
functions.append( lambda n: n*n )              # y = n^2
functions.append( lambda n: math.log(n,2) )    # y = log(n)
functions.append( lambda n: n*math.log(n,2) )  # y = n*log(n)
functions.append( lambda n: 2**n )             # y = 2^n

# creating the sample data the unknown function is n + 10*n*log(n)
pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,200,1) ]

# calculating the distance of each function to the sample data
# and add them to a list
distances = [ ]
for func in functions:
    d = 0
    for n,y in pairs:
        d += (func(n) - y)**2 
    distances.append(d)

# finding the minimum value and printing its (zero-based) index
print(distances.index(min(distances)))

Output:

3

This means that the 4th function (zero-based index 3), n*log(n), is closest to our sample data.

Note that if we reduce the size of the sample data like this (dropping half of the samples):

pairs = [ (n, n + 10*n*math.log(n,2) ) for n in range(1,100,1) ]

the program will print 1, which means the closest function is n². This shows how much the sample data matters.
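To make that comparison easy to repeat, here is a minimal self-contained sketch that wraps the same unscaled least-squares comparison in a function and runs it on both sample ranges. The names CANDIDATES, closest_index, and make_pairs are hypothetical helpers introduced for this illustration; math.log2 is used in place of math.log(n, 2).

```python
import math

# Candidate growth functions, in the same order as in the example above.
CANDIDATES = [
    lambda n: n,                 # index 0: n
    lambda n: n * n,             # index 1: n^2
    lambda n: math.log2(n),      # index 2: log(n)
    lambda n: n * math.log2(n),  # index 3: n*log(n)
    lambda n: 2 ** n,            # index 4: 2^n
]


def closest_index(pairs):
    """Return the index of the candidate with the smallest sum of squared errors."""
    distances = [
        sum((func(n) - y) ** 2 for n, y in pairs)
        for func in CANDIDATES
    ]
    return distances.index(min(distances))


def make_pairs(upper):
    """Sample the 'unknown' function n + 10*n*log(n) for n = 1 .. upper - 1."""
    return [(n, n + 10 * n * math.log2(n)) for n in range(1, upper)]


# Compare the full sample range with the halved one.
for upper in (200, 100):
    print(upper, closest_index(make_pairs(upper)))
```

With the full range the sketch should pick index 3 (n*log(n)), and with the halved range index 1 (n²), matching the behaviour described above.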

Answer 2

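This is a small addendum to sudomakeinstall2's answer.

In big-O notation you don't care about constant scaling. So instead of measuring the distance ( y - g(x) )^2 as in that answer, you actually want to measure ( y - k * g(x) )^2, where k is the best-fitting constant scale factor. This k can be computed directly as a least-squares fit. Here is a modified version that should be more robust: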

...
for func in functions:
    #calculate the best k
    numerator = 0
    denominator = 0
    for n,y in pairs:
        numerator += func(n) * y
        denominator += func(n) * func(n)
    k = numerator / denominator
    d = 0
    for n,y in pairs:
        d += (k * func(n) - y)**2 
    distances.append(d)
...
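The expression for k in the loop above comes from setting the derivative of sum((y - k*g(n))^2) with respect to k to zero, which gives k = sum(g(n)*y) / sum(g(n)^2). Since the snippet elides its surroundings, here is a self-contained sketch of the same idea. The names CANDIDATES and best_scaled_fit are hypothetical, and the sketch assumes the samples are such that sum(g(n)^2) is non-zero for every candidate (for example, it would fail for log(n) if the only sample were n = 1).

```python
import math

# Candidate growth functions, in the same order as in the first answer.
CANDIDATES = [
    lambda n: n,                 # index 0: n
    lambda n: n * n,             # index 1: n^2
    lambda n: math.log2(n),      # index 2: log(n)
    lambda n: n * math.log2(n),  # index 3: n*log(n)
    lambda n: 2 ** n,            # index 4: 2^n
]


def best_scaled_fit(pairs):
    """Return (index, k) of the candidate g minimising sum((k*g(n) - y)^2)."""
    best = None
    for index, func in enumerate(CANDIDATES):
        # Closed-form least-squares scale factor for this candidate.
        numerator = sum(func(n) * y for n, y in pairs)
        denominator = sum(func(n) * func(n) for n, y in pairs)
        k = numerator / denominator
        # Sum of squared errors after scaling the candidate by k.
        d = sum((k * func(n) - y) ** 2 for n, y in pairs)
        if best is None or d < best[0]:
            best = (d, index, k)
    return best[1], best[2]


# Same 'unknown' function as before: n + 10*n*log(n), on both sample ranges.
for upper in (200, 100):
    pairs = [(n, n + 10 * n * math.log2(n)) for n in range(1, upper)]
    print(upper, best_scaled_fit(pairs))
```

On this sample data the scaled fit should select index 3 (n*log(n)) for both ranges, with k slightly above 10, which illustrates why fitting the constant makes the comparison less sensitive to the sample size.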

How to estimate a mathematical function from input and output values?
