Question

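I have a project in which we must solve the matrix equation AX = B for X, where A is a tridiagonal matrix. I wrote it in C++ and the program produces the correct matrix X, but when I try to report the error A*X-B back to the user, I get an erroneous error!! This is because I am subtracting A*X and B, whose entries are arbitrarily close to each other. I have two ideas on how to handle this, element by element: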

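According to this article, http://en.wikipedia.org/wiki/Loss_of_significance, up to -log2(1-y/x) bits may be lost in the direct subtraction x-y. Let's scale both x and y by pow(2,bitsLost), subtract the two, and then divide the result by pow(2,bitsLost).
Something that was stressed heavily in my numerical methods course: take the arithmetic conjugate! Instead of double difference = x-y; use double difference = (x*x-y*y)/(x+y);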
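The three candidates (direct subtraction plus the two ideas above) can be sketched side by side as follows. This is only an illustration, not the code from the ideone link: the function names direct, scaled and conjugate are hypothetical, and the sketch assumes IEEE 754 doubles with x > y > 0.

```cpp
#include <cassert>
#include <cmath>

// Plain subtraction.
double direct(double x, double y) { return x - y; }

// "Scale and descale": multiply both operands by 2^k, subtract, divide by 2^k.
// k is the bit-loss estimate -log2(1 - y/x), rounded up.
double scaled(double x, double y) {
    assert(x > y && y > 0.0);  // assumption of this sketch
    const int k = static_cast<int>(std::ceil(-std::log2(1.0 - y / x)));
    const double s = std::ldexp(1.0, k);  // exactly 2^k
    return (x * s - y * s) / s;
}

// "Arithmetic conjugate": (x^2 - y^2) / (x + y).
double conjugate(double x, double y) {
    return (x * x - y * y) / (x + y);
}
```

Multiplying or dividing by a power of two only changes the exponent, so (barring overflow or underflow) scaled(x, y) returns exactly the same double as direct(x, y). conjugate(x, y) performs additional rounded operations, so it can add rounding error of its own rather than remove any.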

OK, so why didn't you just pick a method and move on?

I tried all three methods (including direct subtraction) here: http://ideone.com/wfkEUp. I would like to know two things:

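Between the "scale and descale" method (for which I deliberately chose a power of 2) and the arithmetic conjugate method, which one produces less error (as far as subtracting large numbers goes)?
Which method is more computationally efficient? /*For this, I was going to say the scaling method would be more efficient, with linear complexity versus the seemingly quadratic complexity of the conjugate method, but I don't know the complexity of log2()*/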

Any and all help is welcome!!

P.S.: All three methods seem to return the same double in the example code...

Let's see some of your code

No problem; here is my Matrix.cpp code:

#include "ExceptionType.h"
#include "Matrix.h"
#include "MatrixArithmeticException.h"
#include <iomanip>
#include <iostream>
#include <vector>

Matrix::Matrix()
{
    //default size for Matrix is 1 row and 1 column, whose entry is 0
    std::vector<long double> rowVector(1,0);
    this->matrixData.assign(1, rowVector);
}

Matrix::Matrix(const std::vector<std::vector<long double> >& data)
{
    this->matrixData = data;
    //validate matrixData
    validateData();
}

//getter functions
//Recall that matrixData is a vector of a vector, whose elements should be accessed like matrixData[row][column].
//Each rowVector should have the same size.
unsigned Matrix::getRowCount() const { return matrixData.size(); }

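//an empty matrix has no row 0 to inspect, so report zero columns instead of indexing out of bounds
unsigned Matrix::getColumnCount() const { return matrixData.empty() ? 0 : matrixData[0].size(); }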

//matrix validator should just append zeroes into row vectors that are of smaller dimension than they should be...
void Matrix::validateData()
{
    //fetch the size of the largest-dimension rowVector
    unsigned largestSize = 0;
    for (unsigned i = 0; i < getRowCount(); i++)
    {
        if (largestSize < matrixData[i].size())
            largestSize = matrixData[i].size();
    }
    //make sure that all rowVectors are of that dimension
    for (unsigned i = 0; i < getRowCount(); i++)
    {
        //if we find a rowVector where this isn't the case
        if (matrixData[i].size() < largestSize)
        {
            //add zeroes to it so that it becomes the case
            matrixData[i].insert(matrixData[i].end(), largestSize-matrixData[i].size(), 0);
        }
    }

}
//operators
//+ and - operators should check to see if the size of the first matrix is exactly the same size as that of the second matrix
Matrix Matrix::operator+(const Matrix& B)
{
    //if the sizes coincide
    if ((getRowCount() == B.getRowCount()) && (getColumnCount() == B.getColumnCount()))
    {
        //declare the matrixData
        std::vector<std::vector<long double> > summedData = B.matrixData;    //since we are in the scope of the Matrix, we can access private data members
        for (unsigned i = 0; i < getRowCount(); i++)
        {
            for (unsigned j = 0; j < getColumnCount(); j++)
            {
                summedData[i][j] += matrixData[i][j];   //add the elements together
            }
        }
        //return result Matrix
        return Matrix(summedData);
    }
    else
        throw MatrixArithmeticException(DIFFERENT_DIMENSIONS);
}

Matrix Matrix::operator-(const Matrix& B)
{
    //declare negativeB
    Matrix negativeB = B;
    //negate all entries
    for (unsigned i = 0; i < negativeB.getRowCount(); i++)
    {
        for (unsigned j = 0; j < negativeB.getColumnCount(); j++)
        {
            negativeB.matrixData[i][j] = 0-negativeB.matrixData[i][j];
        }
    }
    //simply add the negativeB
    try
    {
        return ((*this)+negativeB);
    }
    catch (MatrixArithmeticException& mistake)
    {
        //report the problem, then rethrow so that every path either returns a Matrix or throws
        std::cout << mistake.what() << std::endl;
        throw;
    }
}

Matrix Matrix::operator*(const Matrix& B)
{
    //the columnCount of the left operand must be equal to the rowCount of the right operand
    if (getColumnCount() == B.getRowCount())
    {
        //if it is, declare data with getRowCount() rows and B.getColumnCount() columns
        std::vector<long double> zeroVector(B.getColumnCount(), 0);
        std::vector<std::vector<long double> > data(getRowCount(), zeroVector);
        for (unsigned i = 0; i < getRowCount(); i++)
        {
            for (unsigned j = 0; j < B.getColumnCount(); j++)
            {
                long double sum = 0; //set sum to zero
                for (unsigned k = 0; k < getColumnCount(); k++)
                {
                    //add the product of matrixData[i][k] and B.matrixData[k][j] to sum
                    sum += (matrixData[i][k]*B.matrixData[k][j]);
                }
                data[i][j] = sum;   //assign the sum to data
            }
        }
        return Matrix(data);
    }
    else
    {
        throw MatrixArithmeticException(ROW_COLUMN_MISMATCH); //dimension mismatch
    }
}

std::ostream& operator<<(std::ostream& outputStream, const Matrix& theMatrix)
{
    //Here, you should use the << again, just like you would for ANYTHING ELSE.
    //first, print a newline
    outputStream << "\n";
    //setting precision (optional)
    outputStream.precision(11);
    for (unsigned i = 0; i < theMatrix.getRowCount(); i++)
    {
        //print '['
        outputStream << "[";
        //format stream(optional)
        for (unsigned j = 0; j < theMatrix.getColumnCount(); j++)
        {
            //print numbers
            outputStream << std::setw(17) << theMatrix.matrixData[i][j];
            //print ", "
            if (j < theMatrix.getColumnCount() - 1)
                outputStream << ", ";
        }
        //print ']'
        outputStream << "]\n";
    }
    return outputStream;
}

Answer 1

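You computed two numbers x and y, which are of a finite-precision floating-point type. That means they have already been rounded in some way, so precision was lost while computing them. If you subtract those numbers afterwards, you compute the difference between two numbers that are already rounded.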

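The formula you wrote gives the maximum error of computing the difference, but that error is relative to the stored intermediate results x and y (which, again, are rounded). No method other than x-y will give you a "better" result (in terms of the complete computation, not just the difference). In short: the difference cannot be made more accurate by using any formula other than x-y.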
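One way to see that the subtraction itself is not the culprit is to measure its rounding error directly. The sketch below uses Knuth's TwoSum error-free transformation, which is not mentioned in the answer and is brought in here only for illustration. The name subtractionError is hypothetical, and the code assumes IEEE 754 doubles, round-to-nearest, and no value-changing compiler flags such as -ffast-math.

```cpp
#include <cassert>
#include <cmath>

// Returns the exact value of (x - y) minus the double that x - y evaluates to,
// i.e. the rounding error committed by the subtraction alone (Knuth's TwoSum).
double subtractionError(double x, double y) {
    assert(std::isfinite(x) && std::isfinite(y));  // TwoSum needs finite inputs
    const double b = -y;
    const double s = x + b;            // the computed difference
    const double bb = s - x;           // the part of b that actually reached s
    return (x - (s - bb)) + (b - bb);  // whatever was rounded away
}
```

For two doubles within a factor of two of each other, such as 1.0000001 and 1.0, this error is zero: the subtraction is exact, and any inaccuracy in the result was already present in x and y. A nonzero error appears only when the operands differ greatly in magnitude, for example 1e16 and 1.0.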

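I suggest looking at arbitrary-precision arithmetic libraries such as GMP or Eigen. Use such a library to compute your system of equations. Do not use double for the matrix computations. That way you can make sure that the intermediate results x and y (or the matrices Ax and B) are as precise as you want them to be, for example 512 bits, which is certainly enough for most cases.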

Answer 2

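Finite-precision floating-point data types cannot represent all possible real values. There are infinitely many distinct real values, so it is easy to see that not all of them can be represented in a type of finite size.

It is therefore entirely plausible that your true solution is a value that cannot be represented. No amount of trickery will give you an exact solution in a finite data type.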
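To make the finiteness concrete, the following sketch counts how many doubles exist in a closed interval. It is an illustration only: countDoublesBetween is a hypothetical helper, and it assumes a 64-bit IEEE 754 double, whose bit patterns for non-negative values are ordered the same way as the values themselves.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes 64-bit double");

// Number of representable doubles in [lo, hi], for finite 0 <= lo <= hi.
std::uint64_t countDoublesBetween(double lo, double hi) {
    assert(std::isfinite(lo) && std::isfinite(hi));
    assert(!std::signbit(lo) && lo <= hi);
    std::uint64_t a = 0, b = 0;
    std::memcpy(&a, &lo, sizeof a);  // reinterpret the bits without undefined behavior
    std::memcpy(&b, &hi, sizeof b);
    return b - a + 1;                // adjacent doubles have adjacent bit patterns
}
```

The interval [1.0, 2.0] contains infinitely many real numbers but only 2^52 + 1 doubles, so almost every real value in it has to be rounded to a neighbor.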

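You need to recalibrate your expectations to match the reality of finite-precision floating-point data types. A good starting point is What Every Computer Scientist Should Know About Floating-Point Arithmetic.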

Answer 3

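To everyone who answered this question: I knew, and had stumbled upon by accident, that the cardinality of the set of all possible doubles is finite. I guess I have no choice but to try higher-precision numbers, or to create my own class representing a HugeDecimal.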

Answer 4

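Replace the equality test with a check for whether the difference is greater than some given epsilon (a constant representing the smallest distinguishable difference).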
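A minimal sketch of such a check is shown below. The name nearlyEqual and the default tolerances are assumptions chosen for illustration; suitable values depend on the magnitude and conditioning of the actual problem.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// True if a and b differ by no more than a tolerance that combines an
// absolute floor (useful near zero) with a bound relative to their magnitude.
bool nearlyEqual(double a, double b, double relTol = 1e-12, double absTol = 1e-15) {
    assert(relTol >= 0.0 && absTol >= 0.0);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= std::max(absTol, relTol * scale);
}
```

A call such as nearlyEqual(ax, b) would then replace ax == b for each pair of entries, so an entry of A*X-B counts as zero when it falls below the tolerance.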

Answer 5

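You cannot expect floating-point numbers to have infinite precision. You should consider what precision you need and then choose the simplest method that meets your needs. So if you get the same result, stick with ordinary subtraction and use an epsilon as suggested in V-X's answer.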

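How did you arrive at O(n^2) complexity for the conjugate method? You have a fixed set of operations: two multiplications, one addition, one subtraction and one division. Assuming each of these operations is O(1), applying the method to n numbers is O(n).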
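The linear cost is easy to see when the difference is taken in a single pass over the entries. The sketch below is an illustration under stated assumptions: maxAbsDifference is a hypothetical helper, and the two vectors stand for the flattened entries of A*X and B.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute entry of ax - b. Each element costs a constant number of
// operations, so the whole pass is O(n) in the number of entries.
long double maxAbsDifference(const std::vector<long double>& ax,
                             const std::vector<long double>& b) {
    assert(ax.size() == b.size());
    long double worst = 0.0L;
    for (std::size_t i = 0; i < ax.size(); ++i) {
        worst = std::max(worst, std::fabs(ax[i] - b[i]));  // plain subtraction
    }
    return worst;
}
```

Swapping the plain subtraction for the conjugate formula would change only the constant factor per element, not the order of growth.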

Answer 6

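Although this may not help you choose a method, some time ago I wrote a tool that may help you choose a precision based on the range of values you expect: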

http://riot.so/floatprecision.html

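As the other answers say, you cannot expect to get infinite precision from floating point, but you can use a tool like this to obtain the smallest increment and decrement sizes for a given number and work out the best precision to use to get the accuracy you need.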
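The smallest increment and decrement of a given double can also be obtained in C++ with std::nextafter. The sketch below is not the linked tool, whose implementation is not shown here; stepUp and stepDown are hypothetical names, and IEEE 754 doubles are assumed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next representable double above it.
double stepUp(double x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Distance from x to the next representable double below it.
double stepDown(double x) {
    assert(std::isfinite(x));
    return x - std::nextafter(x, -std::numeric_limits<double>::infinity());
}
```

Around 1.0 the step upward is 2^-52 while the step downward is 2^-53, and around 1e16 neighboring doubles are already 2 apart. Comparing these steps with the size of the residual you want to report indicates whether double is fine enough or a wider type is needed.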

Handling loss of precision when subtracting two doubles that are close to each other
