Can someone help me work out how to compute (A*B)%C, where 1<=A,B,C<=10^18, in C++, without big-num, using only a mathematical approach?
Answer 0 (score: 6)
Off the top of my head (not extensively tested):
typedef unsigned long long BIG;
BIG mod_multiply( BIG A, BIG B, BIG C )
{
    BIG mod_product = 0;
    A %= C;
    while (A) {
        B %= C;  // keep B < C so that B << 1 cannot overflow for C <= 10^18
        if (A & 1) mod_product = (mod_product + B) % C;
        A >>= 1;
        B <<= 1;
    }
    return mod_product;
}
This takes O(log A) iterations. You can replace most of the % operations with conditional subtraction for better performance.
typedef unsigned long long BIG;
BIG mod_multiply( BIG A, BIG B, BIG C )
{
    BIG mod_product = 0;
    // A %= C; may or may not help performance
    B %= C;
    while (A) {
        if (A & 1) {
            mod_product += B;
            // >= rather than >, so the result never stays equal to C
            if (mod_product >= C) mod_product -= C;
        }
        A >>= 1;
        B <<= 1;
        if (B >= C) B -= C;  // keeps B in [0, C)
    }
    return mod_product;
}
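This version performs only one long integer modulo. It may even be faster than the large-chunk method, depending on how your processor implements integer modulo.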
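As a sanity check, the conditional-subtraction approach can be compared against a direct 128-bit computation. The sketch below assumes a compiler that provides the non-standard unsigned __int128 extension (GCC and Clang do); the names mod_multiply_sketch and mod_multiply_reference are hypothetical and are used only for this illustration. The reference is meant for testing, not as a replacement for the big-num-free method the question asks for.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-add modular multiplication using conditional subtraction.
// Assumption: 0 < c < 2^63, so doubling a value below c cannot overflow.
std::uint64_t mod_multiply_sketch(std::uint64_t a, std::uint64_t b, std::uint64_t c)
{
    assert(c != 0 && c < (std::uint64_t{1} << 63));
    std::uint64_t result = 0;
    b %= c;
    while (a) {
        if (a & 1) {
            result += b;                   // result < c and b < c, so the sum is < 2c
            if (result >= c) result -= c;
        }
        a >>= 1;
        b <<= 1;                           // b < c < 2^63, so 2b fits in 64 bits
        if (b >= c) b -= c;
    }
    return result;
}

// Reference result via the unsigned __int128 compiler extension (not standard C++).
std::uint64_t mod_multiply_reference(std::uint64_t a, std::uint64_t b, std::uint64_t c)
{
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) % c);
}
```

Comparing the two functions over random inputs in the range 1 to 10^18 is a quick way to catch off-by-one mistakes such as using > where >= is needed.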
Answer 1 (score: 0)
An implementation based on this earlier Stack Overflow answer (http://stackoverflow.com/a/14859713/256138):
#include <stdint.h>
#include <tuple>
#include <iostream>
typedef std::tuple< uint32_t, uint32_t > split_t;
split_t split( uint64_t a )
{
    static const uint32_t mask = -1;
    auto retval = std::make_tuple( mask&a, ( a >> 32 ) );
    // std::cout << "(" << std::get<0>(retval) << "," << std::get<1>(retval) << ")\n";
    return retval;
}
typedef std::tuple< uint64_t, uint64_t, uint64_t, uint64_t > cross_t;
template<typename Lambda>
cross_t cross( split_t lhs, split_t rhs, Lambda&& op )
{
    return std::make_tuple(
        op(std::get<0>(lhs), std::get<0>(rhs)),
        op(std::get<1>(lhs), std::get<0>(rhs)),
        op(std::get<0>(lhs), std::get<1>(rhs)),
        op(std::get<1>(lhs), std::get<1>(rhs))
    );
}
// c must have high bit unset:
uint64_t a_times_2_k_mod_c( uint64_t a, unsigned k, uint64_t c )
{
    a %= c;
    for (unsigned i = 0; i < k; ++i)
    {
        a <<= 1;
        a %= c;
    }
    return a;
}
// c must have about 2 high bits unset:
uint64_t a_times_b_mod_c( uint64_t a, uint64_t b, uint64_t c )
{
    // ensure a and b are < c:
    a %= c;
    b %= c;
    auto Z = cross( split(a), split(b), [](uint32_t lhs, uint32_t rhs)->uint64_t {
        return (uint64_t)lhs * (uint64_t)rhs;
    } );
    uint64_t to_the_0;
    uint64_t to_the_32_a;
    uint64_t to_the_32_b;
    uint64_t to_the_64;
    std::tie( to_the_0, to_the_32_a, to_the_32_b, to_the_64 ) = Z;
    // std::cout << to_the_0 << "+ 2^32 *(" << to_the_32_a << "+" << to_the_32_b << ") + 2^64 * " << to_the_64 << "\n";
    // this line is the one that requires 2 high bits in c to be clear
    // if you just add 2 of them then do a %c, then add the third and do
    // a %c, you can relax the requirement to "one high bit must be unset":
    return
        (to_the_0 % c  // low*low can be close to 2^64, so reduce it before adding
        + a_times_2_k_mod_c(to_the_32_a+to_the_32_b, 32, c) // + will not overflow!
        + a_times_2_k_mod_c(to_the_64, 64, c) )
        %c;
}
int main()
{
    // every argument must fit in uint64_t; the ULL suffix makes the literal type explicit
    uint64_t retval = a_times_b_mod_c( 1901000000000000000ULL, 1011000000000000ULL, 1231231231231211ULL );
    std::cout << retval << "\n";
}
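The idea here is to split each 64-bit integer into a pair of 32-bit integers, which can safely be multiplied in the 64-bit domain.
We express a*b as (a_high * 2^32 + a_low) * (b_high * 2^32 + b_low), do the 4 multiplications (keeping track of the 2^32 factors without storing them in our bits), then note that a * 2^k % c can be computed by k repetitions of this pattern: ((a*2 %c) *2%c).... So we can take this 3 to 4 element polynomial in 2^32, whose coefficients are 64-bit integers, and reduce it without having to worry about overflow.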
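The decomposition into 32-bit halves can be checked in isolation. The sketch below is only an illustration: it uses a hypothetical helper named low64_from_halves and reduces modulo 2^64 (ordinary unsigned wraparound) instead of modulo c, so the partial product with weight 2^64 simply drops out. It shows that the partial products, weighted by powers of 2^32, reassemble into a*b.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rebuilds the low 64 bits of a*b from 32-bit halves.
// Each partial product of two 32-bit values fits in 64 bits.
std::uint64_t low64_from_halves(std::uint64_t a, std::uint64_t b)
{
    const std::uint64_t a_low = a & 0xFFFFFFFFu, a_high = a >> 32;
    const std::uint64_t b_low = b & 0xFFFFFFFFu, b_high = b >> 32;

    const std::uint64_t low_low  = a_low * b_low;    // weight 2^0
    const std::uint64_t high_low = a_high * b_low;   // weight 2^32
    const std::uint64_t low_high = a_low * b_high;   // weight 2^32
    // a_high * b_high has weight 2^64 and vanishes modulo 2^64.

    // Unsigned arithmetic wraps modulo 2^64, which is the modulus used here.
    return low_low + ((high_low + low_high) << 32);
}
```

For a general modulus c the term with weight 2^64 does not vanish, which is why the answer's code reduces each weighted term with repeated doubling modulo c.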
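The expensive part is the a_times_2_k_mod_c function (the only loop).
You can make it much faster if you know that c has more than one high bit clear.
You can also replace the a %= c with a subtraction.
Doing both is not all that practical.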
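One possible reading of the subtraction idea is sketched below. The exact replacement the answer had in mind is not shown above, so this is an assumption: the function name times_2_k_mod_c_sub is hypothetical, and it relies on the same precondition as a_times_2_k_mod_c (the high bit of c must be unset). After the initial reduction, a stays below c, so a single doubling yields a value below 2c and one conditional subtraction is enough.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of a_times_2_k_mod_c: computes (a * 2^k) % c,
// replacing the per-iteration % with a conditional subtraction.
// Assumption: 0 < c < 2^63, so a << 1 cannot overflow once a < c.
std::uint64_t times_2_k_mod_c_sub(std::uint64_t a, unsigned k, std::uint64_t c)
{
    assert(c != 0 && c < (std::uint64_t{1} << 63));
    a %= c;                        // one real modulo up front
    for (unsigned i = 0; i < k; ++i) {
        a <<= 1;                   // now a < 2c
        if (a >= c) a -= c;        // back into [0, c)
    }
    return a;
}
```

Whether this is actually faster than the % version depends on the processor and on how well the branch predicts, so it is worth measuring.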