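Quick and dirty way to profile your code

Time: 2008-09-14 11:24:31

Tags: c++ performance profiling code-snippets

What method do you use when you want to get performance data about specific code paths?

7 answers:

Answer 0 (score: 11)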

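This method has several limitations, but I still find it very useful. I'll list the limitations (that I know of) up front and let whoever wants to use it do so at their own risk.

  1. The original version I posted over-reported time spent in recursive calls (as pointed out in the comments on this answer).
  2. It is not thread safe. It wasn't thread safe before I added the code to ignore recursion, and it is even less thread safe now.
  3. Although it is very efficient when called many times (millions), it has a measurable effect on the result, so the scopes you measure will take longer than those you don't.

I use this class when the problem at hand doesn't justify profiling all of my code, or when I get some data from a profiler that I want to verify. Basically, it sums up the time spent in a specific block and, at the end of the program, outputs it to the debug stream (viewable with DbgView), including how many times the code was executed (and, of course, the average time spent).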

    #pragma once
    #include <tchar.h>
    #include <windows.h>
    #include <sstream>
    #include <boost/noncopyable.hpp>
    
    namespace scope_timer {
        class time_collector : boost::noncopyable {
            __int64 total;
            LARGE_INTEGER start;
            size_t times;
            const TCHAR* name;
    
            double cpu_frequency()
            { // cache the CPU frequency, which doesn't change.
                static double ret = 0; // store as double so division later on is floating point and not truncating
                if (ret == 0) {
                    LARGE_INTEGER freq;
                    QueryPerformanceFrequency(&freq);
                    ret = static_cast<double>(freq.QuadPart);
                }
                return ret;
            }
            bool in_use;
    
        public:
            time_collector(const TCHAR* n)
                // initializers listed in member declaration order
                : total(0)
                , start(LARGE_INTEGER())
                , times(0)
                , name(n)
                , in_use(false)
            {
            }
    
            ~time_collector()
            {
                std::basic_ostringstream<TCHAR> msg;
                msg << _T("scope_timer> ") <<  name << _T(" called: ");
    
                double seconds = total / cpu_frequency();
                double average = seconds / times;
    
                msg << times << _T(" times total time: ") << seconds << _T(" seconds  ")
                    << _T(" (avg ") << average <<_T(")\n");
                OutputDebugString(msg.str().c_str());
            }
    
            void add_time(__int64 ticks)
            {
                total += ticks;
                ++times;
                in_use = false;
            }
    
            bool aquire()
            {
                if (in_use)
                    return false;
                in_use = true;
                return true;
            }
        };
    
        class one_time : boost::noncopyable {
            LARGE_INTEGER start;
            time_collector* collector;
        public:
            one_time(time_collector& tc)
            {
                if (tc.aquire()) {
                    collector = &tc;
                    QueryPerformanceCounter(&start);
                }
                else
                    collector = 0;
            }
    
            ~one_time()
            {
                if (collector) {
                    LARGE_INTEGER end;
                    QueryPerformanceCounter(&end);
                    collector->add_time(end.QuadPart - start.QuadPart);
                }
            }
        };
    }
    
    // Usage TIME_THIS_SCOPE(XX); where XX is a C variable name (can begin with a number)
    #define TIME_THIS_SCOPE(name) \
        static scope_timer::time_collector st_time_collector_##name(_T(#name)); \
        scope_timer::one_time st_one_time_##name(st_time_collector_##name)
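
For readers who are not on Windows, the same idea can be sketched with std::chrono. This is only a minimal sketch under stated assumptions, not the author's code: the names `portable_timer::collector`, `portable_timer::scope` and `PORTABLE_TIME_THIS_SCOPE` are made up for illustration, the report goes to std::cerr instead of the debug stream, and like the original it is not thread safe.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>

namespace portable_timer {  // hypothetical names, not part of the answer's code
    using clock_type = std::chrono::steady_clock;

    // Accumulates total time and call count for one named scope.
    class collector {
        const char* name_;
        clock_type::duration total_{};
        std::size_t times_ = 0;
        bool in_use_ = false;  // true while an outer timer for this scope is running
    public:
        explicit collector(const char* name) : name_(name) {}
        collector(const collector&) = delete;
        collector& operator=(const collector&) = delete;

        // Reports once, when the collector is destroyed (program exit for statics).
        ~collector() {
            const double seconds = std::chrono::duration<double>(total_).count();
            std::cerr << "scope_timer> " << name_ << " called: " << times_
                      << " times, total time: " << seconds << " seconds";
            if (times_ != 0) std::cerr << " (avg " << seconds / times_ << ")";
            std::cerr << '\n';
        }

        // Returns false when a timer for this scope is already running (recursion).
        bool acquire() {
            if (in_use_) return false;
            in_use_ = true;
            return true;
        }

        void add_time(clock_type::duration d) {
            assert(in_use_ && "add_time() must follow a successful acquire()");
            total_ += d;
            ++times_;
            in_use_ = false;
        }

        std::size_t times() const { return times_; }
    };

    // Times one pass through a scope; nested (recursive) passes are ignored.
    class scope {
        collector* collector_;
        clock_type::time_point start_;
    public:
        explicit scope(collector& c)
            : collector_(c.acquire() ? &c : nullptr), start_(clock_type::now()) {}
        scope(const scope&) = delete;
        scope& operator=(const scope&) = delete;
        ~scope() { if (collector_) collector_->add_time(clock_type::now() - start_); }
    };
}

#define PORTABLE_TIME_THIS_SCOPE(name) \
    static portable_timer::collector pt_collector_##name(#name); \
    portable_timer::scope pt_scope_##name(pt_collector_##name)

// Usage: only the outermost call of a recursive function is counted.
inline unsigned long factorial(unsigned n) {
    PORTABLE_TIME_THIS_SCOPE(factorial);
    return n <= 1 ? 1UL : n * factorial(n - 1);
}
```

Because the collector is a function-local static, the summary line is printed when the program exits, which mirrors the destructor-based reporting in the listing above.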
    

Answer 1 (score: 2)

Well, I have two code snippets. In pseudocode they look like this (this is a simplified version; I actually use QueryPerformanceFrequency):

First snippet:

Timer timer = new Timer
timer.Start

Second snippet:

timer.Stop
show elapsed time

A bit of hot-key kung fu, and I can say how much time this piece of code stole from my CPU.
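
In C++ the two snippets might look roughly like the following. This is a minimal sketch that uses std::chrono rather than the QueryPerformance* calls the answer mentions; `Stopwatch` and `time_and_report` are illustrative names, not the author's code.

```cpp
#include <chrono>
#include <iostream>

// Hypothetical helper; the answer itself only shows pseudocode.
class Stopwatch {
    using clock_type = std::chrono::steady_clock;
    clock_type::time_point start_ = clock_type::now();
    clock_type::duration elapsed_{};
public:
    // First snippet: remember when the measured code starts.
    void start() { start_ = clock_type::now(); }

    // Second snippet: record how long it took.
    void stop() { elapsed_ = clock_type::now() - start_; }

    // Time between the last start() and stop(), in milliseconds.
    double elapsed_ms() const {
        return std::chrono::duration<double, std::milli>(elapsed_).count();
    }
};

// Runs any callable between the two snippets and prints the elapsed time.
template <class F>
double time_and_report(const char* label, F&& work) {
    Stopwatch timer;
    timer.start();
    work();
    timer.stop();
    std::cout << label << ": " << timer.elapsed_ms() << " ms\n";
    return timer.elapsed_ms();
}
```

steady_clock is used here because it never jumps backwards; its resolution is implementation-defined, so very short code paths should be run many times inside the measured callable.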

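Answer 2 (score: 2)

I do my profiling by creating two classes: cProfile and cProfileManager.

cProfileManager holds all the data produced by cProfile.

cProfile has the following requirements:

  • cProfile has a constructor that initializes the current time.
  • cProfile has a destructor that sends the total time the object was alive to cProfileManager.

To use these profiling classes, I first create an instance of cProfileManager. Then I put the block of code I want to profile inside curly braces. Inside the curly braces, I create a cProfile instance. When the block ends, cProfile sends the time the block took to finish to cProfileManager.

Example code

Here is the code example (simplified):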

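// GetTime() and ProfileManager are assumed to be defined elsewhere.
class cProfile
{
public:
    // Record the time at which the profiled block starts.
    cProfile()
    {
        TimeStart = GetTime();
    }

    // Report how long the object was alive, i.e. the time spent in the block.
    ~cProfile()
    {
        ProfileManager->AddProfile (GetTime() - TimeStart);
    }

private:
    float TimeStart;
};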

To use cProfile, I do something like this:

int main()
{
    printf("Start test");
    {
        cProfile Profile;
        Calculate();
    }
    ProfileManager->OutputData();
}

Or this:

void foobar()
{
    cProfile ProfileFoobar;

    foo();
    {
        cProfile ProfileBarCheck;
        while (bar())
        {
            cProfile ProfileSpam;
            spam();
        }
    }
}

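Technical notes

This code is actually an abuse of the way scope, constructors and destructors work in C++. cProfile exists only inside the block scope (the block of code we want to test). Once the program leaves the block scope, cProfile records the result.

Additional enhancements

  • You can add a string parameter to the constructor so you can do something like this: cProfile Profile("Profile for complicated calculation");

  • You can use a macro to make the code look cleaner (be careful not to abuse this. Unlike our other abuses of the language, macros can be dangerous when used.)

    Example:

    #define START_PROFILE { cProfile Profile;
    #define END_PROFILE }

  • cProfileManager can check how many times a block of code is called, but you need an identifier for the block. The first enhancement can help identify the block. This can be useful when the code you want to profile is inside a loop (like the second example above). You can also add the average, fastest and longest execution time of the block.

  • If you are in debug mode, don't forget to add a check to skip profiling.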
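
The enhancements listed above could be sketched as follows. This is a minimal sketch under stated assumptions rather than the author's classes: `ProfileManager`, `Profile`, `PROFILE_BLOCK` and the `ENABLE_PROFILING` switch are hypothetical names, and std::chrono stands in for the unspecified GetTime().

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical manager: keeps per-block statistics keyed by the block's name.
class ProfileManager {
public:
    struct Stats {
        std::size_t calls = 0;
        double total = 0.0;    // seconds
        double fastest = 0.0;  // seconds
        double slowest = 0.0;  // seconds
    };

    // Called by Profile's destructor with the block's identifier.
    void add(const std::string& name, double seconds) {
        Stats& s = stats_[name];
        s.fastest = s.calls == 0 ? seconds : std::min(s.fastest, seconds);
        s.slowest = std::max(s.slowest, seconds);
        s.total += seconds;
        ++s.calls;
    }

    // Throws std::out_of_range if the block was never profiled.
    const Stats& stats(const std::string& name) const { return stats_.at(name); }

    void output(std::ostream& os) const {
        for (const auto& entry : stats_) {
            const Stats& s = entry.second;
            os << entry.first << ": " << s.calls << " calls, total " << s.total
               << " s, avg " << s.total / s.calls << " s, fastest " << s.fastest
               << " s, slowest " << s.slowest << " s\n";
        }
    }

private:
    std::map<std::string, Stats> stats_;
};

// Measures the lifetime of one block and reports it under the given name.
class Profile {
public:
    Profile(ProfileManager& manager, std::string name)
        : manager_(manager), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}
    Profile(const Profile&) = delete;
    Profile& operator=(const Profile&) = delete;

    ~Profile() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        manager_.add(name_, elapsed.count());
    }

private:
    ProfileManager& manager_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Assumed build switch: without ENABLE_PROFILING the macro compiles to nothing.
#ifdef ENABLE_PROFILING
#define PROFILE_BLOCK(manager, name) Profile profile_block_guard((manager), (name))
#else
#define PROFILE_BLOCK(manager, name) ((void)0)
#endif
```

Passing the manager explicitly avoids the global pointer used in the simplified example, at the cost of one extra argument per profiled block.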

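Answer 3 (score: 2)

Note that everything below was written specifically for Windows.

I also have a timer class that I wrote for quick and dirty profiling. It uses QueryPerformanceCounter() to get high-precision timings, but with a slight difference: my timer class does not dump the elapsed time when the Timer object goes out of scope. Instead, it accumulates the elapsed times into a collection. I added a static member function, Dump(), which creates a table of elapsed times, sorted by timing category (specified in Timer's constructor as a string), along with some statistical analysis such as average elapsed time, standard deviation, max and min. I also added a Clear() static member function, which clears the collection and lets you start over. (In the Timer.inl listing it appears under the name Reset().)

How to use the Timer class (pseudocode):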

int CInsertBuffer::Read(char* pBuf)
{
       // TIMER NOTES: Avg Execution Time = ~1 ms
       Timer timer("BufferRead");
       :      :
       return -1;
}

Sample output:

Timer Precision = 418.0095 ps

=== Item               Trials    Ttl Time  Avg Time  Mean Time StdDev    ===
    AddTrade           500       7 ms      14 us     12 us     24 us
    BufferRead         511       1:19.25   0.16 s    621 ns    2.48 s
    BufferWrite        516       511 us    991 ns    482 ns    11 us
    ImportPos Loop     1002      18.62 s   19 ms     77 us     0.51 s
    ImportPosition     2         18.75 s   9.38 s    16.17 s   13.59 s
    Insert             515       4.26 s    8 ms      5 ms      27 ms
    recv               101       18.54 s   0.18 s    2603 ns   1.63 s

File Timer.inl:

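#include <windows.h>   // QueryPerformanceCounter, clipboard API
#include <algorithm>   // std::transform, std::sort, std::count_if
#include <functional>  // std::unary_function, std::binary_function
#include <map>
#include <string>
#include "x:\utils\stlext\stringext.h"
#include <iterator>
#include <set>
#include <vector>
#include <numeric>
#include "x:\utils\stlext\algorithmext.h"
#include <math.h>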

    class Timer
    {
    public:
        Timer(const char* name)
        {
            label = std::safe_string(name);
            QueryPerformanceCounter(&startTime);
        }

        virtual ~Timer()
        {
            QueryPerformanceCounter(&stopTime);
            __int64 clocks = stopTime.QuadPart-startTime.QuadPart;
            double elapsed = (double)clocks/(double)TimerFreq();
            TimeMap().insert(std::make_pair(label,elapsed));
        };

        static std::string Dump(bool ClipboardAlso=true)
        {
            static const std::string loc = "Timer::Dump";

            if( TimeMap().empty() )
            {
                return "No trials\r\n";
            }

            std::string ret = std::formatstr("\r\n\r\nTimer Precision = %s\r\n\r\n", format_elapsed(1.0/(double)TimerFreq()).c_str());

            // get a list of keys
            typedef std::set<std::string> keyset;
            keyset keys;
            std::transform(TimeMap().begin(), TimeMap().end(), std::inserter(keys, keys.begin()), extract_key());

            size_t maxrows = 0;

            typedef std::vector<std::string> strings;
            strings lines;

            static const size_t tabWidth = 9;

            std::string head = std::formatstr("=== %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s ===", tabWidth*2, tabWidth*2, "Item", tabWidth, tabWidth, "Trials", tabWidth, tabWidth, "Ttl Time", tabWidth, tabWidth, "Avg Time", tabWidth, tabWidth, "Mean Time", tabWidth, tabWidth, "StdDev");
            ret += std::formatstr("\r\n%s\r\n", head.c_str());
            if( ClipboardAlso ) 
                lines.push_back("Item\tTrials\tTtl Time\tAvg Time\tMean Time\tStdDev\r\n");
            // dump the values for each key
            {for( keyset::iterator key = keys.begin(); keys.end() != key; ++key )
            {
                time_type ttl = 0;
                ttl = std::accumulate(TimeMap().begin(), TimeMap().end(), ttl, accum_key(*key));
                size_t num = std::count_if( TimeMap().begin(), TimeMap().end(), match_key(*key));
                if( num > maxrows ) 
                    maxrows = num;
                time_type avg = ttl / num;

                // compute the median (reported in the "Mean Time" column)
                std::vector<time_type> sortedTimes;
                std::transform_if(TimeMap().begin(), TimeMap().end(), std::inserter(sortedTimes, sortedTimes.begin()), extract_val(), match_key(*key));
                std::sort(sortedTimes.begin(), sortedTimes.end());
                size_t mid = num / 2;
                // even count: average the two middle samples; odd count: take the middle one
                double mean = ( (num % 2) == 0 ) ? (sortedTimes[mid-1]+sortedTimes[mid])/2.0 : sortedTimes[mid];
                // compute variance around the arithmetic average
                double sum = 0.0;
                if( num > 1 )
                {
                    for( std::vector<time_type>::iterator timeIt = sortedTimes.begin(); sortedTimes.end() != timeIt; ++timeIt )
                        sum += pow(*timeIt-avg,2.0);
                }
                // compute std dev
                double stddev = num > 1 ? sqrt(sum/((double)num-1.0)) : 0.0;

                ret += std::formatstr("    %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s %-*.*s\r\n", tabWidth*2, tabWidth*2, key->c_str(), tabWidth, tabWidth, std::formatstr("%d",num).c_str(), tabWidth, tabWidth, format_elapsed(ttl).c_str(), tabWidth, tabWidth, format_elapsed(avg).c_str(), tabWidth, tabWidth, format_elapsed(mean).c_str(), tabWidth, tabWidth, format_elapsed(stddev).c_str()); 
                if( ClipboardAlso )
                    lines.push_back(std::formatstr("%s\t%s\t%s\t%s\t%s\t%s\r\n", key->c_str(), std::formatstr("%d",num).c_str(), format_elapsed(ttl).c_str(), format_elapsed(avg).c_str(), format_elapsed(mean).c_str(), format_elapsed(stddev).c_str())); 

            }
            }
            ret += std::formatstr("%s\r\n", std::string(head.length(),'=').c_str());

            if( ClipboardAlso )
            {
                // dump header row of data block
                lines.push_back("");
                {
                    std::string s;
                    for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                    {
                        if( key != keys.begin() )
                            s.append("\t");
                        s.append(*key);
                    }
                    s.append("\r\n");
                    lines.push_back(s);
                }

                // blow out the flat map of time values to a separate vector of times for each key
                typedef std::map<std::string, std::vector<time_type> > nodematrix;
                nodematrix nodes;
                for( Times::iterator time = TimeMap().begin(); time != TimeMap().end(); ++time )
                    nodes[time->first].push_back(time->second);

                // dump each data point
                for( size_t row = 0; row < maxrows; ++row )
                {
                    std::string rowDump;
                    for( keyset::iterator key = keys.begin(); key != keys.end(); ++key )
                    {
                        if( key != keys.begin() )
                            rowDump.append("\t");
                        if( nodes[*key].size() > row )
                            rowDump.append(std::formatstr("%f", nodes[*key][row]));
                    }
                    rowDump.append("\r\n");
                    lines.push_back(rowDump);
                }

                // dump to the clipboard
                std::string dump;
                for( strings::iterator s = lines.begin(); s != lines.end(); ++s )
                {
                    dump.append(*s);
                }

                OpenClipboard(0);
                EmptyClipboard();
                HGLOBAL hg = GlobalAlloc(GMEM_MOVEABLE, dump.length()+1);
                if( hg != 0 )
                {
                    char* buf = (char*)GlobalLock(hg);
                    if( buf != 0 )
                    {
                        std::copy(dump.begin(), dump.end(), buf);
                        buf[dump.length()] = 0;
                        GlobalUnlock(hg);
                        SetClipboardData(CF_TEXT, hg);
                    }
                }
                CloseClipboard();
            }

            return ret;
        }

        static void Reset()
        {
            TimeMap().clear();
        }

        static std::string format_elapsed(double d) 
        {
            if( d < 0.00000001 )
            {
                // show in ps with 4 digits
                return std::formatstr("%0.4f ps", d * 1000000000000.0);
            }
            if( d < 0.00001 )
            {
                // show in ns
                return std::formatstr("%0.0f ns", d * 1000000000.0);
            }
            if( d < 0.001 )
            {
                // show in us
                return std::formatstr("%0.0f us", d * 1000000.0);
            }
            if( d < 0.1 )
            {
                // show in ms
                return std::formatstr("%0.0f ms", d * 1000.0);
            }
            if( d <= 60.0 )
            {
                // show in seconds
                return std::formatstr("%0.2f s", d);
            }
            if( d < 3600.0 )
            {
                // show in min:sec
                return std::formatstr("%01.0f:%02.2f", floor(d/60.0), fmod(d,60.0));
            }
            // show in h:min:sec
            return std::formatstr("%01.0f:%02.0f:%02.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0));
        }

    private:
        static __int64 TimerFreq()
        {
            static __int64 freq = 0;
            static bool init = false;
            if( !init )
            {
                LARGE_INTEGER li;
                QueryPerformanceFrequency(&li);
                freq = li.QuadPart;
                init = true;
            }
            return freq;
        }
        LARGE_INTEGER startTime, stopTime;
        std::string label;

        typedef std::string key_type;
        typedef double time_type;
        typedef std::multimap<key_type, time_type> Times;
//      static Times times;
        static Times& TimeMap()
        {
            static Times times_;
            return times_;
        }

        struct extract_key : public std::unary_function<Times::value_type, key_type>
        {
            std::string operator()(Times::value_type const & r) const
            {
                return r.first;
            }
        };

        struct extract_val : public std::unary_function<Times::value_type, time_type>
        {
            time_type operator()(Times::value_type const & r) const
            {
                return r.second;
            }
        };
        struct match_key : public std::unary_function<Times::value_type, bool>
        {
            match_key(key_type const & key_) : key(key_) {};
            bool operator()(Times::value_type const & rhs) const
            {
                return key == rhs.first;
            }
        private:
            match_key& operator=(match_key&) { return * this; }
            const key_type key;
        };

        struct accum_key : public std::binary_function<time_type, Times::value_type, time_type>
        {
            accum_key(key_type const & key_) : key(key_), n(0) {};
            time_type operator()(time_type const & v, Times::value_type const & rhs) const
            {
                if( key == rhs.first )
                {
                    ++n;
                    return rhs.second + v;
                }
                return v;
            }
        private:
            accum_key& operator=(accum_key&) { return * this; }
            const Times::key_type key;
            mutable size_t n;
        };
    };
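
The per-category statistics that Dump() prints can also be computed without the Windows and helper-header dependencies. The following is a minimal, portable sketch; `TrialStats` and `summarize` are hypothetical names that are not part of Timer.inl, and it reports the arithmetic mean and the median as separate fields.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of Timer.inl.
struct TrialStats {
    double total = 0.0;
    double average = 0.0;  // arithmetic mean
    double median = 0.0;
    double stddev = 0.0;   // sample standard deviation (n - 1 in the denominator)
};

// Takes the samples by value because computing the median sorts them.
inline TrialStats summarize(std::vector<double> samples) {
    TrialStats s;
    const std::size_t n = samples.size();
    if (n == 0) return s;

    s.total = std::accumulate(samples.begin(), samples.end(), 0.0);
    s.average = s.total / static_cast<double>(n);

    std::sort(samples.begin(), samples.end());
    const std::size_t mid = n / 2;
    // Odd count: the middle element. Even count: mean of the two middle elements.
    s.median = (n % 2 != 0) ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2.0;

    if (n > 1) {
        double sum_sq = 0.0;
        for (double x : samples) sum_sq += (x - s.average) * (x - s.average);
        s.stddev = std::sqrt(sum_sq / static_cast<double>(n - 1));
    }
    return s;
}
```

Feeding it the elapsed times collected for one label gives the numbers needed for one row of the table.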

File stringext.h (provides the formatstr() function):

namespace std
{
    /*  ---

    Formatted Print

        template<class C>
        int strprintf(basic_string<C>* pString, const C* pFmt, ...);

        template<class C>
        int vstrprintf(basic_string<C>* pString, const C* pFmt, va_list args);

    Returns :

        # characters printed to output


    Effects :

        Writes formatted data to a string.  strprintf() works exactly the same as sprintf(); see your
        documentation for sprintf() for details of operation.  vstrprintf() also works the same as sprintf(),
        but instead of accepting a variable parameter list it accepts a va_list argument.

    Requires :

        pString is a pointer to a basic_string<>

    --- */

    template<class char_type> int vprintf_generic(char_type* buffer, size_t bufferSize, const char_type* format, va_list argptr);

    template<> inline int vprintf_generic<char>(char* buffer, size_t bufferSize, const char* format, va_list argptr)
    {
#       ifdef SECURE_VSPRINTF
        return _vsnprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#       else
        return _vsnprintf(buffer, bufferSize-1, format, argptr);
#       endif
    }

    template<> inline int vprintf_generic<wchar_t>(wchar_t* buffer, size_t bufferSize, const wchar_t* format, va_list argptr)
    {
#       ifdef SECURE_VSPRINTF
        return _vsnwprintf_s(buffer, bufferSize-1, _TRUNCATE, format, argptr);
#       else
        return _vsnwprintf(buffer, bufferSize-1, format, argptr);
#       endif
    }

    template<class Type, class Traits>
    inline int vstringprintf(basic_string<Type,Traits> & outStr, const Type* format, va_list args)
    {
        // prologue
        static const size_t ChunkSize = 1024;
        size_t curBufSize = 0;
        outStr.erase(); 

        if( !format )
        {
            return 0;
        }

        // keep trying to write the string to an ever-increasing buffer until
        // either we get the string written or we run out of memory
        while( bool cont = true )
        {
            // allocate a local buffer
            curBufSize += ChunkSize;
            std::ref_ptr<Type> localBuffer = new Type[curBufSize];
            if( localBuffer.get() == 0 )
            {
                // we ran out of memory -- nice goin'!
                return -1;
            }
            // format output to local buffer
            // the size passed on is a character count, not a byte count
            int i = vprintf_generic(localBuffer.get(), curBufSize, format, args);
            if( -1 == i )
            {
                // the buffer wasn't big enough -- try again
                continue;
            }
            else if( i < 0 )
            {
                // something weird happened -- bail
                return i;
            }
            // if we get to this point the string was written completely -- stop looping
            outStr.assign(localBuffer.get(),i);
            return i;
        }
        // unreachable code
        return -1;
    };

    // provided for backward-compatibility
    template<class Type, class Traits>
    inline int vstrprintf(basic_string<Type,Traits> * outStr, const Type* format, va_list args)
    {
        return vstringprintf(*outStr, format, args);
    }

    template<class Char, class Traits>
    inline int stringprintf(std::basic_string<Char, Traits> & outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(outString, format, args);
        va_end(args);
        return retval;
    }

    // old function provided for backward-compatibility
    template<class Char, class Traits>
    inline int strprintf(std::basic_string<Char, Traits> * outString, const Char* format, ...)
    {
        va_list args;
        va_start(args, format);
        int retval = vstringprintf(*outString, format, args);
        va_end(args);
        return retval;
    }

    /*  ---

    Inline Formatted Print

        string formatstr(const char* Format, ...);

    Returns :

        Formatted string


    Effects :

        Writes formatted data to a string.  formatstr() works the same as sprintf(); see your
        documentation for sprintf() for details of operation.  

    --- */

    template<class Char>
    inline std::basic_string<Char> formatstr(const Char * format, ...)
    {
        std::basic_string<Char> outString;

        va_list args;
        va_start(args, format);
        vstringprintf(outString, format, args);
        va_end(args);
        return outString;
    }
};

File algorithmext.h (provides the transform_if() function):

/*  ---

Transform
25.2.3

    template<class InputIterator, class OutputIterator, class UnaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator first, InputIterator last, OutputIterator result, UnaryOperation op, Predicate pred)

    template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
        OutputIterator transform_if(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, OutputIterator result, BinaryOperation binary_op, Predicate pred)

Requires:   

    T is of type EqualityComparable (20.1.1) 
    op and binary_op have no side effects

Effects :

    Assigns through every iterator i in the range [result, result + (last1-first1)) a new corresponding value equal to one of:
        1:  op( *(first1 + (i - result)) )
        2:  binary_op( *(first1 + (i - result)), *(first2 + (i - result)) )

Returns :

    result + (last1 - first1)

Complexity :

    At most last1 - first1 applications of op or binary_op

--- */

template<class InputIterator, class OutputIterator, class UnaryFunction, class Predicate>
OutputIterator transform_if(InputIterator first, 
                            InputIterator last, 
                            OutputIterator result, 
                            UnaryFunction f, 
                            Predicate pred)
{
    for (; first != last; ++first)
    {
        if( pred(*first) )
            *result++ = f(*first);
    }
    return result; 
}

template<class InputIterator1, class InputIterator2, class OutputIterator, class BinaryOperation, class Predicate>
OutputIterator transform_if(InputIterator1 first1, 
                            InputIterator1 last1, 
                            InputIterator2 first2, 
                            OutputIterator result, 
                            BinaryOperation binary_op, 
                            Predicate pred)
{
    for (; first1 != last1 ; ++first1, ++first2)
    {
        if( pred(*first1) )
            *result++ = binary_op(*first1,*first2);
    }
    return result;
}

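Answer 4 (score: 0)

The article Code profiler and optimizations has lots of information about C++ code profiling, and it also has a free download link to a program/class that will show you a graphic presentation of different code paths/methods.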

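Answer 5 (score: 0)

I have a quick-and-dirty profiling class that can be used for profiling even the tightest inner loops. The emphasis is on extremely light weight and simple code. The class allocates a two-dimensional array of fixed size. I then add "checkpoint" calls all over the place. When checkpoint N is reached immediately after checkpoint M, I add the time elapsed (in microseconds) to the array item [M,N]. Since this is designed to profile tight loops, I also have a "start of iteration" call that resets the "last checkpoint" variable. At the end of the test, the dumpResults() call produces the list of all pairs of checkpoints that followed each other, along with the total time accounted for and unaccounted for.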
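
The answer gives no code, so here is a minimal sketch of how such a class might look. Everything in it is an assumption for illustration: the name `CheckpointProfiler`, the member names, the use of std::chrono, and in particular the definition of "unaccounted" time as the profiler's lifetime minus the sum of all recorded pairs.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <iostream>

// Hypothetical sketch; checkpoints are identified by indices below MaxCheckpoints.
template <std::size_t MaxCheckpoints>
class CheckpointProfiler {
    using clock_type = std::chrono::steady_clock;
    using duration = clock_type::duration;

    // elapsed_[m][n]: time spent going from checkpoint m straight to checkpoint n.
    std::array<std::array<duration, MaxCheckpoints>, MaxCheckpoints> elapsed_{};
    std::array<std::array<std::size_t, MaxCheckpoints>, MaxCheckpoints> hits_{};
    bool has_last_ = false;  // false right after start_iteration()
    std::size_t last_ = 0;
    clock_type::time_point last_time_{};
    clock_type::time_point created_ = clock_type::now();

    static long long to_us(duration d) {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    }

public:
    // Call at the top of each loop iteration so that the gap between two
    // iterations is not attributed to any checkpoint pair.
    void start_iteration() { has_last_ = false; }

    void checkpoint(std::size_t n) {
        const clock_type::time_point now = clock_type::now();
        if (n >= MaxCheckpoints) return;  // ignore out-of-range ids
        if (has_last_) {
            elapsed_[last_][n] += now - last_time_;
            ++hits_[last_][n];
        }
        has_last_ = true;
        last_ = n;
        last_time_ = now;
    }

    std::size_t hits(std::size_t m, std::size_t n) const { return hits_[m][n]; }
    long long elapsed_us(std::size_t m, std::size_t n) const { return to_us(elapsed_[m][n]); }

    // Lists every pair of checkpoints that followed one another at least once.
    void dump_results(std::ostream& os) const {
        duration accounted{};
        for (std::size_t m = 0; m < MaxCheckpoints; ++m)
            for (std::size_t n = 0; n < MaxCheckpoints; ++n)
                if (hits_[m][n] != 0) {
                    os << m << " -> " << n << ": " << to_us(elapsed_[m][n])
                       << " us over " << hits_[m][n] << " passes\n";
                    accounted += elapsed_[m][n];
                }
        const duration total = clock_type::now() - created_;
        os << "accounted for: " << to_us(accounted)
           << " us, unaccounted for: " << to_us(total - accounted) << " us\n";
    }
};
```

Durations are accumulated in the clock's native resolution and converted to microseconds only when reported, which avoids rounding each interval separately.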

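Answer 6 (score: 0)

I wrote a simple cross-platform class called nanotimer for this reason. The goal was to be as lightweight as possible so as not to interfere with actual code performance by adding too many instructions and thereby influencing the instruction cache. It is capable of getting microsecond accuracy across Windows, Mac and Linux (and probably some Unix variants).

Basic usage: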

plf::timer t;
t.start();

// stuff

double elapsed = t.get_elapsed_ns(); // Get nanoseconds

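start() also restarts the timer when necessary. "Pausing" the timer can be achieved by storing the elapsed time, then restarting the timer when "unpausing" and adding to the stored result the next time you check elapsed time.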
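
The pause/unpause technique described above can be sketched like this. It is a minimal sketch built on std::chrono rather than on the nanotimer class itself; `PausableTimer` and its member names are hypothetical.

```cpp
#include <chrono>

// Hypothetical wrapper illustrating pause/unpause by storing elapsed time.
class PausableTimer {
    using clock_type = std::chrono::steady_clock;
    clock_type::time_point start_ = clock_type::now();
    clock_type::duration stored_{};  // time accumulated before the last pause
    bool paused_ = false;

public:
    // (Re)starts the timer and discards any stored time.
    void start() {
        stored_ = clock_type::duration::zero();
        paused_ = false;
        start_ = clock_type::now();
    }

    // Pausing: store the time elapsed so far.
    void pause() {
        if (!paused_) {
            stored_ += clock_type::now() - start_;
            paused_ = true;
        }
    }

    // Unpausing: restart the underlying timer, keeping the stored time.
    void unpause() {
        if (paused_) {
            start_ = clock_type::now();
            paused_ = false;
        }
    }

    bool paused() const { return paused_; }

    // Stored time plus, if running, the time since the last (re)start.
    double elapsed_ns() const {
        clock_type::duration total = stored_;
        if (!paused_) total += clock_type::now() - start_;
        return std::chrono::duration<double, std::nano>(total).count();
    }
};
```

While the timer is paused, repeated calls to elapsed_ns() return the same value, because only the stored duration contributes to the result.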