HTML Tidy segfault on tidyBufFree

Time: 2013-12-16 11:49:10

Tags: c++ html c xml htmltidy

I am using Tidy to clean up a large amount of HTML. The function I am using is:

std::string cleanHTML (std::string htmlcontent)
{

char* outputstr;
TidyBuffer output ={0};
uint buflen =0;

TidyBuffer errbuf;
int rc = -1;
Bool ok;
TidyDoc tdoc = tidyCreate();                     // Initialize "document"

tidyBufInit( &errbuf );

ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );  // Convert to XHTML
if ( ok )
  rc = tidySetErrorBuffer( tdoc, &errbuf );      // Capture diagnostics
if ( rc >= 0 )
  rc = tidyParseString( tdoc, htmlcontent.c_str() );           // Parse the input
if ( rc >= 0 )
  rc = tidySaveBuffer (tdoc,&output );               // Tidy it up!


uint yy= output.size;
outputstr = (char*)malloc(yy+10);
uint xx=yy+10;
rc = tidySaveString (tdoc,outputstr,&xx);
std::string cleanedhtml (outputstr);

tidyBufFree(&output);
tidyBufFree(&errbuf);
tidyRelease(tdoc);

return cleanedhtml;

}

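Running the program under gdb shows that it segfaults at tidyBufFree(&output), but only on some calls (I cannot see any obvious difference between the calls that crash and the ones that do not). The function also appears to leak memory.

Can anyone help?

Edit:

As suggested, I ran the program under Valgrind; the output is below (could someone explain what it means?).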

==7860== Process terminating with default action of signal 11 (SIGSEGV)
==7860==  Access not within mapped region at address 0x0
==7860==    at 0x428B00: tidyBufFree (in /home/sergerold/qt5_episode_analyser/a.out)
==7860==    by 0x405EC6: cleanHTML(std::string) (in    /home/sergerold/qt5_episode_analyser/a.out)
==7860==    by 0x4048A3: get_tvseries(std::string) (in /home/sergerold/qt5_episode_analyser/a.out)
==7860==    by 0x403DE2: main (in /home/sergerold/qt5_episode_analyser/a.out)
==7860==  If you believe this happened as a result of a stack
==7860==  overflow in your program's main thread (unlikely but
==7860==  possible), you can try to increase the size of the
==7860==  main thread stack using the --main-stacksize= flag.
==7860==  The main thread stack size used in this run was 8388608.
==7860== 
==7860== HEAP SUMMARY:
==7860==     in use at exit: 2,285,594 bytes in 3,638 blocks
==7860==   total heap usage: 102,543 allocs, 98,905 frees, 137,801,931 bytes allocated
==7860== 
==7860== LEAK SUMMARY:
==7860==    definitely lost: 0 bytes in 0 blocks
==7860==    indirectly lost: 0 bytes in 0 blocks
==7860==      possibly lost: 1,303,686 bytes in 114 blocks
==7860==    still reachable: 981,908 bytes in 3,524 blocks
==7860==         suppressed: 0 bytes in 0 blocks
==7860== Rerun with --leak-check=full to see details of leaked memory
==7860== 
==7860== For counts of detected and suppressed errors, rerun with: -v
==7860== Use --track-origins=yes to see where uninitialised values come from
==7860== ERROR SUMMARY: 113 errors from 17 contexts (suppressed: 0 from 0)
Segmentation fault

Solved:

The segmentation fault is caused by tidyBufFree(&output): when output is empty, the call dereferences a null pointer.

2 answers:

Answer 0: (score: 0)

Your code looks a lot like this example, but with a few important differences.

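Notice that in the example the author does not call tidyBufInit( &errbuf ); this might be the source of your memory leak. To be safe, use a memory debugging tool such as valgrind. As for the segfault: the way you free output looks correct (at least according to the example), so my guess is that stack corruption may be causing the problem. Again, valgrind can help you find it.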

Answer 1: (score: 0)

The segmentation fault is caused by tidyBufFree(&output) when output is empty, which leads to a null pointer dereference. - user3083672