Simple regular-expression API calls occupy a large amount of virtual memory

Time: 2019-04-23 11:06:02

Tags: c++ c++11

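I am working on an embedded C++ project and need regular-expression matching in one of its components, "X". For that component I use the plain regcomp / regexec API.

Component "X" receives a continuous stream of text buffers as input and filters certain log lines using 10 simple expressions through the regcomp / regexec API. To test performance, I left the device untouched for more than 12 hours. When I then checked with the "top" command, that component showed a "VSZ%" of about 95% and a CPU% of about 16%.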
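One way to cross-check the figure reported by "top" is to read the process's own counters. The sketch below assumes a Linux target that exposes /proc/self/status; the helper name readStatusFieldKb is hypothetical and not part of the original component. It returns the value of a field such as VmSize (virtual size) or VmRSS (resident memory) in kB, so the two can be compared over time.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post).
// Returns the kB value of a field such as "VmSize" or "VmRSS" from a
// Linux-style status file, or -1 if the field is missing or unreadable.
long readStatusFieldKb(const std::string& field,
                       const std::string& path = "/proc/self/status")
{
    std::ifstream in(path);
    const std::string prefix = field + ":";
    std::string line;
    while (std::getline(in, line)) {
        // Lines look like "VmSize:\t  123456 kB"
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream rest(line.substr(prefix.size()));
            long kb = -1;
            if (rest >> kb) {
                return kb;
            }
            return -1;
        }
    }
    return -1;
}
```

Logging readStatusFieldKb("VmSize") and readStatusFieldKb("VmRSS") periodically from inside the component would show whether the virtual size actually grows during the 12-hour run, or whether it is large from the start while resident memory stays flat.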

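Here are my questions:

  1. Why does a simple regcomp / regexec setup use about 16% CPU? Is it because the input is a continuous stream of text buffers and every buffer is run through all the regexes? Is that high for such a simple regex API?
  2. What could cause component "X" to show a VSZ% of 95%? Is this expected, and if so, why?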

Pseudocode for the component:

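#include <regex.h>
#include <stdio.h>   // snprintf
#include <time.h>    // time_t, localtime

// BUFSIZE, Months, Product, SerialNo and Loggers are used below, but their
// declarations are omitted from this pseudocode.
class ComponentX
{
  public:
    ComponentX();
    virtual ~ComponentX() throw ();
    // Access level assumed; the original pseudocode omits this declaration
    void LogReceived(int logserverSeq, const char* module, long sec, long usec, int pid, int level, int seq, const char* msg);
  private:
    // One compiled POSIX regex per filter pattern
    regex_t RegEx1;
    regex_t RegEx2;
    regex_t RegEx3;
    regex_t RegEx4;
    regex_t RegEx5;
    regex_t RegEx6;
    regex_t RegEx7;
    regex_t RegEx8;
    regex_t RegEx9;
    regex_t RegEx10;
};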

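// Constructor: compile every pattern once
ComponentX::ComponentX()
{
  const char* Pattern1 = "^12";
  const char* Pattern2 = "REQUEST$";
  const char* Pattern3 = "^Key$";
  const char* Pattern4 = "client";
  const char* Pattern5 = "Fre*";
  const char* Pattern6 = "Buf{2,5}";
  const char* Pattern7 = "C(b|l)";
  const char* Pattern8 = "^.$";
  const char* Pattern9 = "E(b*|R*)R";
  const char* Pattern10 = "m.t";

  // REG_EXTENDED makes "{2,5}" and "(b|l)" act as operators; with cflags 0
  // (basic syntax) they match literally. REG_NOSUB skips sub-match
  // bookkeeping, which regexec() is never asked for here.
  // regcomp() returns non-zero on failure; the results are not checked here.
  // REG_NEWLINE is not set, so "$" matches only at the very end of the
  // string, and the buffer built in LogReceived ends with "\n".
  const int flags = REG_EXTENDED | REG_NOSUB;

  regcomp(&RegEx1, Pattern1, flags);
  regcomp(&RegEx2, Pattern2, flags);
  regcomp(&RegEx3, Pattern3, flags);
  regcomp(&RegEx4, Pattern4, flags);
  regcomp(&RegEx5, Pattern5, flags);
  regcomp(&RegEx6, Pattern6, flags);
  regcomp(&RegEx7, Pattern7, flags);
  regcomp(&RegEx8, Pattern8, flags);
  regcomp(&RegEx9, Pattern9, flags);
  regcomp(&RegEx10, Pattern10, flags);
}

// Destructor: release the memory held by each compiled pattern
ComponentX::~ComponentX() throw ()
{
  regfree(&RegEx1);
  regfree(&RegEx2);
  regfree(&RegEx3);
  regfree(&RegEx4);
  regfree(&RegEx5);
  regfree(&RegEx6);
  regfree(&RegEx7);
  regfree(&RegEx8);
  regfree(&RegEx9);
  regfree(&RegEx10);
}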

// This method is invoked continuously, once per received log message
void ComponentX::LogReceived(int logserverSeq, const char* module, long sec, long usec, int pid, int level, int /* seq */, const char* msg)
{
  char buf[BUFSIZE];
  const time_t t = static_cast<time_t>(sec);  // localtime() expects a time_t*, not a long*
  struct tm* tm = ::localtime(&t);
  if (tm == NULL) {
    return;
  }
  const char* levelStr = "<unknown>";  // 'level' is not mapped to a string in this pseudocode

  // Prepare the string
  const int written = ::snprintf(buf, BUFSIZE, "%d %s %d %02d:%02d:%02d.%03ld %s(%d) %s: %s: %s: %s\n",logserverSeq,Months[tm->tm_mon],tm->tm_mday,tm->tm_hour,tm->tm_min, tm->tm_sec, usec / 1000, module, pid, levelStr,Product.c_str(),SerialNo.c_str(),msg);
  if (written <= 0) {
    return;  // formatting failed or produced nothing
  }
  // snprintf returns the untruncated length, so clamp it to what buf really holds
  size_t count = static_cast<size_t>(written);
  if (count >= sizeof buf) {
    count = sizeof buf - 1;
  }

  // Check the regular expressions; || stops at the first one that matches
  const bool matched =
      0 == regexec(&RegEx1, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx2, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx3, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx4, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx5, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx6, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx7, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx8, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx9, buf, 0, NULL, 0) ||
      0 == regexec(&RegEx10, buf, 0, NULL, 0);

  // If any one RegEx matched, notify the callbacks
  if (matched) {
    for (auto i = Loggers.begin(); i != Loggers.end(); ++i) {
      (*i)->SendLog(buf, count);
    }
  }
}
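
The ten near-identical members and calls above could also be kept in a container. The following is only a sketch under stated assumptions, not the component's actual code: the class name PatternSet and the method matchesAny are hypothetical, and the flags REG_EXTENDED | REG_NOSUB are a choice made for this illustration. It checks the result of regcomp with regerror, stops at the first matching pattern, and frees every compiled pattern with regfree.

```cpp
#include <regex.h>

#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): owns a set of compiled
// POSIX regexes and reports whether any of them matches a piece of text.
class PatternSet {
public:
    explicit PatternSet(const std::vector<std::string>& patterns) {
        // Reserve up front so elements never move after regcomp() filled them.
        compiled_.reserve(patterns.size());
        for (const std::string& p : patterns) {
            compiled_.emplace_back();
            const int rc = regcomp(&compiled_.back(), p.c_str(),
                                   REG_EXTENDED | REG_NOSUB);
            if (rc != 0) {
                char err[128];
                regerror(rc, &compiled_.back(), err, sizeof err);
                compiled_.pop_back();  // a failed regcomp() must not be regfree()d
                release();
                throw std::invalid_argument("regcomp failed for '" + p + "': " + err);
            }
        }
    }

    ~PatternSet() { release(); }

    // A compiled regex_t owns heap memory, so copying is disabled.
    PatternSet(const PatternSet&) = delete;
    PatternSet& operator=(const PatternSet&) = delete;

    // Returns true as soon as one pattern matches; later patterns are skipped.
    bool matchesAny(const char* text) const {
        for (const regex_t& re : compiled_) {
            if (regexec(&re, text, 0, nullptr, 0) == 0) {
                return true;
            }
        }
        return false;
    }

private:
    void release() {
        for (regex_t& re : compiled_) {
            regfree(&re);
        }
        compiled_.clear();
    }

    std::vector<regex_t> compiled_;
};
```

With such a helper, ComponentX would hold a single PatternSet member built from the ten pattern strings, and LogReceived would reduce its checks to one matchesAny(buf) call.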

0 answers:

No answers