Question

I have a requirement: I need to extract substrings from a String using a regular expression.

For example, here is my sample data:

Hello, "How" are "you" What "are" you "doing?"

From this sample data, I need to extract only the text inside the second and fourth pairs of double quotes.

My requirement is: you doing?

I tried using a regular expression, but I could not extract the substrings as required.

Answer 1

We can use re.findall and then slice the result to keep the second and fourth matches (indices 1 and 3):

import re

string = 'Hello, "How" are "you" What "are" you "doing?"'
# Non-greedy match of each quoted segment; [1::2] keeps indices 1 and 3 (the 2nd and 4th)
result = re.findall('".+?"', string)[1::2]

print(result)

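Here the regular expression matches one or more characters enclosed in double quotes, but matches as few characters as possible (a non-greedy match); otherwise we would end up with a single match: "How" are "you" What "are" you "doing?".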

Output:

['"you"', '"doing?"']
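To see why the non-greedy ? matters, you can run the greedy and non-greedy patterns side by side on the same string. A minimal sketch using only the standard re module:

```python
import re

text = 'Hello, "How" are "you" What "are" you "doing?"'

# Greedy: ".+" runs on to the LAST double quote, so everything collapses into one match
greedy = re.findall('".+"', text)

# Non-greedy: ".+?" stops at the NEXT double quote, giving one match per quoted word
lazy = re.findall('".+?"', text)

print(greedy)  # ['"How" are "you" What "are" you "doing?"']
print(lazy)    # ['"How"', '"you"', '"are"', '"doing?"']
```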

If you want the results joined together without the quotes, you can combine str.strip with str.join:

print(' '.join(s.strip('"') for s in result))

Output:

you doing?
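If the positions you need may vary, the findall-and-pick idea can be wrapped in a small function that takes 1-based positions. This is a minimal sketch: pick_quoted is a hypothetical name, it uses a capturing group instead of str.strip so the quotes never appear in the results, and it assumes the quotes are balanced and never escaped.

```python
import re


def pick_quoted(text, positions):
    """Hypothetical helper: return the quoted substrings at the given 1-based positions."""
    # The capturing group makes findall return only the text between the quotes
    items = re.findall(r'"([^"]*)"', text)
    # Silently skip positions that are out of range
    return [items[p - 1] for p in positions if 1 <= p <= len(items)]


sample = 'Hello, "How" are "you" What "are" you "doing?"'
print(" ".join(pick_quoted(sample, [2, 4])))  # you doing?
```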

Another approach is to split only on the " character:

result = string.split('"')[1::2][1::2]
print(result)

Output:

['you', 'doing?']

This works because splitting the string on double quotes produces a list like this:

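Everything before the first double quote
Everything after the first double quote and before the second
Everything after the second double quote and before the third ...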

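This means the elements at the odd indices (1, 3, 5, ...) are exactly the quoted pieces. We can then slice that result again to get the second and fourth of them.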
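Printing the intermediate lists makes the two slices concrete. This sketch only restates the split-based one-liner step by step; like it, it assumes the quotes in the input are balanced and never escaped.

```python
text = 'Hello, "How" are "you" What "are" you "doing?"'

parts = text.split('"')
# parts alternates outside/inside the quotes; the trailing '' comes from the final quote:
# ['Hello, ', 'How', ' are ', 'you', ' What ', 'are', ' you ', 'doing?', '']

quoted = parts[1::2]   # odd indices are the quoted pieces
wanted = quoted[1::2]  # every second quoted piece: the 2nd and 4th

print(quoted)  # ['How', 'you', 'are', 'doing?']
print(wanted)  # ['you', 'doing?']
```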

Answer 2

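A regex-only solution. It may not be 100% accurate, because it returns every second quoted item rather than only the second and fourth, but it works for the example.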

"[^"]+"[^"]+("[^"]+")

Demo in JS

var str = 'Hello, "How" are "you" What "are" you "doing?"';
var regex = /"[^"]+"[^"]+("[^"]+")/g;
var match = regex.exec(str);
while (match !== null) {
  // matched text: match[0]
  // match start: match.index
  // capturing group n: match[n]
  console.log(match[1]);
  match = regex.exec(str);
}
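The "not 100% accurate" caveat in this answer can be checked in Python by running the same pattern on an input with more than four quoted items (the six-item string below is made up for illustration). If you need only the second and fourth items, one option, which is our own sketch rather than part of this answer, is to anchor the match at the start of the string and spell out four quoted items explicitly, capturing the second and fourth. SECOND_AND_FOURTH is a hypothetical name.

```python
import re

# Made-up input with six quoted items, to expose the "every second match" behaviour
six = 'a "1" b "2" c "3" d "4" e "5" f "6"'

# This answer's pattern with re.findall: returns the captured group of each match
every_second = re.findall(r'"[^"]+"[^"]+("[^"]+")', six)

# Anchored alternative (hypothetical sketch): re.match starts at position 0,
# so only the 2nd and 4th quoted items can ever be captured
SECOND_AND_FOURTH = re.compile(
    r'[^"]*"[^"]*"'    # text before, and including, the 1st quoted item
    r'[^"]*"([^"]*)"'  # 2nd quoted item (captured)
    r'[^"]*"[^"]*"'    # 3rd quoted item
    r'[^"]*"([^"]*)"'  # 4th quoted item (captured)
)

m = SECOND_AND_FOURTH.match(six)
print(every_second)  # ['"2"', '"4"', '"6"']
print(m.groups())    # ('2', '4')
```

On the original sample string, SECOND_AND_FOURTH.match(...).groups() gives ('you', 'doing?'), and re.match returns None if fewer than four quoted items are present.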

Answer 3

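We can try using re.findall to extract all the quoted terms, and then build the string from every second item in the resulting list (indices 1 and 3):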

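import re

text = 'Hello, "How" are "you" What "are" you "doing?"'
# Capture the text between each pair of double quotes (without the quotes)
matches = re.findall(r'"([^"]+)"', text)
# Keep the 2nd and 4th captures (indices 1 and 3)
matches = matches[1::2]
output = " ".join(matches)
print(output)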

you doing?
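This answer's result needs no str.strip because of how re.findall treats capturing groups: without a group it returns each whole match, and with exactly one group it returns only that group's text. A minimal comparison on the sample string:

```python
import re

text = 'Hello, "How" are "you" What "are" you "doing?"'

# No group: findall returns each whole match, quotes included
with_quotes = re.findall(r'"[^"]+"', text)

# One group: findall returns only the group's text, so the quotes are dropped
without_quotes = re.findall(r'"([^"]+)"', text)

print(with_quotes)     # ['"How"', '"you"', '"are"', '"doing?"']
print(without_quotes)  # ['How', 'you', 'are', 'doing?']
print(" ".join(without_quotes[1::2]))  # you doing?
```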
