Question

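I am working with a large .json file filled with Twitter bios and want to extract the screen_names. To keep the search from also returning users who are merely mentioned in the bio section, it is important to extract only the first match on each line.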

When I open the file in Notepad++, I can do this with the following regular expression:

#include <boost/filesystem.hpp>
#include <iostream>
#include <string>
#include <exception>

namespace fs = boost::filesystem;

int main(int ac, char ** av)
{
  fs::path output(av[1]);

  std::cout << output << std::endl;

  std::cout << output.parent_path() << std::endl;

  if (fs::exists(output))
  {
    std::string msg = output.string() + " already exists";

    throw std::invalid_argument(msg);
  }

  if ( output.parent_path().string().size() != 0 &&
       !fs::exists(output.parent_path()) )
  {
    std::string msg = output.parent_path().string() + " is not a directory";

    throw std::invalid_argument(msg);
  }
}

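Using the same expression with re.findall or re.search in Python returns no matches.

I am new to both Python and regular expressions, so I am fairly sure I don't fully understand everything that is required here.

Many thanks in advance!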

Answer 1

As other users pointed out, Python and Notepad++ use different regular expression syntax, so to get the result I wanted I used the following code:

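  import re

  # Capture the value that follows the "screen_name" key.
  regex = re.compile(r'"screen_name":\s*"(\w+)"')

  # Open the output file once instead of reopening it for every line.
  with open("followers.json", "r") as f, open("followers.txt", "a") as outp:
      for line in f:
          # search() returns only the first match on the line, or None.
          match = regex.search(line)
          if match:
              outp.write(match.group(1) + "\n")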

This parses the .json file you specify, reads it line by line, and saves the first match on each line to "followers.txt".
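To illustrate why re.search fits the "first match per line" requirement, here is a minimal sketch. The sample line and the helper name first_screen_name are made up for illustration and are not taken from the original file; the pattern is the one used in the code above.

```python
import re

# Same pattern as in the answer: capture the value of the "screen_name" key.
SCREEN_NAME = re.compile(r'"screen_name":\s*"(\w+)"')


def first_screen_name(line):
    """Return the first screen_name found on the line, or None if there is none."""
    match = SCREEN_NAME.search(line)
    return match.group(1) if match else None


# Hypothetical line that happens to contain two screen_name keys.
sample = '{"screen_name": "alice", "description": "hi"}, {"screen_name": "bob"}'

first = first_screen_name(sample)        # only the first match on the line
everything = SCREEN_NAME.findall(sample)  # every match on the line
print(first, everything)
```

re.search stops at the first match, while re.findall collects every match on the line, so search is the one to use when later mentions on the same line must be ignored. Checking for None before calling group(1) keeps lines without a screen_name from raising an AttributeError.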

Regex search term used in Notepad++ does not work in Python
