Question

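My code is hitting an error that I can't resolve. I tried using a 'u' prefix instead of 'r' on the patterns, but I still get the same error. I also tried other solutions from Stack Overflow, but got nowhere. Any suggestions?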

Error message:

TypeError: expected string or bytes-like object

Code:

# use urllib and BeautifulSoup to scrape the table

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pandas as pd

url = 'https://www.example.com/profiles'

page = urlopen(url).read()
soup = BeautifulSoup(page, 'lxml')
#print(soup)

reEngName = re.compile(r'\[\*\*.+\*\*\]')
reKorName = re.compile(r'\([^\/h]*\)')
reProfile = re.compile(r'\|.+')

for line in re.findall(reEngName, soup):
    print(line)

Answer 1

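Regular expressions operate on strings, not on a BeautifulSoup object. To search the whole raw text of the document, pass page to the regex instead (in Python 3, urlopen(url).read() returns bytes, so decode it to str first when the pattern is a str pattern). BeautifulSoup is a parser: internally it splits the HTML into its syntactic components and organises them into a tree that you can traverse. For example, to iterate over all <a> tags: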

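# Python 3 / bs4 form, matching the imports used in the question
soup = BeautifulSoup(urlopen(url).read(), 'lxml')
for a in soup.find_all('a'):  # soup('a') is shorthand for find_all('a')
    out = doThings(a)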

Inside doThings(a):

if a.get('href', '').startswith("http://www.domain.net"):  # .get avoids a KeyError on <a> tags without href
    ...  # handle the matching link here

Of course, at that later stage you can use regular expressions to check those strings for matches.
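To make the first suggestion concrete, here is a minimal sketch of running the question's reEngName pattern against the page text rather than the soup object. The sample bytes and the helper names find_eng_names and raises_type_error are made up for illustration. The sample stands in for what urlopen(url).read() would return, and UTF-8 is assumed as the encoding.

```python
import re

# Hypothetical stand-in for `page = urlopen(url).read()`, which returns bytes.
page = (
    b"<p>[**Alice Kim**](/profiles/alice) (Kim Alice) | singer</p>\n"
    b"<p>[**Bob Lee**](/profiles/bob) (Lee Bob) | actor</p>"
)

# Same pattern as in the question.
reEngName = re.compile(r'\[\*\*.+\*\*\]')


def find_eng_names(raw, encoding="utf-8"):
    """Decode the raw bytes to str, then run the str pattern over the text."""
    text = raw.decode(encoding)  # assumption: the page is UTF-8 encoded
    return reEngName.findall(text)


def raises_type_error(value):
    """Return True if feeding `value` to the pattern raises TypeError."""
    try:
        reEngName.findall(value)
    except TypeError:
        return True
    return False


names = find_eng_names(page)
for line in names:
    print(line)

# A non-string object (like a soup object) and undecoded bytes both fail
# with a str pattern; a plain str works.
print(raises_type_error(object()))
print(raises_type_error(page))
print(raises_type_error(page.decode("utf-8")))
```

If you would rather search what BeautifulSoup parsed, str(soup) or soup.get_text() also give you a plain string that can be passed to re.findall.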

Regex BeautifulSoup - TypeError: expected string or bytes-like object
