Question

I have a project where I need to get a value out of a script embedded in the HTML code.

<script>
(function() {
... / More Code
Level.grade = "2";

Level.level = "1";

Level.max_line = "5";

Level.cozum = 'adım12\ndönsağ\nadım13\ndönsol\nadım11';
... / More Code
</script>

How can I get only the "adım12\ndönsağ\nadım13\ndönsol\nadım11" part of this code?

Thanks for the help.

Answer 1

Use a regex to do this.

First, get the contents of that SCRIPT tag, like so:

response.css("script").extract_first()
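One caveat: `extract_first()` returns only the first `<script>` element, which may not be the one that defines `Level.cozum` if the page has several. A minimal sketch of picking the right one, assuming the list of script strings comes from `response.css("script").extract()`; the helper name `pick_script` and the sample strings are hypothetical:

```python
def pick_script(scripts, marker="Level.cozum"):
    """Return the first script string that contains `marker`, or None."""
    for script in scripts:
        if marker in script:
            return script
    return None


# Hypothetical stand-in for the list returned by response.css("script").extract()
scripts = [
    "<script src='jquery.js'></script>",
    "<script>Level.grade = \"2\"; Level.cozum = 'adım 12\\ndön sağ';</script>",
]
script_text = pick_script(scripts)
print(script_text)
```

If no script contains the marker, the helper returns `None`, so the caller can skip the page instead of running the regex on the wrong tag.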

Then use this regular expression:

(Level\.cozum = )(.*?)(\;)

See a demo here: https://regex101.com/r/YxHRmR/1

Here is the code:

import re
regex = r"(Level\.cozum = )(.*?)(\;)"

test_str = ("<script>\n"
    "      (function() {\n"
    "        ... / More Code\n"
    "        Level.grade = \"2\";\n\n"
    "        Level.level = \"1\";\n\n"
    "        Level.max_line = \"5\";\n\n"
    "        Level.cozum = 'adım 12\\ndön sağ\\nadım 13\\ndön sol\\nadım 11'; \n"
    "... / More Code\n"
    "</script>")

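# findall returns one tuple per match: (prefix, value, terminator)
matches = re.findall(regex, test_str, re.MULTILINE)

print(matches)

# The second group of the first match holds the quoted value
print(matches[0][1])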
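The three-group pattern returns the prefix and the semicolon along with the value, and the value still carries its quotes. If only the commands are needed, a variant along these lines might be easier to work with. This is a sketch, not part of the original answer: the names `COZUM_RE` and `extract_cozum` are hypothetical, and it assumes the value is a single- or double-quoted string that does not contain its own quote character.

```python
import re

# Group 1 is the opening quote, group 2 the text between the quotes;
# the backreference \1 requires the closing quote to match the opening one.
COZUM_RE = re.compile(r"Level\.cozum\s*=\s*(['\"])(.*?)\1\s*;")


def extract_cozum(script_text):
    """Return the commands assigned to Level.cozum as a list, or None."""
    match = COZUM_RE.search(script_text)
    if match is None:
        return None
    raw = match.group(2)
    # In the JavaScript source the commands are separated by the two
    # characters backslash + n, so split on that literal sequence.
    return raw.split("\\n")


sample = (
    "Level.max_line = \"5\";\n\n"
    "Level.cozum = 'adım 12\\ndön sağ\\nadım 13\\ndön sol\\nadım 11'; \n"
)
print(extract_cozum(sample))
```

Using `re.search` instead of `re.findall` makes the "not found" case explicit, since it returns `None` when the script has no `Level.cozum` assignment.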

Python Scrapy: getting an HTML <script> tag
