I'm curious what these four lines do; two of them contain regular expressions:
text.replace(/\W/g, " ")
text.split(/\s+/);
text.filter(v => !!v)
text.reduce((dict, v) => {dict[v] = v in dict ? dict[v] + 1 : 1; return dict}, {});
I understand roughly what the end result is, but could someone explain in detail what each line does?
Answer (score: 0)
This code counts how many times each word appears in a text.
The first RegExp, .replace(/\W/g, " "), replaces every non-word character (anything that is not an ASCII letter, a digit, or an underscore) with a space.
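Here is a small sketch of this first step in isolation. The sample sentence is made up for illustration and does not come from the question. Note that the apostrophe is also a non-word character, and that \w in JavaScript only covers ASCII letters, so accented letters get replaced as well.

```javascript
// Step 1 on its own: every character outside [A-Za-z0-9_] becomes a space.
const sentence = "Hello, world! It's 2024.";
const spaced = sentence.replace(/\W/g, " ");
console.log(JSON.stringify(spaced)); // "Hello  world  It s 2024 "

// \w only covers ASCII letters, so an accented letter such as "é" is replaced too.
console.log(JSON.stringify("café".replace(/\W/g, " "))); // "caf "
```

Punctuation next to a space produces a double space ("Hello  world"), which is why the next step splits on runs of whitespace rather than on a single space.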
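The second RegExp, .split(/\s+/), splits the text into an array, treating each run of one or more whitespace characters as a single delimiter.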
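A minimal sketch of the split step, plus the filter that follows it in the question. The input string is hard-coded here (it is what the replace step yields for the made-up sentence "Hello, world! It's 2024."). It shows why the filter is needed: a trailing space leaves an empty string at the end of the array. The only falsy value split() can return is the empty string; a "0" token would be a truthy string and would be kept.

```javascript
// Input: what replace(/\W/g, " ") produces for "Hello, world! It's 2024."
const spacedText = "Hello  world  It s 2024 ";

// /\s+/ treats each run of whitespace (including the double spaces) as ONE separator.
const parts = spacedText.split(/\s+/);
console.log(JSON.stringify(parts)); // ["Hello","world","It","s","2024",""]

// The trailing space leaves "" at the end; filter(v => !!v) removes it,
// because "" is the only falsy value that split() can produce.
const nonEmpty = parts.filter(v => !!v);
console.log(JSON.stringify(nonEmpty)); // ["Hello","world","It","s","2024"]
```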
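None of these methods modify the original text; each one returns a new string, array, or object. So you need to either store each result in a variable or chain the calls together. This matters here because filter and reduce are array methods: calling them directly on the string text, as the four separate lines in the question do, throws a TypeError.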
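As an alternative to chaining, here is a sketch that stores each intermediate result in its own variable. The short input sentence and the variable names are made up for illustration. It also shows that the original string is left unchanged, and that the counting is case-sensitive ("The" and "the" are counted separately).

```javascript
const text = "The cat saw the other cat.";

// Each call returns a NEW value; `text` itself is never modified.
const noPunct = text.replace(/\W/g, " ");  // string with punctuation turned into spaces
const tokens = noPunct.split(/\s+/);       // array of strings (may contain "")
const words = tokens.filter(v => !!v);     // array without empty strings
const counts = words.reduce((dict, v) => {
  dict[v] = v in dict ? dict[v] + 1 : 1;   // first occurrence -> 1, later ones -> +1
  return dict;
}, {});

console.log(text);                   // The cat saw the other cat.   (unchanged)
console.log(JSON.stringify(counts)); // {"The":1,"cat":2,"saw":1,"the":1,"other":1}

// Strings have no filter method, so text.filter(...) would throw a TypeError.
console.log(typeof text.filter);     // undefined
```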
var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus tempor risus eu nisl pretium ultrices. Vivamus a malesuada est. Donec fringilla pharetra dolor, vitae mattis lorem pulvinar sit amet. Sed tristique tellus sit amet maximus rhoncus. Vestibulum accumsan quam in ligula finibus fermentum.";
var result = text.replace(/\W/g, " ") // convert all non-word characters to spaces
.split(/\s+/) // split on runs of whitespace as the delimiter
.filter(v => !!v) // drop empty strings (the only falsy value split() can return), e.g. the trailing "" left after the final "."
.reduce((dict, v) => {dict[v] = v in dict ? dict[v] + 1 : 1; return dict}, {}); // count how many times each word appears
console.log(result);
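One caveat with the reduce step: the accumulator is a plain {}, and the check v in dict also sees properties inherited from Object.prototype. A word such as "constructor" or "toString" is therefore "already present" before it has been counted, and its count turns into a string. Below is a sketch of one way around this, counting into a Map instead; countWords is a hypothetical helper name, not something from the question. Starting the reduce from Object.create(null) (an object with no prototype) would be another option.

```javascript
// Hypothetical helper: same pipeline as above, but counts go into a Map,
// which has no inherited keys to collide with words like "constructor".
function countWords(text) {
  return text
    .replace(/\W/g, " ")
    .split(/\s+/)
    .filter(v => v !== "")
    .reduce((counts, word) => counts.set(word, (counts.get(word) || 0) + 1), new Map());
}

const tricky = "constructor toString constructor";
const wordCounts = countWords(tricky);
console.log(wordCounts.get("constructor")); // 2
console.log(wordCounts.get("toString"));    // 1

// The {} version: "constructor" in {} is true (inherited), so the "count"
// becomes Object + 1, i.e. a string instead of a number.
const naive = tricky.split(" ").reduce((dict, v) => { dict[v] = v in dict ? dict[v] + 1 : 1; return dict; }, {});
console.log(typeof naive.constructor);     // string
```

Map.prototype.set returns the Map itself, which is why it can be returned directly from the reduce callback.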

You can combine the two RegExps into a single .match(/\w+/g), which returns an array of all word sequences directly:
var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus tempor risus eu nisl pretium ultrices. Vivamus a malesuada est. Donec fringilla pharetra dolor, vitae mattis lorem pulvinar sit amet. Sed tristique tellus sit amet maximus rhoncus. Vestibulum accumsan quam in ligula finibus fermentum.";
var result = (text.match(/\w+/g) || []) // get all word sequences; match() returns null when there are none
.reduce((dict, v) => {dict[v] = v in dict ? dict[v] + 1 : 1; return dict}, {}); // count how many times each word appears (no filter needed: \w+ never matches an empty string)
console.log(result);
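A quick sketch to check that the two approaches agree, and to show the one behavioral difference: match() returns null (not an empty array) when the text contains no word characters, so a guard is needed before calling array methods on it. The function names and the shortened sample text are hypothetical, chosen only for this comparison.

```javascript
// Version 1: replace + split + filter, as in the first snippet.
function countBySplit(text) {
  return text.replace(/\W/g, " ")
    .split(/\s+/)
    .filter(v => !!v)
    .reduce((dict, v) => { dict[v] = v in dict ? dict[v] + 1 : 1; return dict; }, {});
}

// Version 2: a single match(/\w+/g), falling back to [] when match() returns null.
function countByMatch(text) {
  return (text.match(/\w+/g) || [])
    .reduce((dict, v) => { dict[v] = v in dict ? dict[v] + 1 : 1; return dict; }, {});
}

const lorem = "Lorem ipsum dolor sit amet, sit amet.";
console.log(JSON.stringify(countBySplit(lorem))); // {"Lorem":1,"ipsum":1,"dolor":1,"sit":2,"amet":2}
console.log(JSON.stringify(countByMatch(lorem))); // {"Lorem":1,"ipsum":1,"dolor":1,"sit":2,"amet":2}

// No word characters at all: match() gives null, the guard turns it into [].
console.log("!!!".match(/\w+/g));                 // null
console.log(JSON.stringify(countByMatch("!!!"))); // {}
```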