Question

我正在尝试使用python正则表达式从字符串中删除一些看起来像非unicode的字符。这是我的代码：

source.reduce((memo, item) => {
  if (memo.currentRowWidth + item.width <= 4) {
    memo.currentRowWidth += item.width;
    memo.output[memo.output.length -1].push(item);
  }
  else if (item.width < 4) {
    memo.push([]);
    memo.output[memo.output.length -1].push(item);
    memo.currentRowWidth += item.width;
  }
  else {
    // item.width too big, do what you want
  }
}, { currentRowWidth: 0, output: [[]] })

结果如下：

xxx='Juliana Gon\xe7alves Miguel'
t=re.sub('\w*','',xxx)
t

这个\ xe7是我想删除的内容。任何人都可以有任何想法吗？

Answer 1

如果所需的输出是

'Juliana Gonalves Miguel'

那么下面的正则表达式应该可以解决问题。

re.sub('(?![ -~]).', '', xxx)

[ -~]：所有ASCII字符的简短易读版本

(?!)：负面预测

Python正则表达式删除非unicode字符

1 个答案: