Remove all strange characters from a text file

Time: 2014-12-11 00:22:50

Tags: regex unix batch-file sed terminal

I am trying to use a sed command to clean up txt files:

sed -i.bak -e 's@^[A-Za-z0-9_.;,:]+$@@g' *.txt

It returns

sed: RE error: illegal byte sequence

What am I doing wrong with the regular expression? Basically I want to say: replace everything that is not A-Za-z0-9_.;, with "".

3 answers:

Answer 0: (score: 1)

You put the ^ in the wrong place. To negate the set it has to be the first character inside the brackets:

sed -i.bak -e 's@[^A-Za-z0-9_\.;,:]\+$@@g' *.txt

plus a small change (escaping some special characters).
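
The `illegal byte sequence` error itself is typically raised by BSD/macOS sed when the input contains bytes that are not valid in the current locale's encoding. A minimal sketch, assuming the files may contain arbitrary bytes: force the C locale so sed works byte by byte, and drop the trailing `$` and the `\+` so that every disallowed character is removed wherever it appears on the line. The scratch directory `clean_dir` and the file `sample.txt` are hypothetical names used only for this demo.

```shell
# Hypothetical scratch directory and sample file, used only for this demo.
clean_dir=$(mktemp -d)
# The sample line contains two raw high bytes (octal 351 and 377) and a space.
printf 'abc\351\377 def_1;2,3:4.5\n' > "$clean_dir/sample.txt"

# LC_ALL=C makes sed treat every byte as one character, which avoids
# "illegal byte sequence" on input that is not valid in the locale's encoding.
# ^ as the first character inside [...] negates the set.
LC_ALL=C sed -i.bak -e 's@[^A-Za-z0-9_.;,:]@@g' "$clean_dir"/*.txt

cat "$clean_dir/sample.txt"   # prints: abcdef_1;2,3:4.5
```

The original file is kept next to the cleaned one as `sample.txt.bak`, so the result can be compared before the backup is deleted.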

Answer 1: (score: 0)

Suppose you have something like this in a file named "my_file":

Location: http://www.google.gy/?gws_rd=cr&ei=l_KIVOXnIsinNq2NgsgB [following]
--2014-12-10 21:25:44--  http://www.google.gy/?gws_rd=cr&ei=l_KIVOXnIsinNq2NgsgB
Resolving www.google.gy (www.google.gy)... 64.233.176.94, 2607:f8b0:4002:c05::5e
Connecting to www.google.gy (www.google.gy)|64.233.176.94|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html.2'

You can try

sed -i.bak -e 's#[^[:alnum:].;,:]##g'  'my_file'

This finds every character that is not alphanumeric, ".", ";", "," or ":" and deletes it. Result:

Location:http:www.google.gygwsrdcreilKIVOXnIsinNq2NgsgBfollowing
2014121021:25:44http:www.google.gygwsrdcreilKIVOXnIsinNq2NgsgB
Resolvingwww.google.gywww.google.gy...64.233.176.94,2607:f8b0:4002:c05::5e
Connectingtowww.google.gywww.google.gy64.233.176.94:80...connected.
HTTPrequestsent,awaitingresponse...200OK
Length:unspecifiedtexthtml
Savingto:index.html.2
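
To check the expression on a single line before editing files in place, here is a quick sketch that pipes one of the sample lines through the same substitution. The second command is a variant (an assumption, not part of the answer) that also keeps whitespace by adding `[:space:]` to the set; the variable names are hypothetical.

```shell
line='Length: unspecified [text/html]'

# Same expression as in the answer: delete everything outside the allowed set.
stripped=$(printf '%s\n' "$line" | sed -e 's#[^[:alnum:].;,:]##g')
printf '%s\n' "$stripped"   # prints: Length:unspecifiedtexthtml

# Variant: also keep whitespace so that words stay separated.
spaced=$(printf '%s\n' "$line" | sed -e 's#[^[:alnum:][:space:].;,:]##g')
printf '%s\n' "$spaced"     # prints: Length: unspecified texthtml
```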

Answer 2: (score: 0)

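Glenn Jackman is right, and the solution found in another post helped...

The only problem is that the command now only recognizes English Latin characters, so it does not work...

Here is the result, unchanged: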

ÁÉc†ÿ°“Å9,0,sub,,0,0,0,,Pero, aun no comprendo porque quer√≠a acabar conÄC∂u⁄ÁÉx¨†ú°ñÅ996,0,sub,,0,0,0,,õÇ–†µ°ØÅ*10,0,sub,,0,0,0,,Ha deshonrado aléC∂u⁄ÁÉ©≤†”°ÕÅ11,0,sub,,0,0,0,,{\pos(1481.142,795.974)\bord0\fad(800,0)}Himalayan RangeõÇ!¸C∂u@óÁÉf†”°ÕÅ12,0,sub,,0,0,0,,¬øEsta seguro que querer hacerlo solo?, se√±or MitsumazaõÇ»†ª°µÅî13,0,sub,,0,0,0,,Silencio Tatsumi, tranquil√≠zateõÇ2C∂u@ôÁÉ,†©°£Å14,0,sub,,0,0,0,,Pero se√±or...õÇ\†≠°ßÅ\15,0,sub,,0,0,0,,Aunque lo digas...õÇ<†∏°≤Åò16,0,sub,,0,0,0,,Tengo un esp√≠ritu aventureroõÇ|C∂u@£ÁÉ@†∞°™Å17,0,sub,,0,0,0,,Lo entiendo se√±or...õÇ–†≤°¨Å–18,0,sub,,0
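
Judging only from the output above, the file looks like UTF-8 text mixed with binary bytes (sequences such as `√≠` and `√±` are what the UTF-8 bytes for `í` and `ñ` look like when displayed as Mac Roman). Under that assumption, one possible sketch is to drop the invalid byte sequences first with `iconv -c` and keep the accented letters, rather than forcing the C locale, where only ASCII letters count as alphanumeric. The variable name `cleaned` is hypothetical.

```shell
# "se\303\261or" is "señor" in UTF-8; \377 is a byte that is never valid UTF-8.
# iconv -c omits characters that cannot be converted instead of stopping.
# Some iconv implementations still exit non-zero after discarding input,
# hence the "|| true".
cleaned=$(printf 'se\303\261or\377\n' | iconv -c -f UTF-8 -t UTF-8 || true)
printf '%s\n' "$cleaned"
```

After this step the data should be valid UTF-8, so a sed run under a UTF-8 locale should no longer fail on byte sequences. Binary junk that happens to form valid UTF-8 would survive this step and still need a character-class filter afterwards.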