Question

I know this is a duplicate question, but I have tried every solution I could find so far without success. Can anyone help me remove sequences like \xc3\xa2\xc2\x84\xc2\xa2 from a file?

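The file contents I am trying to clean are: b'Roasted Onion Dip',"b""['2 pounds large yellow onions, thinly sliced', '3 large shallots, thinly sliced', '4 sprigs thyme', '1/4 cup olive oil', 'Kosher salt and freshly ground black pepper', '1 cup white wine', '2 tablespoons champagne vinegar', '2 cups sour cream', '1/2 cup chopped fresh chives', '1/4 cup plain Greek yogurt', 'Everything seasoning and thyme to garnish', 'Cape Cod Waves\xc3\xa2\xc2\x84\xc2\xa2 Potato Chips for serving']"""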

I tried re.sub('[^\x00-\x7F]+', '', whatevertext), but it doesn't get me anywhere. I suspect the \ here is not being treated as a special character.

Answer 1

You can do it like this:

>>> with open("test.txt", "r") as f:
...     whatevertext = f.read()
...
>>> print(whatevertext)
b'Roasted Onion Dip',"b""['2 pounds large yellow onions, thinly sliced', '3 large shallots, thinly sliced', '4 sprigs thyme', '1/4 cup olive oil', 'Kosher salt and freshly ground black pepper', '1 cup white wine', '2 tablespoons champagne vinegar', '2 cups sour cream', '1/2 cup chopped fresh chives', '1/4 cup plain Greek yogurt', 'Everything seasoning and thyme to garnish', 'Cape Cod Waves\xc3\xa2\xc2\x84\xc2\xa2 Potato Chips for serving']"""

>>> import re
>>> # Match a literal backslash, "x", then exactly two hex digits
>>> result = re.sub(r'\\x[0-9a-f]{2}', '', whatevertext)
>>> print(result)
b'Roasted Onion Dip',"b""['2 pounds large yellow onions, thinly sliced', '3 large shallots, thinly sliced', '4 sprigs thyme', '1/4 cup olive oil', 'Kosher salt and freshly ground black pepper', '1 cup white wine', '2 tablespoons champagne vinegar', '2 cups sour cream', '1/2 cup chopped fresh chives', '1/4 cup plain Greek yogurt', 'Everything seasoning and thyme to garnish', 'Cape Cod Waves Potato Chips for serving']"""

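In this regular expression the backslash is itself escaped with a backslash, so the pattern matches a literal backslash character in the text rather than starting an escape sequence. After the x we know there can only be hexadecimal digits, that is, the digits 0-9 or the letters a-f. This is also why the [^\x00-\x7F] attempt does nothing: the file contains the ASCII characters backslash, x, c, 3 and so on, not actual non-ASCII bytes.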
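As a rough sketch of the same idea in a reusable form, the snippet below wraps the substitution in a small helper and contrasts it with the non-ASCII pattern from the question. The names `HEX_ESCAPE` and `strip_hex_escapes` are made up for illustration, and the sample string is just the last list item from the file, written as a raw string so the backslashes are literal characters.

```python
import re

# Literal backslash, "x", then exactly two hex digits (e.g. the four characters \xc3).
# HEX_ESCAPE and strip_hex_escapes are hypothetical names used only for this sketch.
HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")


def strip_hex_escapes(text):
    """Remove literal backslash-x-NN sequences from text."""
    return HEX_ESCAPE.sub("", text)


# Raw string: these are real backslash characters, as they appear in the file.
sample = r"Cape Cod Waves\xc3\xa2\xc2\x84\xc2\xa2 Potato Chips for serving"

# Every character in sample is plain ASCII, so the non-ASCII pattern removes nothing.
unchanged = re.sub(r"[^\x00-\x7F]+", "", sample)

cleaned = strip_hex_escapes(sample)
print(unchanged == sample)
print(cleaned)
```

Limiting the match to exactly two hex digits is a deliberate choice here: an open-ended run of hex digits could also swallow ordinary letters such as "a" or "f" that happen to follow an escape directly.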
