Regex does not recognize backslashes

Time: 2019-05-03 10:19:47

Tags: python regex pandas

I have a Pandas Series, terms3, containing thousands of records like the one below, and I need to remove the backslashes from it:

terms3[1] = 'blue-eyed soul\' \'pop rock\' \'blues-rock\' \'beach music\' \'soft rock\' \'soul\' \'classic rock\' \'oldies\' \'power pop\' \'psychedelic rock\' \'rock\' \'sunshine pop\' \'blues\' \'singer-songwriter\' \'pop\' \'united states\' \'male vocalist\' "rock \'n roll" \'60s\' \'am pop\' \'r&b\' \'american\' \'male\' \'psychedelic\' \'classic\' \'vocal\' \'americana\' \'game music\' \'mod\' \'trippy\' \'french\' \'germany\' \'canada\' \'70s\' \'belgium\' \'cover\' \'nederland\' \'confident'

If I run type(terms3[1]), I get str.

This code works:

import re

regex = r"\\"

test_str = "'blue-eyed soul\\' \\'pop rock\\' \\'blues-rock\\' \\'beach music\\' \\'soft rock\\' \\'soul\\' \\'classic rock\\' \\'oldies\\' \\'power pop\\' \\'psychedelic rock\\' \\'rock\\' \\'sunshine pop\\' \\'blues\\' \\'singer-songwriter\\' \\'pop\\' \\'united states\\' \\'male vocalist\\' \"rock \\'n roll\" \\'60s\\' \\'am pop\\' \\'r&b\\' \\'american\\' \\'male\\' \\'psychedelic\\' \\'classic\\' \\'vocal\\' \\'americana\\' \\'game music\\' \\'mod\\' \\'trippy\\' \\'french\\' \\'germany\\' \\'canada\\' \\'70s\\' \\'belgium\\' \\'cover\\' \\'nederland\\' \\'confident'"

#test_str = terms3[1]

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

However, if I uncomment #test_str = terms3[1] and run the code, it prints nothing, even though test_str is supposed to be a copy of terms3[1].
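One possible explanation, assuming the value of terms3[1] shown above was copied from an interactive display (its repr) rather than from print(): when a string contains both single and double quotes, Python's repr wraps it in single quotes and escapes the inner single quotes as \'. Those backslashes exist only in the display, not in the string. The hand-typed test_str, by contrast, contains \\ inside a string literal, which is a real backslash character. The sketch below uses a shortened, hypothetical stand-in `s` for terms3[1] to show the difference.

```python
import re

# Hypothetical, shortened stand-in for terms3[1]:
# it contains both quote characters but no backslash at all.
s = "blue-eyed soul' 'pop rock' \"rock 'n roll\" '60s"

# repr() escapes the single quotes, so the display shows \' sequences.
print(repr(s))
# print() shows the characters that are actually stored.
print(s)

# The regex finds nothing because the string holds no backslash.
backslashes_in_s = re.findall(r"\\", s)
print(len(backslashes_in_s))

# A literal typed with \\ really does contain backslash characters.
literal = "blue-eyed soul\\' \\'pop rock\\' \"rock \\'n roll\" \\'60s"
backslashes_in_literal = re.findall(r"\\", literal)
print(len(backslashes_in_literal))
```

If print(terms3[1]) shows no backslashes, there is nothing for the regex to match, and the two strings are not copies of each other.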

Is there any way to solve this?
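If some records do turn out to contain real backslash characters, a minimal sketch of removing them across the whole Series could look like the following. The Series `terms` and its two values are made up for illustration and stand in for terms3; passing regex=False treats the pattern as a plain string, so no regex escaping is needed.

```python
import pandas as pd

# Hypothetical stand-in for terms3: the first record holds real
# backslashes, the second holds none.
terms = pd.Series(["soul\\' \\'pop rock", "rock' 'blues"])

# Literal (non-regex) replacement of every backslash with nothing.
cleaned = terms.str.replace("\\", "", regex=False)

print(cleaned.tolist())
```

Records without backslashes pass through unchanged, so the same call can be applied to every row.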

0 answers:

No answers