Unable to remove special characters ;:??/?<

Time: 2019-07-15 03:37:28

Tags: python python-3.x pandas

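I am loading a CSV file with comma-separated values, but the Tax_Amount column contains special characters that are replacing the values. How can I fix this? I tried the code below, but it did not work. Tax_Amount value = SN45000000001 40HX750_SEPT17 STOCK'';:??/?<.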

bad_chars = [";:??/?<."] 
#df['Tax_Amount'].replace(regex=True, inplace=True, to_replace=r'?', value=r'')
#df['Tax_Amount'] = df['Tax_Amount'].astype(str)
all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)
#df['Tax_Amount'] = translate(None, ''.join(bad_chars)) 
test_string =df['Tax_Amount']
test_string = filter(lambda i: i not in bad_chars, test_string)

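4 answers:

Answer 0: (score: 1)

You can use a regular expression to remove any character or pattern from a string. Here the characters to remove are placed between '[]':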

import re
str1 = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
str1 = re.sub('[;:/?<.\'"]', '', str1)
print(str1)

Output:

SN45000000001 40HX750_SEPT17 STOCK
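
If the unwanted characters are kept in a list, as in the question, the character class can be built from that list instead of being typed by hand. A minimal sketch, assuming a hypothetical helper named strip_chars; re.escape keeps characters that are special inside [...] (such as ] or ^) literal:

```python
import re

def strip_chars(text, bad_chars):
    """Remove every character listed in bad_chars from text (hypothetical helper)."""
    if not bad_chars:
        return text  # an empty [] class would be an invalid pattern
    # re.escape makes each character safe to place inside a [...] class
    pattern = "[" + "".join(re.escape(c) for c in bad_chars) + "]"
    return re.sub(pattern, "", text)

bad_chars = [";", ":", "?", "/", "<", ".", "'"]
value = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
cleaned = strip_chars(value, bad_chars)
print(cleaned)
```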

Answer 1: (score: 0)

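Tax_Amount = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
bad_chars = [";", ":", "?", "<", ".", "'", "/"]
# Keep only the characters that are not in bad_chars, then join them back into a string
test_string = list(filter(lambda i: i not in bad_chars, Tax_Amount))
print(''.join(test_string))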

SN45000000001 40HX750_SEPT17 STOCK

(or)

Tax_Amount = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
bad_chars = [";",  ":", "?", "<" ,".", "'", '/'] 
for k in str(Tax_Amount):
    if k in bad_chars:
        Tax_Amount=Tax_Amount.replace(k,'')

print(Tax_Amount)

SN45000000001 40HX750_SEPT17 STOCK
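
The commented-out translate(None, ...) line in the question uses the Python 2 calling convention. In Python 3, a sketch of the same idea goes through str.maketrans, whose third argument lists the characters to delete. This removes them in a single pass instead of calling replace once per character (the name delete_table is made up for illustration):

```python
Tax_Amount = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
bad_chars = [";", ":", "?", "<", ".", "'", "/"]

# Third argument of str.maketrans: characters that translate() should delete
delete_table = str.maketrans("", "", "".join(bad_chars))
cleaned = Tax_Amount.translate(delete_table)
print(cleaned)
```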

Answer 2: (score: 0)

You have to make bad_chars a list of individual characters:

bad_chars = [';', ':', '?', '/', '<', '.', "'"]
# Double quotes so the two apostrophes stay part of the string
test_string = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
test_string = ''.join(filter(lambda i: i not in bad_chars, test_string))
print(test_string)

This way, your lambda function will behave as expected.
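
Two things go wrong in the question's version: bad_chars holds one long string, so no single character ever equals it, and iterating over df['Tax_Amount'] yields whole cell values rather than characters. A rough sketch of applying the per-character filter to each cell, assuming a hypothetical helper drop_bad_chars and a one-row DataFrame made up for illustration:

```python
import pandas as pd

bad_chars = set(";:?/<.'")  # a set of single characters, not one long string

def drop_bad_chars(value):
    """Keep only the characters of value that are not in bad_chars (hypothetical helper)."""
    return "".join(ch for ch in str(value) if ch not in bad_chars)

# Illustrative data; the real frame would come from reading the CSV file
df = pd.DataFrame({"Tax_Amount": ["SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."]})

# Series.map calls the helper once per cell
df["Tax_Amount"] = df["Tax_Amount"].map(drop_bad_chars)
print(df["Tax_Amount"].iloc[0])
```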

Answer 3: (score: 0)

The pandas str accessor lets you replace unwanted characters. Here is an example that solves the problem using only pandas:

import pandas as pd

df = pd.DataFrame({'Tax_Amount': ["SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."]})

pattern = r'[:;\?\.<\'/]' # Backslashes escape characters that have a special meaning in regex :)

# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['Tax_Amount_Clean'] = df['Tax_Amount'].str.replace(pattern, '', regex=True).str.strip()

print(df)

Result: [screenshot of the printed DataFrame]

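Explanation: the pattern [:;\?\.<\'/] tells the regex engine to match any of the characters listed inside [...]. Because . and ? are reserved characters in regex, we write them as \. and \?, which means they are treated as literal characters rather than as reserved ones. The single quote is escaped as \' as well, because single quotes delimit the Python string; with a double-quoted string it could be written directly.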
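
As a side note, inside a character class . and ? already lose their special meaning, so the backslashes in the pattern are harmless but optional. A small sketch to check this with the standard re module (the variable names are made up for illustration):

```python
import re

value = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."

escaped = re.sub(r"[:;\?\.<'/]", "", value)   # escapes kept, as in the answer
unescaped = re.sub(r"[:;?.<'/]", "", value)   # same class without backslashes

# Both patterns remove exactly the same characters
print(escaped == unescaped)

# Outside a class the dot is a wildcard, so there it does need escaping
wildcard = re.sub(r".", "", "a.b")    # every character matches
literal = re.sub(r"\.", "", "a.b")    # only the literal dot matches
print(repr(wildcard), repr(literal))
```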