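I am loading a CSV file with comma-separated values, but the Tax_Amount column contains special characters, and they are replacing the values. How can I fix this? I tried the code below, but it did not work. Tax_Amount value = SN45000000001 40HX750_SEPT17 STOCK'';:??/?<.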
bad_chars = [";:??/?<."]
#df['Tax_Amount'].replace(regex=True, inplace=True, to_replace=r'?', value=r'')
#df['Tax_Amount'] = df['Tax_Amount'].astype(str)
all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)
#df['Tax_Amount'] = translate(None, ''.join(bad_chars))
test_string =df['Tax_Amount']
test_string = filter(lambda i: i not in bad_chars, test_string)
Answer 0 (score: 1)
You can use a regular expression to remove any character or pattern from a string. The characters to remove go between '[' and ']':
import re
str1 = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
str1 = re.sub('[;:/?<.\'"]', '', str1)
print(str1)
Output:
SN45000000001 40HX750_SEPT17 STOCK
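If the unwanted characters are kept in a list, the character class can be built from that list instead of being typed by hand. The sketch below is one possible way to do it, not part of the original answer: `strip_chars` is a hypothetical helper name, and `re.escape` is used so that characters such as `?` and `.` are matched literally.

```python
import re

def strip_chars(text, bad_chars):
    """Remove every character listed in bad_chars from text (hypothetical helper)."""
    # re.escape makes regex metacharacters such as '?' and '.' literal
    pattern = "[" + "".join(re.escape(c) for c in bad_chars) + "]"
    return re.sub(pattern, "", text)

bad_chars = [";", ":", "?", "/", "<", ".", "'"]
cleaned = strip_chars("SN45000000001 40HX750_SEPT17 STOCK'';:??/?<.", bad_chars)
print(cleaned)
```

Note that the helper assumes `bad_chars` is non-empty, since an empty `[]` is not a valid character class.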
Answer 1 (score: 0)
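Tax_Amount = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
bad_chars = [";", ":", "?", "<", ".", "'", "/"]
# Keep only the characters that are not in bad_chars
test_string = list(filter(lambda i: i not in bad_chars, Tax_Amount))
print(''.join(test_string))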
SN45000000001 40HX750_SEPT17 STOCK
(or)
Tax_Amount = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
bad_chars = [";", ":", "?", "<" ,".", "'", '/']
for k in str(Tax_Amount):
    if k in bad_chars:
        Tax_Amount = Tax_Amount.replace(k, '')
print(Tax_Amount)
SN45000000001 40HX750_SEPT17 STOCK
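The commented-out `translate(None, ...)` line in the question resembles the Python 2 form of `str.translate`. In Python 3 the same idea can be expressed with `str.maketrans`. A minimal sketch, assuming Python 3 and the same `bad_chars` list (the names `table`, `tax_amount`, and `cleaned` are illustrative only):

```python
bad_chars = [";", ":", "?", "<", ".", "'", "/"]

# The third argument of maketrans lists characters to delete
table = str.maketrans("", "", "".join(bad_chars))

tax_amount = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
cleaned = tax_amount.translate(table)
print(cleaned)
```

This makes a single pass over the string, so it avoids calling `replace` once per unwanted character.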
Answer 2 (score: 0)
You must make bad_chars a list of individual characters:
bad_chars = [';', ':', '?', '/', '<', '.', "'"]
# Double quotes so the single quotes inside the value are kept literally
test_string = "SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."
test_string = ''.join(filter(lambda i: i not in bad_chars, test_string))
print(test_string)
This way, your lambda function will behave as expected.
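To see why the list in the question had no effect, it may help to compare the two forms side by side. This is an illustrative sketch with made-up variable names (`one_string`, `single_chars`); it filters a shortened sample value character by character.

```python
text = "STOCK'';:??/?<."

one_string = [";:??/?<."]                      # a list with one multi-character element
single_chars = [";", ":", "?", "/", "<", "."]  # a list of one-character elements

# No single character ever equals the whole string, so nothing is filtered out
unchanged = "".join(filter(lambda ch: ch not in one_string, text))

# Each character is compared against one-character elements, so matches are dropped
cleaned = "".join(filter(lambda ch: ch not in single_chars, text))

print(unchanged)
print(cleaned)
```

The single quotes survive in `cleaned` because `'` is not in `single_chars`; add it to the list if it should be removed as well.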
Answer 3 (score: 0)
The pandas str accessor lets you replace unwanted characters. Here is an example that solves the problem using only pandas:
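import pandas as pd
df = pd.DataFrame({'Tax_Amount': ["SN45000000001 40HX750_SEPT17 STOCK'';:??/?<."]})
pattern = r"[:;\?\.<'/]"  # \ escapes characters that have a special meaning in regex
df['Tax_Amount_Clean'] = df['Tax_Amount'].str.replace(pattern, '', regex=True).str.strip()
print(df)
Result: the Tax_Amount_Clean column contains SN45000000001 40HX750_SEPT17 STOCK.
Explanation
pattern = r"[:;\?\.<'/]"
We want the regex to match any of the characters inside [...]. Because . and ? are reserved characters in regular expressions, we write them as \. and \? so that they are treated literally rather than as metacharacters (inside [...] this escaping is optional but harmless). The r prefix keeps Python from interpreting the backslashes itself, and regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default. With a single-quoted pattern the ' would have to be written as \' because it also delimits the string; with double quotes it can be left as is.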
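The escaping rule can be checked directly with the standard `re` module. The following is a small sketch on a shortened sample value, not taken from the original answer: it compares the escaped and unescaped forms inside a character class, then shows the difference that escaping makes outside one.

```python
import re

text = "STOCK'';:??/?<."

# Inside [...] the escaped and unescaped forms match the same characters
escaped = re.sub(r"[:;\?\.<'/]", "", text)
unescaped = re.sub(r"[:;?.<'/]", "", text)
print(escaped == unescaped)

# Outside a character class, an unescaped '.' matches any character...
any_char = re.sub(r".", "", "a.b")
# ...while an escaped '\.' matches only a literal dot
literal_dot = re.sub(r"\.", "", "a.b")
print(repr(any_char), repr(literal_dot))
```

Either form of the character class can be passed to `Series.str.replace` with `regex=True`.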