Unable to remove a special character from string

Time: 2017-08-30 20:40:27

Tags: c# regex substring special-characters

Background

There is an application where users enter information that is stored in a database. I have a second application that runs every 5 minutes, reads what the users entered, creates the corresponding document, and places it on a server for the user to pick up. However, users started having issues with one specific document, where certain functionality was not executing correctly. I traced the issue to a string that a user had entered in the entry application: the title column contained "Jame's Bond Story". My application creates the document without reporting any error whatsoever, so after debugging I identified the following problem.

Problem

I am not sure how this user did it, but the single quote was not really a single quote; it was some other, unusual character. I proved this by running the following code to see if I could remove it.

string cleanTitle = BookRec.TitleName.Replace("'","");

However, this did not work for me at all. I then broke the string into a character array, and instead of the expected character I got an unfamiliar numeric value. So I switched to a regex that strips every character and only allows numbers and letters.
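The character-array check described above can be sketched roughly as follows. This is only an illustration: the class name `TitleInspector`, the method `Describe`, and the sample title are hypothetical, and U+2019 (a typographic apostrophe) is an assumed stand-in, since the question does not say which character was actually stored.

```csharp
using System;
using System.Collections.Generic;

public static class TitleInspector
{
    // Returns one entry per UTF-16 code unit: its code point in hex
    // and its Unicode category, so look-alike characters become visible.
    public static List<string> Describe(string text)
    {
        var lines = new List<string>();
        foreach (char c in text)
        {
            lines.Add($"U+{(int)c:X4} {char.GetUnicodeCategory(c)}");
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical title; U+2019 is an assumption, not the asker's data.
        string title = "Jame\u2019s Bond Story";
        foreach (string line in Describe(title))
        {
            Console.WriteLine(line);
        }
    }
}
```

An ASCII apostrophe shows up as U+0027, so any other value at that position explains why `Replace("'","")` leaves the string unchanged.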

This has now become an issue because the users want the title to be able to contain the following special characters: '

I am looking for a way to filter out all other characters, including the type I ran into this week, and only allow the 6 characters that the users have agreed to. I came up with the following regex formula, but I am getting an empty string.

 string cleanTitle = BookRec.TitleName.Replace("'","");

However, I am getting an empty string when I replace the title. I am not a big fan of regex, so I am also open to a substring approach.

Appended Information

I am not able to access the application that inserts the information into the database. I am only able to read from the database and then perform actions.

1 answer:

Answer 0 (score: 2)

You might want to try something like this:

string cleanTitle = Regex.Replace(BookRec.TitleName, @"[^\u0000-\u007F]+", "");

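This removes any character outside the range U+0000 to U+007F, that is, any non-ASCII character. I am not sure whether those are the ones causing your problem, but hopefully it gives you a hint in the right direction.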
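If the goal is a whitelist rather than "ASCII only", the same negated-character-class idea can be narrowed. The sketch below is not the asker's code: `TitleCleaner`, `Clean`, and the six punctuation marks in the pattern are hypothetical, because the agreed list of characters is not given in the question. It also assumes that typographic apostrophes (U+2018, U+2019) should be mapped to a plain `'` instead of being dropped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // Assumed whitelist: letters, digits, space and six punctuation marks
    // (' - . , ! ?). The leading ^ negates the class, so the pattern matches
    // everything that is NOT allowed; without it the allowed characters
    // themselves would be removed.
    private static readonly Regex Disallowed =
        new Regex(@"[^A-Za-z0-9 '\-.,!?]", RegexOptions.Compiled);

    public static string Clean(string title)
    {
        if (string.IsNullOrEmpty(title)) return string.Empty;

        // Map typographic apostrophes to the ASCII one before filtering.
        string normalized = title.Replace('\u2018', '\'').Replace('\u2019', '\'');

        return Disallowed.Replace(normalized, "");
    }

    public static void Main()
    {
        // Prints: Jame's Bond Story
        Console.WriteLine(Clean("Jame\u2019s Bond Story"));
    }
}
```

Normalizing first keeps the apostrophe the users asked for, while the negated class still strips anything unexpected that arrives from the database.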