How Can I Strip HTML Tags in C#?

June 25, 2024

Possible Duplicate: How to clean HTML tags using C#

What is the best way to strip HTML tags in C#?

Solution 1:

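// Requires: using System.Text.RegularExpressions;
public static string StripHTML(string htmlString)
{
    // Lazily match "<", then any characters (including newlines), up to the first ">".
    string pattern = @"<(.|\n)*?>";

    // Replace every match with an empty string, leaving only the text between tags.
    return Regex.Replace(htmlString, pattern, string.Empty);
}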

Solution 2:

Take your HTML string or document and parse it with HTML Agility Pack. This will give you an HtmlDocument object that is very similar to an XmlDocument.

You can then use its methods, such as SelectNodes, to access those portions of the document that you are interested in.
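As a rough illustration of the parsing approach, here is a minimal sketch. It assumes the third-party HtmlAgilityPack NuGet package is referenced; the class and method names (HtmlTextExtractor, StripTags) are hypothetical and not part of the original answer. It drops script and style elements first, since their contents would otherwise end up in the extracted text.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

// Hypothetical helper, for illustration only.
public static class HtmlTextExtractor
{
    // Parses the markup and returns only its text content.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so check before iterating.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            foreach (var node in hidden)
            {
                node.Remove();
            }
        }

        // InnerText concatenates the text nodes; DeEntitize converts
        // entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Calling HtmlTextExtractor.StripTags("<p>Hello <b>world</b></p>") should yield the plain text without the surrounding tags. Because the result has been de-entitized, it should be encoded again before being written back into a page.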

If you choose to use another approach, be aware that parsing HTML (a non-regular language) with regular expressions is widely regarded as a bad idea.

And regardless of the approach, if you are keeping some markup, use a whitelist approach. That is, remove everything that is not explicitly allowed.

Solution 3:

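To guarantee that no HTML tags get through, use HttpServerUtility.HtmlEncode(string). Note that this encodes the markup rather than removing it, so the tags are displayed as literal text instead of being interpreted by the browser.

If you want some tags to get through, you can use this "whitelist" approach.

Update: Some vulnerabilities have been found in that code, as a developer from Fog Creek points out.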

(Second link includes code).
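For the encoding route in Solution 3, a minimal sketch follows. It uses System.Net.WebUtility.HtmlEncode, a static method that does not need an HttpServerUtility instance or a reference to System.Web; that substitution, and the names HtmlEncodingDemo and EncodeForDisplay, are assumptions made for illustration rather than part of the original answer.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
public static class HtmlEncodingDemo
{
    // Encodes untrusted input so a browser renders any markup as literal text.
    public static string EncodeForDisplay(string untrusted)
    {
        // WebUtility.HtmlEncode returns null for null input, so normalize first.
        return WebUtility.HtmlEncode(untrusted ?? string.Empty);
    }
}
```

For example, EncodeForDisplay("<b>hi</b>") returns "&lt;b&gt;hi&lt;/b&gt;". The tags are still present in the string, but the browser shows them as text instead of interpreting them.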
