
How Can I Strip HTML Tags in C#

Possible duplicate: How to clean HTML tags using C#

What is the best way to strip HTML tags in C#?

Solution 1:

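```csharp
using System.Text.RegularExpressions;

public static class HtmlHelper // wrapper class so the snippet compiles; name is arbitrary
{
    // Lazily matches '<', any characters including newlines, then the first '>'.
    public static string StripHTML(string htmlString)
    {
        string pattern = @"<(.|\n)*?>";
        return Regex.Replace(htmlString, pattern, string.Empty);
    }
}
```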

Solution 2:

Take your HTML string or document and parse it with the HTML Agility Pack. This gives you an HtmlDocument object that is very similar to an XmlDocument.

You can then use its methods, such as SelectNodes, to access the parts of the document you are interested in.
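A minimal sketch of this approach, assuming the HtmlAgilityPack NuGet package is installed (the class name `AgilityStripper` is hypothetical). It loads the HTML, uses `SelectNodes` with an XPath expression to drop `script` and `style` elements, and then returns the remaining text with entities decoded.

```csharp
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is referenced

public static class AgilityStripper
{
    public static string StripHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes takes an XPath expression and returns null when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style");
        if (unwanted != null)
        {
            foreach (var node in unwanted)
                node.Remove(); // detach the element (and its contents) from the tree
        }

        // InnerText gathers the text nodes; DeEntitize turns entities such as &amp; into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Which elements to remove is a judgment call; `script` and `style` are dropped here because their contents are code, not readable text.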

If you choose another approach, be aware that parsing HTML (which is not a regular language) with regular expressions is widely regarded as a bad idea.
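To make the warning concrete, here is a small sketch (the class name `RegexPitfall` is hypothetical) that runs the Solution 1 pattern on an attribute value containing `>`, which is legal inside a quoted HTML attribute. The lazy match ends at the first `>`, so part of the tag leaks into the output.

```csharp
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // Same pattern as Solution 1: '<', the shortest run of any characters, then '>'.
    private const string TagPattern = @"<(.|\n)*?>";

    public static string StripWithRegex(string html) =>
        Regex.Replace(html, TagPattern, string.Empty);
}
```

For the input `<img alt="a > b">Hello`, the match stops at the `>` inside `alt`, and the result is ` b">Hello` rather than `Hello`.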

Regardless of the approach, if you are keeping some markup, use a whitelist: remove everything that is not explicitly allowed.
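One way to apply a whitelist, sketched under assumptions: encode the entire input first so that nothing is allowed by default, then restore only the exact, attribute-free forms of a few permitted tags. The tag list and the class name `WhitelistSanitizer` are hypothetical, and this encode-then-allow technique is a substitute chosen for illustration, not the code referenced elsewhere on this page.

```csharp
using System.Net;

public static class WhitelistSanitizer
{
    // Hypothetical whitelist: only these tags survive, and only with no attributes.
    private static readonly string[] AllowedTags = { "b", "i", "em", "strong", "p" };

    public static string Sanitize(string html)
    {
        // Step 1: encode everything, so no markup gets through by default.
        string encoded = WebUtility.HtmlEncode(html);

        // Step 2: re-enable only the exact opening/closing forms of whitelisted tags.
        foreach (string tag in AllowedTags)
        {
            encoded = encoded
                .Replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                .Replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return encoded;
    }
}
```

Because a tag with any attribute (for example `<b onclick="...">`) does not match the exact encoded form, it stays encoded and is shown as text. This sketch does not check that tags are balanced, and uppercase tags such as `<B>` remain encoded.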

Solution 3:

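To guarantee that no HTML tags get through, use HttpServerUtility.HtmlEncode(string). Note that this escapes the markup (for example, < becomes &lt;) so that it is displayed as text; it does not remove the tags.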
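`HttpServerUtility.HtmlEncode` is an instance method available inside ASP.NET (typically as `Server.HtmlEncode`). Outside a web request, `System.Net.WebUtility.HtmlEncode` performs the same kind of encoding, so this sketch uses it instead; the class name `EncodeExample` is hypothetical.

```csharp
using System.Net;

public static class EncodeExample
{
    // Escapes markup characters so a browser shows tags as literal text
    // instead of interpreting them. Nothing is removed from the input.
    public static string ToSafeText(string input) => WebUtility.HtmlEncode(input);
}
```

For example, `ToSafeText("<b>hi</b>")` returns `&lt;b&gt;hi&lt;/b&gt;`.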

If you want some tags to get through, you can use this "whitelist" approach.

Update: Some vulnerabilities have been found in that code, as a developer from Fog Creek points out.

(Second link includes code).
