Tuesday, 18 February 2014

How to remove comments from HTML using Html Agility Pack, C# .NET



Description: To remove unwanted comments, select them with the XPath node test comment() through Html Agility Pack's SelectNodes method, then delete each one by calling RemoveChild on its parent node.
The method below deletes every comment node in an HTML document except the DOCTYPE declaration, which Html Agility Pack also exposes as a comment node.
objHTMLdoc is an HtmlDocument object created with Html Agility Pack.

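// Requires: using System; using HtmlAgilityPack;
public static HtmlDocument RemoveComments(HtmlDocument objHTMLdoc)
{
    // SelectNodes returns null (not an empty collection) when nothing matches.
    var nodes = objHTMLdoc.DocumentNode.SelectNodes("//comment()");
    if (nodes != null)
    {
        foreach (HtmlNode comment in nodes)
        {
            // Keep the DOCTYPE declaration; the ordinal comparison avoids culture-specific casing rules.
            if (!comment.InnerText.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
                comment.ParentNode.RemoveChild(comment);
        }
    }
    return objHTMLdoc;
}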
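
To try the method out, a minimal console sketch might look like the following. The class name CommentStripperDemo and the sample HTML string are made up for illustration, and the method body is repeated only so the snippet compiles on its own; it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class CommentStripperDemo   // hypothetical class name
{
    // Same logic as the method in this post, repeated so the sketch is self-contained.
    public static HtmlDocument RemoveComments(HtmlDocument objHTMLdoc)
    {
        var nodes = objHTMLdoc.DocumentNode.SelectNodes("//comment()");
        if (nodes != null)
        {
            foreach (HtmlNode comment in nodes)
            {
                // Skip the DOCTYPE declaration, remove every other comment node.
                if (!comment.InnerText.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
                    comment.ParentNode.RemoveChild(comment);
            }
        }
        return objHTMLdoc;
    }

    public static void Main()
    {
        // Sample input (an assumption for this sketch): one DOCTYPE and one ordinary comment.
        string html = "<!DOCTYPE html><html><body><!-- remove me --><p>Hello</p></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);          // parse the markup from a string

        RemoveComments(doc);

        // Serialize the cleaned document back to a string.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

The document is modified in place, so the return value is only a convenience for chaining calls.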

