Thursday 13 March 2014

How to filter/search tags in an HTML document using HtmlAgilityPack

HtmlAgilityPack filters HTML elements with XPath expressions. The list below shows different filter criteria, each assigned to the sFilterCriteria variable, which is later used to run the filter.

1)    string sFilterCriteria="//elementName";
Examples:
·         string sFilterCriteria = "//div";
·         string sFilterCriteria = "//img";
·         string sFilterCriteria = "//a";

Description: Selects all <elementName> elements in the given HTML document.


2)    string sFilterCriteria="//elementName[@AttributeName]";
Examples:
·         string sFilterCriteria = "//div[@id]";
·         string sFilterCriteria = "//img[@alt]";      
·         string sFilterCriteria = "//p[@style]";
    
Description: Selects all <elementName> elements that have an AttributeName attribute (with any value) in the given HTML document.

3)    string sFilterCriteria="//elementName[@AttributeName='AttributeValue']";
Examples:
·         string sFilterCriteria = "//div[@id='div1']";
·         string sFilterCriteria = "//a[@href='MrGST']";
·         string sFilterCriteria = "//img[@alt='title']";

Description: Selects all <elementName> elements whose AttributeName attribute has the value AttributeValue in the given HTML document.

4)    string sFilterCriteria="//*[@AttributeName]";
Examples:
·         string sFilterCriteria = "//*[@id]";
·         string sFilterCriteria = "//*[@href]";
·         string sFilterCriteria = "//*[@face]";
·         string sFilterCriteria = "//*[@alt]";
·         string sFilterCriteria = "//*[@src]";

Description: Selects all HTML elements, whatever their tag name, that have an AttributeName attribute in the given HTML document.

5)    string sFilterCriteria="//*[@AttributeName='AttributeValue']";
Examples:
·         string sFilterCriteria = "//*[@id='div1']";
·         string sFilterCriteria = "//*[@href='MrGST']";

Description: Selects all HTML elements, whatever their tag name, whose AttributeName attribute has the value AttributeValue in the given HTML document.
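
To make the difference between the element-specific patterns (2 and 3) and the wildcard patterns (4 and 5) concrete, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced; the class name AttributeFilterDemo, the helper CountMatches and the sample markup are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class AttributeFilterDemo
{
    // Hypothetical sample markup, not taken from any real page.
    public const string SampleHtml =
        "<html><body>" +
        "<div class='note'>first</div>" +
        "<div class='warning'>second</div>" +
        "<span class='note'>third</span>" +
        "<p>no class here</p>" +
        "</body></html>";

    // Returns how many nodes match the given filter criteria.
    public static int CountMatches(string html, string sFilterCriteria)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nc = doc.DocumentNode.SelectNodes(sFilterCriteria);
        return nc == null ? 0 : nc.Count;
    }

    public static void Main()
    {
        string[] criteria =
        {
            "//div[@class]",          // pattern 2: div elements with a class attribute
            "//div[@class='note']",   // pattern 3: div elements with class equal to note
            "//*[@class]",            // pattern 4: any element with a class attribute
            "//*[@class='note']"      // pattern 5: any element with class equal to note
        };
        foreach (string sFilterCriteria in criteria)
        {
            Console.WriteLine(sFilterCriteria + " : " + CountMatches(SampleHtml, sFilterCriteria));
        }
    }
}
```

The wildcard forms also pick up the span, which the div-specific forms skip.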

6)    Conditional filtration criteria
Examples:
·         string sFilterCriteria = "//img[@src and (@width or @height)]";
·         string sFilterCriteria = "//span[@lang='EN-US' and @style]";
·         string sFilterCriteria = "//*[contains(@style, 'Wingding')]";

[Note: In the last example, contains() checks whether the style attribute contains the text 'Wingding'. The comparison is case-sensitive.]

Running the Filter Criteria for HTMLDocument

Use the code below to get the list of HTML elements in the document that match the filter criteria sFilterCriteria.

Create the HTML document object:
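
As a rough illustration of the conditional criteria, the sketch below runs the three expressions against a small made-up fragment and collects the tag names of the matches. It assumes the HtmlAgilityPack package is referenced; ConditionalFilterDemo, MatchingNames and the sample markup are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ConditionalFilterDemo
{
    // Hypothetical sample markup for illustration only.
    public const string SampleHtml =
        "<html><body>" +
        "<img src='a.png' width='10'>" +                  // src and width: matches the img filter
        "<img src='b.png'>" +                             // src only: no width or height
        "<img alt='no src' height='5'>" +                 // height only: no src
        "<span lang='EN-US' style='color:red'>x</span>" + // lang and style: matches the span filter
        "<span lang='EN-US'>y</span>" +                   // lang only
        "<p style='font-family:Wingdings'>z</p>" +        // style contains 'Wingding'
        "</body></html>";

    // Returns the tag names of all nodes matching the filter criteria.
    public static List<string> MatchingNames(string html, string sFilterCriteria)
    {
        var names = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nc = doc.DocumentNode.SelectNodes(sFilterCriteria);
        if (nc != null)
        {
            foreach (HtmlNode node in nc)
            {
                names.Add(node.Name);
            }
        }
        return names;
    }

    public static void Main()
    {
        string[] criteria =
        {
            "//img[@src and (@width or @height)]",
            "//span[@lang='EN-US' and @style]",
            "//*[contains(@style, 'Wingding')]"
        };
        foreach (string sFilterCriteria in criteria)
        {
            Console.WriteLine(sFilterCriteria + " : " +
                string.Join(", ", MatchingNames(SampleHtml, sFilterCriteria)));
        }
    }
}
```

Each expression should match exactly one element of the fragment, because the other candidates are missing one of the required attributes.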
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("<html>…………</html>");

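Run the search using the filter criteria sFilterCriteria:
HtmlNodeCollection nc = doc.DocumentNode.SelectNodes(sFilterCriteria);
// By default SelectNodes returns null when no element matches, so check before iterating
if (nc != null)
{
    foreach (HtmlNode node in nc)
    {
        // Logic body
    }
}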

Example:
// Selects every <a> element that has a target attribute
string sFilterCriteria = "//a[@target]";
HtmlNodeCollection nc = doc.DocumentNode.SelectNodes(sFilterCriteria);
if (nc != null)
{
    foreach (HtmlNode node in nc)
    {
        // Logic body
    }
}
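
Putting the pieces together, the following is one possible complete program built around the //a[@target] example. It is a sketch under stated assumptions: the HtmlAgilityPack package is referenced, and LinkTargetDemo, DescribeLinksWithTarget and the sample markup are hypothetical names used only here. The loop body reads the href and target attributes with GetAttributeValue, which returns the supplied default when the attribute is absent.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkTargetDemo
{
    // Returns one "href -> target" line per <a> element that has a target attribute.
    public static List<string> DescribeLinksWithTarget(string html)
    {
        var results = new List<string>();

        // Create the HTML document object and load the markup.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Run the filter criteria against the document.
        string sFilterCriteria = "//a[@target]";
        HtmlNodeCollection nc = doc.DocumentNode.SelectNodes(sFilterCriteria);
        if (nc != null)
        {
            foreach (HtmlNode node in nc)
            {
                // Logic body: read attribute values, with "" as the fallback.
                string href = node.GetAttributeValue("href", "");
                string target = node.GetAttributeValue("target", "");
                results.Add(href + " -> " + target);
            }
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample markup for illustration only.
        string html =
            "<html><body>" +
            "<a href='one.html' target='_blank'>One</a>" +
            "<a href='two.html'>Two</a>" +
            "</body></html>";

        foreach (string line in DescribeLinksWithTarget(html))
        {
            Console.WriteLine(line);
        }
    }
}
```

Only the first link is reported, since the second one has no target attribute. Returning an empty list when nothing matches keeps the null check in one place, so callers can iterate the result directly.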