
Screen Scraping With HtmlAgilityPack And XPath

[This question has a relative that lives at: Selective screen scraping with HTMLAgilityPack and XPath] I have some HTML to parse that has the following general appearance: ...

Solution 1:

The following query selects, from each cell, the first child a (anchor) element that has a non-empty href attribute and returns its href value. If a cell has no such element, the cell's inner text is used instead:

var dataList = 
     currentDoc.DocumentNode.Descendants("tr")
               .Select(tr => from td in tr.Descendants("td")
                             let a = td.SelectSingleNode("a[@href!='']")
                             select a == null ? td.InnerText : 
                                                a.Attributes["href"].Value);

The query is lazily evaluated, so feel free to add ToList() calls if you need materialized lists.
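As a rough sketch of where those ToList() calls might go, the query can be wrapped in a small helper that returns one list per row. The ExtractRows name, the Program class, and the sample markup in Main are hypothetical and only for illustration; the original HTML from the question is not reproduced here. The sketch assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class Program
{
    // Hypothetical helper: returns one list per <tr>. Each entry is the href
    // of the cell's first child <a> with a non-empty href, or the cell's
    // inner text when no such link exists.
    public static List<List<string>> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("tr")
            .Select(tr => tr.Descendants("td")
                .Select(td =>
                {
                    // Relative XPath: only direct <a> children of this cell.
                    var a = td.SelectSingleNode("a[@href!='']");
                    return a == null ? td.InnerText : a.Attributes["href"].Value;
                })
                .ToList())   // materialize the cells of this row
            .ToList();       // materialize the rows
    }

    public static void Main()
    {
        // Made-up sample markup, not the HTML from the question.
        const string html =
            "<table>" +
            "<tr><td><a href=\"/item/1\">One</a></td><td>plain</td></tr>" +
            "<tr><td><a href=\"\">empty</a></td><td>x</td></tr>" +
            "</table>";

        foreach (var row in ExtractRows(html))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

Materializing both levels means the document is walked once, and the result can be indexed or enumerated repeatedly without re-running the XPath lookups.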
