Here is a quick tutorial that shows how a string containing HTML can be parsed and navigated using the HtmlDocument object.
The WebBrowser and HtmlDocument classes live in this namespace:
using System.Windows.Forms;
First, let's say you have raw HTML in a string variable like this:
string strHTML = "[some raw html]";
WebBrowser browser = new WebBrowser();
browser.ScriptErrorsSuppressed = true;
HtmlDocument htmlDocument = browser.Document.OpenNew(true);
htmlDocument.Write(strHTML);
I recommend setting ScriptErrorsSuppressed = true to avoid possible JavaScript errors while the HTML is loading.
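Here is a minimal sketch of the loading step as a small console program, in case you want to try it outside of a form. It rests on a few assumptions: HtmlLoader.Load is a hypothetical helper name of my own, the sample HTML and the "msg" id are made up, and the control is navigated to "about:blank" first because browser.Document can be null on a freshly created WebBrowser. The control wraps an ActiveX component, so it is created on an STA thread here.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class HtmlLoader
{
    // Hypothetical helper (not part of the framework): writes raw HTML
    // into the browser's document and returns that document.
    public static HtmlDocument Load(WebBrowser browser, string html)
    {
        browser.ScriptErrorsSuppressed = true;

        // Assumption: Document is null until the control has navigated somewhere.
        if (browser.Document == null)
        {
            browser.Navigate("about:blank");
        }
        if (browser.Document == null)
        {
            throw new InvalidOperationException("WebBrowser.Document is not available yet.");
        }

        HtmlDocument document = browser.Document.OpenNew(true);
        document.Write(html);
        return document;
    }
}

static class Program
{
    static void Main()
    {
        // WebBrowser must be created on a single-threaded apartment (STA) thread.
        Thread worker = new Thread(() =>
        {
            using (WebBrowser browser = new WebBrowser())
            {
                // Made-up sample markup, used only for illustration.
                string html = "<html><body><p id=\"msg\">Hello</p></body></html>";
                HtmlDocument document = HtmlLoader.Load(browser, html);

                HtmlElement message = document.GetElementById("msg");
                Console.WriteLine(message != null ? message.InnerText : "(element not found)");
            }
        });
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join();
    }
}
```

Running the work on a dedicated STA thread is also one way to use the control from code that does not normally run on one, such as a background worker. The project needs a reference to the System.Windows.Forms assembly.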
Once your HtmlDocument object is ready, you have these methods (similar to their JavaScript counterparts) at your disposal:
htmlDocument.GetElementById(string id)
htmlDocument.GetElementsByTagName(string tagName)
htmlDocument.GetElementFromPoint(System.Drawing.Point point)
All of these methods return either an HtmlElement or an HtmlElementCollection. Here are some useful HtmlElement members for walking through elements:
htmlElement.Parent
htmlElement.NextSibling
htmlElement.FirstChild
htmlElement.InnerHtml
htmlElement.InnerText
htmlElement.Children
htmlElement.GetElementsByTagName(string tagName)
As you can see, this closely mirrors the JavaScript DOM, so anybody with experience working with the DOM will be right at home.
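To make the members above a bit more concrete, here is a rough sketch that looks up an element by id, walks its children with FirstChild and NextSibling, queries by tag name, and prints the element tree recursively through Children. The sample HTML, the "menu" id, and the PrintTree helper are all made up for illustration, and the "about:blank" navigation is an assumption to make sure browser.Document is not null.

```csharp
using System;
using System.Windows.Forms;

static class DomWalkDemo
{
    // Hypothetical helper: prints each element's tag name, indented by depth.
    static void PrintTree(HtmlElement element, int depth)
    {
        Console.WriteLine(new string(' ', depth * 2) + element.TagName);
        foreach (HtmlElement child in element.Children)
        {
            PrintTree(child, depth + 1);
        }
    }

    [STAThread] // WebBrowser requires a single-threaded apartment
    static void Main()
    {
        // Made-up sample markup, used only for illustration.
        string html = "<html><body><ul id=\"menu\"><li>One</li><li>Two</li></ul></body></html>";

        using (WebBrowser browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true;
            browser.Navigate("about:blank"); // assumption: makes Document non-null
            if (browser.Document == null)
            {
                Console.WriteLine("Document is not available.");
                return;
            }

            HtmlDocument document = browser.Document.OpenNew(true);
            document.Write(html);

            // Look up a single element by id.
            HtmlElement menu = document.GetElementById("menu");
            if (menu == null)
            {
                Console.WriteLine("Element 'menu' not found.");
                return;
            }

            // Walk the direct children using FirstChild / NextSibling.
            for (HtmlElement item = menu.FirstChild; item != null; item = item.NextSibling)
            {
                Console.WriteLine(item.TagName + ": " + item.InnerText);
            }

            // Query every <li> in the document and show its parent's tag.
            foreach (HtmlElement li in document.GetElementsByTagName("li"))
            {
                string parentTag = li.Parent != null ? li.Parent.TagName : "(none)";
                Console.WriteLine("li inside " + parentTag);
            }

            // Recursive walk over the whole body via Children.
            if (document.Body != null)
            {
                PrintTree(document.Body, 0);
            }
        }
    }
}
```

Note that GetElementById, FirstChild, NextSibling, and Parent can all return null, so the sketch checks before dereferencing.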
It would be nice to have something like jQuery on the server side to parse the document. If you know of a better way of parsing, or a library dedicated to it, feel free to add a comment.
3 comments:
I don't know if it is different in ASP.NET. I tried this solution in a standard C# application on .NET 2.0, but browser.Document is null, so you can't call the OpenNew method. I found a workaround: setting the Url of the browser to "about:blank". This initializes the browser so that browser.Document is available to be overwritten with your string.
WebBrowser wb = new WebBrowser();
wb.ScriptErrorsSuppressed = true;
wb.Url = new Uri("about:blank");
HtmlDocument doc = wb.Document.OpenNew(true);
doc.Write(myString);
I tried this using ASP.NET MVC 2.0, but couldn't find the WebBrowser and HtmlDocument classes. Please help.
@Deegii -
"You must use this namespace for HtmlBrowser and HtmlDocument objects
namespace System.Windows.Forms"
Though HtmlBrowser is a typo, and should be WebBrowser.