Question:
How can I get the HTML of a web page that I have loaded in TWebBrowser? I want to clip some web content.
Answer:
Use the browser's Document property. It exposes many useful properties and methods, including:
Document.All
Document.bgColor
Document.Body.innerHTML
Document.Body.Style.overflowX
Document.Body.Style.overflowY
Document.Body.Style.zoom
Document.cookie
Document.documentElement.innerHTML
Document.documentElement.innerText
Document.FileSize
Document.Frames
Document.Images
Document.LastModified
Document.Links
Document.Location.Protocol
Document.ParentWindow
Document.ParentWindow.ScrollBy(iX: Integer; iY: Integer)
Document.Selection
Document.Title
Document.URL
Of these, Body.innerHTML serves our purpose. Body.innerText would return only the visible text without the tags, and documentElement.innerHTML also includes the <head> section. The main limitation of this approach is that it returns the HTML as the browser currently holds it, which may differ from what 'View Source' in Internet Explorer shows. If the original HTML file contained JavaScript that generates content dynamically (for example, a script that writes 'Hello Visitor' into the page), the code below returns the generated output 'Hello Visitor', not the original script. To get the original file, look in the browser cache or use something other than TWebBrowser.
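// tested with Delphi 6, should work in Delphi 5 as well
uses
  MSHTML; // declares IHTMLDocument2

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Document: IHTMLDocument2;
  s: string;
begin
  // Document is nil until a page has been loaded
  // (note: this event can also fire once per frame on framed pages)
  if WebBrowser1.Document = nil then
    Exit;
  Document := WebBrowser1.Document as IHTMLDocument2;
  if Document.Body = nil then
    Exit;
  s := Document.Body.innerHTML;
  // process s here to extract the contents you need,
  // e.g. the day's total earnings
end;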
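If you need the original source (what 'View Source' shows, including unexecuted JavaScript) rather than the DOM TWebBrowser has built, one option outside TWebBrowser is to fetch the URL again yourself. The following is a minimal sketch, assuming the UrlMon unit's URLDownloadToFile function, which saves a URL to a local file. The helper name DownloadSource and the temporary file name are hypothetical. Keep in mind that this is a second request to the server, so pages that depend on cookies, POST data or session state may come back different.

```pascal
uses
  Windows, SysUtils, Classes, UrlMon;

// Hypothetical helper: downloads the raw server response for AURL
// to a temporary file and returns its contents as a string.
// Assumes it is called from a VCL application, where COM is already initialized.
function DownloadSource(const AURL: string): string;
var
  Buf: array[0..MAX_PATH] of Char;
  TempFile: string;
  Lines: TStringList;
begin
  // GetTempPath returns the temp folder with a trailing backslash
  GetTempPath(MAX_PATH, Buf);
  TempFile := StrPas(Buf) + 'page_source.htm'; // hypothetical file name

  if URLDownloadToFile(nil, PChar(AURL), PChar(TempFile), 0, nil) <> S_OK then
    raise Exception.CreateFmt('Download of %s failed', [AURL]);

  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(TempFile);
    Result := Lines.Text;
  finally
    Lines.Free;
    DeleteFile(TempFile); // SysUtils.DeleteFile, since SysUtils comes after Windows
  end;
end;
```

For example, `Memo1.Lines.Text := DownloadSource(WebBrowser1.LocationURL);` would show the source of the page currently displayed (Memo1 is a hypothetical TMemo on the form).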