Problem domain and new approach
Usually crawling engines are "protocol-driven" and open a socket connection on the target host or IP address and port. Once a connection is in place the crawler sends HTTP requests and tries to interpret responses. All these responses are parsed and resources are collected for future access. The resource parsing process is crucial and the crawler tries to collect possible sets of resources by fetching links, scripts, flash components and other significant data.
2. DOM event handling and dispatching
3. Dynamic DOM content extraction
Download the paper in PDF format here.
By subscribing to our early morning news update, you will receive a daily digest of the latest security news published on Help Net Security.
With over 500 issues so far, reading our newsletter every Monday morning will keep you up-to-date with security risks out there.