Problem domain and new approach
Crawling engines are usually "protocol-driven": they open a socket connection to the target host or IP address and port. Once the connection is in place, the crawler sends HTTP requests and tries to interpret the responses. Each response is parsed, and the resources it references are collected for future access. This resource parsing step is crucial: the crawler tries to collect every possible set of resources by extracting links, scripts, Flash components and other significant data.
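The protocol-driven approach described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the implementation of any particular crawler. The names `RESOURCE_ATTRS`, `ResourceCollector`, `collect_resources` and `crawl_page` are hypothetical, and the choice of which tag and attribute pairs count as resources is an assumption made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: these (tag, attribute) pairs are the resources of interest.
# A real crawler would likely track many more (forms, frames, images, ...).
RESOURCE_ATTRS = {
    ("a", "href"): "link",
    ("script", "src"): "script",
    ("embed", "src"): "embedded",   # e.g. a Flash movie
    ("object", "data"): "embedded",
}


class ResourceCollector(HTMLParser):
    """Collects (kind, absolute_url) pairs found in start tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        for name, value in attrs:
            kind = RESOURCE_ATTRS.get((tag, name))
            if kind and value:
                # Resolve relative references against the page URL.
                self.resources.append((kind, urljoin(self.base_url, value)))


def collect_resources(html, base_url):
    """Parse one response body and return the resources it references."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


def crawl_page(url):
    """Send one HTTP GET request and parse the response (needs network)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return collect_resources(html, url)
```

A crawler built this way would typically push the collected URLs onto a queue and call `crawl_page` on each one in turn. Note that a parser like this only sees the markup it receives in the HTTP response.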
2. DOM event handling and dispatching
3. Dynamic DOM content extraction
Download the paper in PDF format here.