The Combine Harvesting Robot
Simulation tool to study focused web crawling strategies
This is a simulation tool for studying the effects of different URL
scheduling strategies on focused web crawling. The tool works from a
pre-existing database generated by a relatively broad focused crawl with
the Combine crawler. It makes it possible to experiment with different
scheduling algorithms locally, without repeating the actual crawl each time.
The tool itself is written in Java. A few MATLAB routines are provided for post-processing the statistics.
The simulator is the result of Master's thesis work by Rafael Romero Trujillo.
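The idea of replaying a crawl against stored data with interchangeable scheduling strategies can be illustrated with a minimal Java sketch. This is not the simulator's actual code: the class name `CrawlSimulationSketch`, the method `simulate`, the in-memory link map standing in for the crawl database, and the per-URL scores are all hypothetical and chosen only for illustration. The sketch assumes a scheduling strategy can be expressed as a `Comparator` that orders the URLs waiting in the frontier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch: replays a crawl over a stored link graph instead of
// fetching pages, so different scheduling strategies can be compared locally.
class CrawlSimulationSketch {

    /**
     * Simulates a crawl and returns the URLs in the order they were visited.
     *
     * @param links    stored link graph (assumed stand-in for the crawl database)
     * @param seed     URL the simulated crawl starts from
     * @param strategy scheduling strategy: orders the URLs waiting in the frontier
     * @param budget   maximum number of pages to visit
     */
    static List<String> simulate(Map<String, List<String>> links,
                                 String seed,
                                 Comparator<String> strategy,
                                 int budget) {
        PriorityQueue<String> frontier = new PriorityQueue<>(strategy);
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();

        frontier.add(seed);
        seen.add(seed);

        while (!frontier.isEmpty() && visited.size() < budget) {
            // The strategy decides which queued URL is "fetched" next.
            String url = frontier.poll();
            visited.add(url);

            // Look up out-links in the stored graph rather than on the network.
            for (String out : links.getOrDefault(url, List.of())) {
                if (seen.add(out)) {
                    frontier.add(out);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy link graph and relevance estimates (illustrative values only).
        Map<String, List<String>> links = Map.of(
                "seed", List.of("a", "b"),
                "a", List.of("c"),
                "b", List.of("d"));
        Map<String, Double> score = Map.of(
                "seed", 1.0, "a", 0.2, "b", 0.9, "c", 0.1, "d", 0.8);

        // Best-first: highest estimated relevance is scheduled first.
        Comparator<String> bestFirst =
                Comparator.comparingDouble((String u) -> -score.getOrDefault(u, 0.0));
        // Baseline: plain alphabetical order of URLs.
        Comparator<String> alphabetical = Comparator.naturalOrder();

        System.out.println(simulate(links, "seed", bestFirst, 3));
        System.out.println(simulate(links, "seed", alphabetical, 3));
    }
}
```

With the toy data above, the two strategies visit different pages under the same budget, which is the kind of difference a simulator of this sort would let one measure without re-crawling.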
Last modified 2006-04-21