If you were to create a robot (spider) to crawl the web, which of the following actions should you consider doing?
1) Keeping your crawler's raw data and sharing the results publicly.
2) Checking available crawled data from other robots.
3) Announcing your intentions and using the HTTP User-Agent header to identify your robot (see the sketch after this list).
4) You should consider doing all of the actions mentioned above.
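As a concrete illustration of option 3, here is a minimal Python sketch of a polite page fetch that identifies the crawler via the User-Agent header. The bot name "ExampleBot", the info URL, and the contact address are hypothetical placeholders, not part of the original question.

```python
import urllib.request

# A descriptive User-Agent announces who is crawling and how to reach the
# operator. The name, URL, and email below are illustrative placeholders.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot; admin@example.com)"

def fetch(url: str) -> bytes:
    """Fetch a URL while identifying the crawler in the User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read()

if __name__ == "__main__":
    html = fetch("https://example.com/")
    print(len(html), "bytes fetched")
```

Including a URL and contact address in the User-Agent string lets site administrators learn about the crawler, or reach its operator, before resorting to blocking it.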