In a recent paper, we ran a Yahoo! Image Search for "Comet Holmes", fetched the thousands of images it returned, calibrated them automatically, and used the image footprints (in celestial coordinates) to recover the comet's orbital parameters.
Starting from this example, I will discuss the opportunities and challenges of using heterogeneous data (including crowd-sourced or web-trawled data) to answer scientific questions, and some of the technology we are developing to address them. This technology includes the Astrometry.net automated calibration system and a nascent project, "the Tractor", which recasts the detection and measurement of astronomical objects in images as inference in a probabilistic generative model. These tools also turn out to be useful for large-scale time-domain surveys such as the LSST.
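To make "inference in a probabilistic generative model" concrete, here is a minimal toy sketch of the general idea. It is not the Tractor's code or API. It assumes a single point source, a circular Gaussian PSF of known width, and independent Gaussian pixel noise; the function names and the brute-force position grid are hypothetical choices made for illustration only. The model renders a predicted image from the parameters (position and flux), and measurement becomes a search for the parameters that maximize the likelihood of the observed pixels.

```python
import numpy as np

def render_point_source(shape, x0, y0, flux, psf_sigma):
    """Predicted image: flux times an analytically normalized Gaussian PSF."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    psf = np.exp(-0.5 * r2 / psf_sigma ** 2) / (2.0 * np.pi * psf_sigma ** 2)
    return flux * psf

def log_likelihood(image, model, noise_sigma):
    """Gaussian log-likelihood of the pixels, up to an additive constant."""
    return -0.5 * np.sum(((image - model) / noise_sigma) ** 2)

def fit_point_source(image, psf_sigma, noise_sigma, step=0.25):
    """Maximum-likelihood position and flux via a grid search over position.

    For a fixed position the model is linear in flux, so the best flux
    has a closed form (a matched-filter estimate).
    """
    best = (-np.inf, None, None, None)
    for y0 in np.arange(0.0, image.shape[0], step):
        for x0 in np.arange(0.0, image.shape[1], step):
            unit = render_point_source(image.shape, x0, y0, 1.0, psf_sigma)
            flux = np.sum(image * unit) / np.sum(unit ** 2)
            lnl = log_likelihood(image, flux * unit, noise_sigma)
            if lnl > best[0]:
                best = (lnl, x0, y0, flux)
    return best[1], best[2], best[3]

# Simulate a noisy image from the same generative model, then fit it.
rng = np.random.default_rng(0)
truth = render_point_source((16, 16), 7.25, 8.5, 1000.0, 1.5)
image = truth + rng.normal(0.0, 1.0, size=truth.shape)
x_hat, y_hat, flux_hat = fit_point_source(image, psf_sigma=1.5, noise_sigma=1.0)
print(x_hat, y_hat, flux_hat)
```

A real system would handle many sources, galaxy shapes, per-image PSFs and calibrations, and would use a proper optimizer or sampler rather than a grid; the sketch only shows how "measurement" can be phrased as fitting a model that predicts pixels.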