Talk: "Scalability Issues in Multimedia Information Retrieval"

Prof. Eduardo Alves do Valle Junior
May 29, 2013 - 12:00
DCC Auditorium, 3rd Floor

Eduardo Alves do Valle Junior is a Professor at the School of Electrical and Computer Engineering (FEEC) of the State University of Campinas (UNICAMP), in the Department of Computer Engineering and Industrial Automation (DCA). He holds a Ph.D. in Computer Science from the University of Cergy-Pontoise, France (UCP, 2008) and an M.Sc. in Computer Science from the Federal University of Minas Gerais (UFMG, 2003). He is interested in Databases, Information Retrieval, Classification and Machine Learning, with emphasis on the indexing of high-dimensional data, large-scale multimedia databases and content-based information retrieval (CBIR). He is especially interested in applications of IT to the preservation of, and access to, Cultural Heritage.



The Millennium marked a turning point for textual Information Retrieval, a moment when Search Engines and Social Networks changed our relationship to the World Wide Web: gigantic corpora of knowledge suddenly felt friendly, accessible and manageable. Ten years later, the same phenomenon is happening for complex non-textual data, including multimedia. The challenge is how to provide intuitive, convenient, fast services for those data, in collections whose size and growth rate are so large that our intuition fails to grasp them.

Two issues have dominated the scientific discourse on that goal: our ability to represent multimedia information in a way that allows answering the high-level queries posed by users, and our ability to process those queries fast.

In this talk, I will focus on the latter issue, examining similarity search in high-dimensional spaces, a pivotal operation found in a variety of database applications, including Multimedia Information Retrieval. Similarity search is conceptually very simple: find the objects in the dataset that are similar to the query, i.e., those that are close to the query according to some notion of distance. However, due to the infamous "curse of dimensionality", performing it fast is challenging from both the theoretical and the practical point of view.
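To make the operation concrete, here is a minimal sketch of the naive baseline that scalable techniques must beat: a brute-force k-nearest-neighbor search over points in Euclidean space. The function name `knn` and the toy dataset are illustrative, not part of the talk's material; the point is that the cost grows linearly with the collection size, which is untenable at web scale.

```python
from math import dist

def knn(dataset, query, k):
    """Brute-force k-nearest-neighbor search: rank every point in the
    dataset by its Euclidean distance to the query and keep the k
    closest. Linear in the dataset size -- exactly the cost that
    indexing techniques try to avoid."""
    return sorted(dataset, key=lambda point: dist(point, query))[:k]

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
print(knn(points, (0.0, 0.0), 2))  # -> [(0.0, 0.0), (0.5, 0.2)]
```

In high dimensions, the "curse of dimensionality" means that index structures which prune this linear scan effectively in 2 or 3 dimensions degrade toward the brute-force cost, which is why approximate and parallel approaches become attractive.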

I have selected for this talk Hypercurves, my latest research endeavor: a distributed technique aimed at hybrid CPU-GPU environments. Hypercurves' goal is to employ throughput-oriented GPUs to keep answer times optimal under several load regimes. The parallelization also raises interesting theoretical questions about how much we can optimize the parallelization of approximate k-nearest-neighbors search if we relax the equivalence to the sequential algorithm from exact to probabilistic.


Research Secretariat (Secretaría de Investigación)