CS Seminar: Stefano Ferilli - Automatic Learning of Linguistic Resources and Models for Text Processing and Document Management

Speaker: Stefano Ferilli (University of Bari, Italy)

Title: Automatic Learning of Linguistic Resources and Models for Text Processing and Document Management

Natural Language Processing (NLP) is a crucial technology for enabling automatic systems to understand the content of most documents and to process and manage them properly. Two categories of NLP tasks may be identified. "High-level" tasks are concerned with the actual understanding and handling of documents (and document collections), in order to effectively support end users in carrying out their activities. "Low-level" tasks focus on pre-processing texts in order to extract suitable features for use in the high-level tasks. While the former are, to some extent, language-independent, the latter are strictly language-dependent, which raises the problem of making different algorithms and resources available for different languages. Both categories, due to the size of the data or the complexity of the task, may benefit from the automatic acquisition of models through Machine Learning and Data Mining techniques.

This talk presents an overview of the research on this topic carried out at the LACAM lab of the University of Bari, Italy. The research aims at the automated acquisition of resources for both low-level tasks (language identification, stopword removal, stemming, and concept/relationship extraction and organization) and high-level tasks (document classification, reading order detection, keyword extraction, information retrieval, author identification, and sentiment analysis). The original contributions lie in the use of symbolic machine learning and reasoning techniques for most of the above tasks, as a complement to the typically statistical approaches proposed in the literature. Most of the findings were embedded in DoMInUS, a system for document processing and digital library management developed at LACAM.

Brief Bio:
Stefano Ferilli received a Ph.D. in Computer Science from the University of Bari in 2001. Since 2006 he has been an Associate Professor at the University of Bari, where his teaching has included various fundamental courses in Computer Science (among them Programming, Algorithms and Data Structures, Programming Languages, Artificial Intelligence, and Intelligent Agents). He has been Head of the Inter-departmental Centre for Logic and its Applications at the University of Bari since 2006, and a member of the Board of the Italian Association for Artificial Intelligence since 2011. He is an Associate Editor of Information Sciences (Elsevier) and of Computational Intelligence (Wiley). He has served as chair, organizer, or Program Committee member at many international conferences, including the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), the European Conference on Artificial Intelligence (ECAI), the International Joint Conference on Artificial Intelligence (IJCAI), the ACM Symposium on Document Engineering (DocEng), and the European Conference on Digital Libraries (ECDL).

Host: Evangelos Milios (eem@cs.dal.ca)



Room 430, Goldberg Computer Science Building