Dalhousie Natural Language Processing Group (DNLP)

The Dalhousie Natural Language Processing Group (DNLP) provides information about NLP-related research conducted at Dalhousie University and serves as a forum for discussion, collaboration, and interaction among researchers interested in the philosophies, theories, and applications of NLP.

The Dalhousie NLP Group was formed in May 2003 through the combined efforts of faculty members and graduate students.

Group Website: http://dnlp.ca
Contact Information: Dr. Vlado Keselj
vlado@cs.dal.ca
Phone: 1-902-494-2893
Fax: 1-902-492-1517 (att. Vlado Keselj)
Research Areas and Projects:
  • Language modeling, syntactic and semantic analysis, n-grams
  • Information extraction, information retrieval, question answering
  • Text data mining, text categorization, document clustering
  • Speech recognition, automatic translation
  • Computational linguistics, syllabification, multi-word expressions
  • Text message normalization
  • Sentiment analysis in micro-blogs
  • Computational musicology, music structure analysis
Funding:
Faculty Members:
Other Members:
  • Dr. Axel Soto
Graduate Students and Research Assistants:

Ph.D. Students:

MCS Students:

Other student members:

  • Ryan Kiros (NSERC USRA)
  • Matthew Butler (visiting Ph.D. University of York, UK)
Seminar Series:

DNLP weekly group meetings:

  • Thursdays, 1 p.m.
  • Room 311, Goldberg Computer Science Building

Selected Publications

2011:

  • Jane E. Mason, Michael Shepherd, Jack Duffy, and Vlado Keselj. Classifying Web Pages by Genre: Dealing with Unbalanced Distributions, Multiple Labels, and Noise. In Proceedings of WEBIST 2011, 7th International Conference on Web Information Systems and Technologies. Noordwijkerhout, The Netherlands, May 2011.

2010:

  • Connie R. Adsett, Yannick Marchand, "Syllabic Complexity: A Computational Evaluation of Nine European Languages," Journal of Quantitative Linguistics, 17[4]:269-290, 2010.
  • Jacek Wolkowicz and Vlado Keselj. 2010. Predicting Development of Research in Music based on Parallels with Natural Language Processing. fMIR Workshop. In Proceedings of 2010 ISMIR Conference. pp. 665-667. Utrecht, the Netherlands.
  • Haibin Liu, Vlado Keselj, and Christian Blouin. Biological Event Extraction using Subgraph Matching. In Proceedings of the 4th International Symposium on Semantic Mining in Biomedicine (SMBM-2010). Hinxton, Cambridgeshire, UK, October 2010.
  • Atefeh Farzindar and Vlado Keselj (eds.). Advances in Artificial Intelligence, 23rd Canadian Conference on Artificial Intelligence, Canadian AI 2010, vol. LNAI 6085, Springer. Ottawa, Canada, May/June 2010.
  • Pif Edwards and Vlado Keselj. MeSH Represented MEDLINE Query Results. In Proceedings of Canadian AI'2010, vol. LNAI 6085 of Lecture Notes in Computer Science, Springer, pages 75-86. Ottawa, ON, Canada, May 2010.
  • Jane E. Mason, Michael Shepherd, Jack Duffy, Vlado Keselj, and Carolyn Watters. An n-gram Based Approach to Multi-labeled Web Page Genre Classification. In Proceedings of the 43rd Hawaii International Conference on System Sciences. Hawaii, January 2010.
  • Haibin Liu, Christian Blouin, and Vlado Keselj. Sentence identification of biological interactions using PATRICIA tree generated patterns and genetic algorithm optimized parameters. Data & Knowledge Engineering, vol. 69, no. 1, pages 137-152, Elsevier Science Publishers, 2010.

2009:

  • Connie R. Adsett and Yannick Marchand. 2009. A Comparison of Data-Driven Automatic Syllabification Methods. In Proceedings of the 16th International Symposium on String Processing and Information Retrieval (SPIRE '09), Jussi Karlgren, Jorma Tarhio, and Heikki Hyyrö (Eds.). Springer-Verlag, Berlin, Heidelberg, 174-181.
  • Connie R. Adsett, Yannick Marchand and Vlado Keselj, "Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian," Computer Speech & Language, 23(4):444-463, 2009.
  • Yannick Marchand, Connie R. Adsett and Robert I. Damper, "Automatic Syllabification in English: A Comparison of Different Algorithms," Language and Speech, 52(1):1-27, 2009.
  • Jacek Wolkowicz, Stephen Brooks and Vlado Keselj. 2009. Midivis: Visualizing Music Structure via Similarity Matrices. In Proceedings of International Computer Music Conference ICMC2009. pp. 53-56. Montreal, Canada.
  • Jacek Wolkowicz, Malcolm Heywood and Vlado Keselj. 2009. Evolving Indirectly Represented Melodies with Corpus-Based Fitness Evaluation. In Proceedings of EvoMusArt 2009 Conference. LNCS vol. 5484, pp. 603-608. Springer, Heidelberg, 2009. Poster.
  • Jacek Wolkowicz, Michael Shepherd, and Vlado Keselj. Wikipedia Search: Combining Language Modeling and Link Analysis. Technical Report CS-2009-01, Faculty of Computer Science, Dalhousie University. Halifax, NS, Canada, January 2009.
  • Hathai Tanta-ngai, Vlado Keselj, and Evangelos E. Milios. Building a Self-organizing Peer-to-peer Network in Shrack - A P2P System for Scientific Document Tracking. APICS 2009 - APICS Mathematics, Statistics and Computer Science Conference, 2009. Extended abstract.
  • Katherine Magee and Vlado Keselj. EulMiner: A Graph-based Interactive Visualization for Biomedical Term Identification. APICS 2009 - APICS Mathematics, Statistics and Computer Science Conference. Dalhousie University, Halifax, Canada, 2009. Poster.
  • Vlado Keselj. Book Review: Speech and Language Processing (second edition) by Daniel Jurafsky and James H. Martin. Computational Linguistics, vol. 35, no. 3, pages 463-466, MIT Press, 2009.
  • Hathai Tanta-ngai, Evangelos E. Milios, and Vlado Keselj. Self-organizing Peer-to-Peer Network for Collaborative Document Tracking. In Proceedings of the CIKM 2009 Workshop on Complex Networks in Information & Knowledge Management, ACM Eighteenth Conference on Information and Knowledge Management (CIKM'2009). Hong Kong, China, November 2009.
  • Vlado Keselj, Haibin Liu, Norbert Zeh, Christian Blouin, and Chris Whidden. Finding Optimal Parameters for Edit Distance Based Sequence Classification is NP-Hard. In Proceedings of StReBio'09, KDD-09 Workshop on Statistical Relational Mining and Learning in Bioinformatics. Paris, France, June 2009.
  • Haibin Liu, Christian Blouin, and Vlado Keselj. Identifying Interaction Sentences from Biological Literature Using Automatically Extracted Patterns. In Proceedings of BioNLP 2009, NAACL/HLT 2009 Workshop. Boulder, Colorado, USA, 2009.
  • Matthew Butler and Vlado Keselj. Financial Forecasting using Character N-Gram Analysis and Readability Scores of Annual Reports. In Proceedings of Canadian AI'2009. Kelowna, BC, Canada, May 2009.
  • Pif Edwards and Vlado Keselj. MedicInfoSys: An Architecture for an Evidence-Based Medical Information Research and Delivery System. Technical Report CS-2009-05, Faculty of Computer Science, Dalhousie University. Halifax, NS, Canada, October 2009.

2008:

  • Jacek Wolkowicz, Zbigniew Kulka and Vlado Keselj. 2008. N-gram-based approach to Composer Recognition. Archives of Acoustics, vol. 33, no. 1, pp. 43-55, 2008.
  • Chris Jordan, Jane E. Tougas, John Healy, Vlado Keselj, and Carolyn Watters. Swordfish2: Using Kernel Density Estimation to Smooth Ngram Models for Morphological Analysis. Journal of Interesting Negative Results in Natural Language Processing and Machine Learning, vol. 1, pages 1-18, 2008.
  • Vlado Keselj and Danko Sipka. A Suffix Subsumption-based Approach to Building Stemmers and Lemmatizers for Highly Inflectional Languages with Sparse Resources. INFOTHECA, Journal of Informatics and Librarianship, vol. IX, no. 1-2, pages 23a-33a, 21-31, May 2008.
  • Haibin Liu, Christian Blouin, and Vlado Keselj. An Unsupervised Method for Extracting Domain-specific Affixes in Biological Literature. Technical Report CS-2008-01, Faculty of Computer Science, Dalhousie University. Halifax, NS, Canada, January 2008.
  • Lalita Narupiyakul, Vlado Keselj, Nick Cercone, and Booncharoen Sirinaovakul. Focus to Emphasize Tone Analysis for Prosodic Generation. Computers and Mathematics with Applications, vol. 55, no. 8, pages 1735-1753, Elsevier, 2008.

2007:

  • Connie R. Adsett and Yannick Marchand, "Are Rule-based Syllabification Methods Adequate for Languages with Low Syllabic Complexity? The Case of Italian," SSW6-2007, Bonn, Germany, August 2007, pp. 58-63. Poster.
  • Yannick Marchand, Connie R. Adsett and Robert I. Damper, "Evaluating Automatic Syllabification Algorithms for English," SSW6-2007, Bonn, Germany, August 2007, pp. 316-321. Poster.
  • Tony Abou-Assaleh, Chris Whidden, Vlado Keselj, Hathai Tanta-ngai, and Nick Cercone. DalTREC 2007 QA System Jellyfish: Experiments with Integration of Lucene and GATE, and Improved Usage of WordNet and Qrel. In The Sixteenth Text REtrieval Conference (TREC 2007). Gaithersburg, Maryland, USA, November 2007.
  • Haibin Liu, Christian Blouin, and Vlado Keselj. An Unsupervised Method for Extracting Domain-specific Affixes in Biological Literature. In Proceedings of BioNLP 2007, ACL 2007 Workshop. Prague, Czech Republic, June 29, 2007.
  • Hathai Tanta-ngai, Vlado Keselj, and Evangelos E. Milios. Shrack: Description and Performance Evaluation of a Peer-to-Peer System for Document Sharing and Tracking using Pull-Only Information Dissemination. In Proceedings of The Fourth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P), Workshop of the 21st IEEE International Parallel & Distributed Processing Symposium. Long Beach, California, USA, March 2007.
  • Hathai Tanta-ngai, Vlado Keselj, and Evangelos E. Milios. Shrack: A Pull-Only Peer-to-Peer Framework for Document Sharing and Tracking. The Annual Dalhousie Computer Science In-House Conference (#dcsi 2007), pages 9-10, 2007. Extended abstract.
  • Haibin Liu, Christian Blouin, Vlado Keselj. A Practical Method for Extracting Prefixes and Suffixes of Biological Terms. The Annual Dalhousie Computer Science In-House Conference (#dcsi 2007), pages 1-2, 2007. Extended abstract.
  • Vlado Keselj and Nick Cercone. A Formal Approach to Subgrammar Extraction for NLP. Mathematical and Computer Modelling, vol. 45, pages 394-403, Elsevier, February 2007.

2006:

  • Vlado Keselj, Tony Abou-Assaleh, and Nick Cercone. DalTREC 2006 QA System Jellyfish: Regular Expressions Mark-and-Match Approach to Question Answering. In The Fifteenth Text REtrieval Conference (TREC 2006). Gaithersburg, Maryland, USA, November 2006.
  • Syed Sibte Raza Abidi, Peter Bath, and Vlado Keselj (eds.). Advancing Health Information Management and Health Informatics: Issues, Strategies, and Tools; Proceedings of the 11th International Symposium for Health Information Management Research (iSHIMR 2006). Faculty of Computer Science, Dalhousie University. Halifax, Nova Scotia, Canada, July 2006.
  • Tony Abou-Assaleh, Vlado Keselj, and Nick Cercone. Probabilistic Inference in First-Order Logic Using Relaxed Unification. CAIMS-MITACS 2006 Joint Annual Conference. York University, Toronto, ON. Poster presentation; Tony awarded 3rd prize.
  • Haibin Liu and Vlado Keselj. Combined Mining of Web Server Logs and Web Contents for Classifying User Navigation Patterns and Predicting Users' Future Requests. Data & Knowledge Engineering, vol. 61, no. 2, pages 304-330, Elsevier Science Publishers, May 2007. (Published on-line in 2006).
  • Haewon Chung, Vlado Keselj, and Ray Sweidan. A Semi-Structured e-Forms Approach to Deployment of Electronic Clinical Guidelines: A Proposal. In Proceedings of the 11th International Symposium on Health Information Management Research (iSHIMR 2006), pages 289-295, Faculty of Computer Science, Dalhousie University. Halifax, Canada, 2006.
  • Chris Jordan, John Healy, and Vlado Keselj. Swordfish: An Unsupervised Ngram Based Approach to Morphological Analysis. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 657-658, ACM Press. Seattle, Washington, USA, August 2006.
  • Vlado Keselj and Tanja Keselj. What do users want from an on-line dictionary: A seven-years usage study of an e-dictionary. In 9th INTEX/NooJ Conference. Belgrade, Serbia, 2006.

2005:

  • Robert I. Damper, Yannick Marchand, Connie R. Adsett, T. Soonklang, and J.-D. S. Marsters, "Multilingual Data-Driven Pronunciation," Proceedings of the 10th International Conference on Speech and Computer (SPECOM 2005), Patras, Greece, October 2005, pp. 167-170.
  • Yannick Marchand, Robert I. Damper and Connie R. Adsett, "Computational models of reading aloud: Where are the polysyllabic letter strings?" Proceedings of the 9th International Conference on Cognitive Neuroscience, Havana, Cuba, September 2005, p. 386.
  • Jonathan Doyle and Vlado Keselj. Automatic Categorization of Author Gender via N-Gram Analysis. In The 6th Symposium on Natural Language Processing, SNLP'2005. Chiang Rai, Thailand, December 2005.
  • Lalita Narupiyakul, Nick Cercone, Vlado Keselj, and Booncharoen Sirinaovakul. Focus and Speech Act in Prosodic Analysis for Spoken Language Generation. In The 6th Symposium on Natural Language Processing, SNLP'2005. Chiang Rai, Thailand, December 2005.
  • Vlado Keselj, Evangelos Milios, Andrew Tuttle, Singer Wang, and Roger Zhang. DalTREC 2005 Spam Track: Spam Filtering using N-gram-based Techniques. In The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings. Gaithersburg, Maryland, USA, November 2005.
  • Tony Abou-Assaleh, Nick Cercone, Jon Doyle, Vlado Keselj, and Chris Whidden. DalTREC 2005 QA System Jellyfish: Mark-and-Match Approach to Question Answering. In The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings. Gaithersburg, Maryland, USA, November 2005.
  • Haibin Liu and Vlado Keselj. Combined Mining of Web Server Logs and Web Contents. The Annual Dalhousie Computer Science In-House Conference (DCSI 2005), 2005. Extended abstract.
  • Yingbo Miao, Vlado Keselj, and Evangelos Milios. Document Clustering using Character N-grams: A Comparative Evaluation with Term-based and Word-based Clustering. In Proceedings of ACM Fourteenth Conference on Information and Knowledge Management (CIKM 2005). Bremen, Germany, November 2005.
  • Calvin Thomas, Vlado Keselj, Nick Cercone, Kenneth Rockwood, and Elissa Asp. Automatic Detection and Rating of Dementia of Alzheimer Type through Lexical Analysis of Spontaneous Speech. In Proceedings of IEEE ICMA 2005. Niagara Falls, Ontario, Canada, July 2005.
  • Vlado Keselj and Dawn Jutla. QTIP: Multi-Agent NLP and Privacy Architecture for Information Retrieval in Usable Web Privacy Software. In Proceedings of IEEE/WIC/ACM Web Intelligence Conference 2005. Compiegne University of Technology, France, 2005. 18% acceptance rate.
  • Sittichai Jiampojamarn, Nick Cercone, and Vlado Keselj. Biological Named Entity Recognition using N-grams and Classification Methods. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING'05. Meisei University, Hino Campus, Hino-shi, Tokyo, 191-8506 Japan, August 2005.
  • Tony Abou-Assaleh, Nick Cercone, and Vlado Keselj. Question-Answering with Relaxed Unification. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING'05. Meisei University, Hino Campus, Hino-shi, Tokyo, 191-8506 Japan, August 2005.
  • Andrija Tomovic, Predrag Janicic, and Vlado Keselj. n-gram-based Classification and Unsupervised Hierarchical Clustering of Genome Sequences. Computer Methods and Programs in Biomedicine, vol. 81, pages 137-153, February 2006.
  • Vlado Keselj. Starfish: A Perl-based Framework for Text-Embedded Programming and Preprocessing. The Perl Journal, June 2005.
  • Tony Abou-Assaleh, Nick Cercone, and Vlado Keselj. A Probabilistic Evaluation Function for Relaxed Unification. In The 29th Annual International Computer Software and Applications Conference (IEEE COMPSAC 2005). Edinburgh, Scotland, July 2005.
  • Jiayun Guo, Vlado Keselj, and Qigang Gao. Integrating Web Content Clustering into Web Log Association Rule Mining. In Proceedings of Canadian AI'2005, vol. LNAI 3501 of Lecture Notes in Computer Science, Springer, pages 182-193. Victoria, BC, Canada, May 2005.
  • Yingbo Miao, Vlado Keselj, and Evangelos Milios. Document Clustering using Character N-grams: A Comparative Evaluation with Term-based and Word-based Clustering. Technical Report CS-2005-23, Faculty of Computer Science, Dalhousie University. Halifax, NS, Canada, September 2005.
  • Andrija Tomovic, Predrag Janicic, and Vlado Keselj. N-gram-based Classification and Hierarchical Clustering of Genome Sequences. Technical Report CS-2005-02, Faculty of Computer Science, Dalhousie University. Halifax, NS, Canada, March 2005.

2004:

  • Vlado Keselj, Anthony Cox. DalTREC 2004: Question Answering using Regular Expression Rewriting. In The Thirteenth Text REtrieval Conference (TREC 2004) Proceedings. Gaithersburg, Maryland, USA, November 2004.
  • Anthony Cox, Tony Abou-Assaleh, Wei Ai, Vlado Keselj. Lexical Source-Code Transformation. In Proceedings of the STS'04 Workshop at GPCE/OOPSLA. Vancouver, Canada, October 2004.
  • Tony Abou-Assaleh, Nick Cercone, Vlado Keselj, and Ray Sweidan. Detection of New Malicious Code Using N-gram Signatures. In Proceedings of the Second Annual Conference on Privacy, Security, and Trust (PST'04). Fredericton, New Brunswick, Canada, October 2004.
  • Sittichai Jiampojamarn, Vlado Keselj, and Nick Cercone. Two Experiments in Biological Term Annotation using Classification Methods. In Proceedings of the First Biotechnology and Bioinformatics Symposium (BIOT-04). Colorado Springs, Colorado, USA, September 2004.
  • Vlado Keselj and Nick Cercone. PPDN-A Framework for Peer-to-peer Collaborative Research Network. In Proceedings of The Second International Workshop on Web-based Support Systems (WSS'04), in conjunction with the 2004 IEEE/WIC/ACM International Conference on Web Intelligence.
  • Asad Satti, Nick Cercone, and Vlado Keselj. Experiments in Web Page Classification for Semantic Web. In Proceedings of The Second International Workshop on Web-based Support Systems (WSS'04). Beijing, China, September 2004. In conjunction with the 2004 IEEE/WIC/ACM International Conference on Web Intelligence.
  • Tony Abou-Assaleh, Nick Cercone, Vlado Keselj, and Ray Sweidan. N-gram-based Detection of New Malicious Code. In The 28th Annual International Computer Software and Applications Conference (IEEE COMPSAC 2004). Hong Kong, China, September 2004.
  • Vlado Keselj, Tanja Keselj, and Larisa Zlatic. R{j}ecnik.com: English-Serbo-Croatian Electronic Dictionary. In Proceedings of COLING'04 Workshop on Enhancing and Using Electronic Dictionaries. Geneva, Switzerland, August 2004.
  • Vlado Keselj and Nick Cercone. CNG Method with Weighted Voting. In Ad-hoc Authorship Attribution Competition (AAAC), June 2004. Part of ALLC/ACH 2004 conference, extended abstract, the best performing algorithm.
  • Vlado Keselj, Nick Cercone, Calvin Thomas, Ken Rockwood, and Elissa Asp. Analysis of Spontaneous Speech in Dementia of Alzheimer Type: Experiments with Morphological and Lexical Analysis. June 2004. Talk given at the PUL Workshop, 23-Apr-2004; poster and abstract presented at the IS'2004 Conference in Ottawa, June 2004.
  • Tony Abou-Assaleh, Nick Cercone, Vlado Keselj. Applying HPSG with Relaxed Unification to Question Answering. The Annual Dalhousie Computer Science In-House Conference (DCSI 2004), 2004. Poster, extended abstract.
  • Sittichai Jiampojamarn, Vlado Keselj, Nick Cercone. Automatic biological term annotation using classification. The Annual Dalhousie Computer Science In-House Conference (DCSI 2004), 2004. Poster, extended abstract.

2003:

  • Nick Cercone, Lijun Hou, Vlado Keselj, Aijun An, and Kanlaya Naruedomkul. From Computational to Web Intelligence. In Computing and Information Sciences: Recent Trends, pages 163-178, Narosa Publishing House. New Delhi, 2003.
  • Vlado Keselj and Tsutomu Endo (eds.). Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING'03. Dalhousie University, Halifax, Nova Scotia, Canada, August 2003.
  • Vlado Keselj and Nick Cercone. A Graph Unification Machine for NL Parsing. Computers and Mathematics with Applications, vol. 46, pages 393-419, 2003.
  • Vlado Keselj, Fuchun Peng, Nick Cercone, and Calvin Thomas. N-gram-based Author Profiles for Authorship Attribution. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING'03, pages 255-264. Dalhousie University, Halifax, Nova Scotia, Canada, August 2003.
  • Fuchun Peng, Dale Schuurmans, Vlado Keselj, and Shaojun Wang. Automated Authorship Attribution with Character Level Language Models. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003). Budapest, Hungary, April 12-17, 2003.
  • Tony Abou-Assaleh, Nick Cercone, and Vlado Keselj. An Overview of the Theory of Relaxed Unification. In Proceedings of the International Conference on Advances in the Internet, Processing, Systems, Interdisciplinary Research, IPSI-2003. Sveti Stefan, Serbia and Montenegro, October 5-11, 2003.
  • Tony Abou-Assaleh, Nick Cercone, and Vlado Keselj. Towards the Theory of Relaxed Unification. In Proceedings of the 14th International Symposium on Methodologies for Intelligent Systems, ISMIS 2003, vol. LNAI 2871 of Lecture Notes in Computer Science, Springer. Maebashi City, Japan, October 28-31, 2003.
  • Tony Abou-Assaleh, Nick Cercone, and Vlado Keselj. Expressing Probabilistic Context-Free Grammars in the Relaxed Unification Formalism. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING'03, pages 29-36. Dalhousie University, Halifax, Nova Scotia, Canada, August 2003.
  • Vlado Keselj, Elissa Asp, Kenneth Rockwood, and Nick Cercone. Computational analysis of language used by Alzheimer patients in interviews: An objective method for discriminating between healthy and demented individuals. In 6th Annual Symposium on the Treatment of Alzheimer's Disease. Halifax, Nova Scotia, November 26-29, 2003.

2002:

  • Nick Cercone, Lijun Hou, Vlado Keselj, Aijun An, Kanlaya Naruedomkul, and Xiaohua Hu. From Computational Intelligence to Web Intelligence. IEEE Computer, vol. 35, no. 11, pages 72-76, November 2002.
  • Kenneth Rockwood, Elissa Asp, Vlado Keselj, and Michael McAllister. Unobtrusive Technology for Alzheimer's Disease. Ottawa, Canada. Poster presentation given at the PRECARN Intelligent Healthcare Technologies presentation, related to the Prime Ministers' meeting.