8th International Conference on Database Systems for Advanced Applications (DASFAA 2003)

Tutorials

Tutorial 1

Title: Statistical Learning Methods for Emerging Database Applications
Tutor: Prof. Edward Chang (UC Santa Barbara)

Tutorial 2

Title: Unstructured Information Management
Tutor: Dr. Mukesh Mohania (IBM India Research Lab)

Tutorial 1: Statistical Learning Methods for Emerging Database Applications

Target Audience

A broad audience, including students and researchers interested in an overview of statistical learning algorithms for analyzing high-dimensional data such as imagery and gene expression. No particular background is expected.

Motivation and Goals

Statistical Learning is a well-established scientific discipline. The theories underlying classical Statistical Learning are based on the assumption that D < N and N → ∞, where D denotes the number of data dimensions and N the number of training instances. Many emerging data-analysis applications, however, face the high-dimensionality challenge of D > N. For instance, an image/video search engine needs to learn users' query concepts from a very small number of training instances (provided by users via relevance feedback) in very high-dimensional feature spaces. In gene-expression profiling, each human cell contains approximately three billion base pairs, which encode between 50,000 and 100,000 genes, but the number of training instances available for analyzing a particular genetic disease is typically fewer than a hundred.

This tutorial will present statistical methods for making inferences in very high-dimensional spaces. It is organized into three parts.

Outline

  1. Classical methods. (0.5 hour)
    • Statistical decision theories.
    • The least squares, the nearest neighbors, and anything in between.
  2. Kernel methods. (1.0 hour)
    • Support Vector Machines.
    • Bayes Point Machines.
    • Dimension reduction techniques.

Tutor

Prof. Edward Chang received his Ph.D. in Electrical Engineering from Stanford University in 1999. He is an Associate Professor of Electrical and Computer Engineering at the University of California, Santa Barbara. (He received tenure in March.) His research interests include statistical learning and multimedia databases. He received the IBM Faculty Partnership Award from 2000 to 2002 and the NSF CAREER Award in 2002. His perception-based image retrieval (PBIR) work, which applies statistical learning to understand users' subjective similarity metrics, was recognized as a major breakthrough in image retrieval by the CBIR panel at the 2002 IEEE International Conference on Multimedia. Prof. Chang is a co-founder and the CTO of VIMA Technologies.


Tutorial 2: Unstructured Information Management

Abstract

The growth of the Internet has dramatically changed the way in which information is managed and accessed. The Web has established itself as a universal repository for all types of information and has become an active medium for business-to-consumer (B2C) and business-to-business (B2B) commerce. Several solutions have recently been developed to manage and access data efficiently, ranging from unstructured documents to structured, record-oriented data. In this tutorial, we will discuss how to store and access XML and semistructured data using conventional database technology and native formats, and review the existing technologies.

The pressing challenges today are how to integrate data (including semistructured data) coming from different streams for on-demand computing, and how to make the Web intelligent. We will address these problems and discuss where the field stands and which research problems remain open.

Outline

  1. Unstructured, XML and Semi-structured Data
  2. Techniques for storing XML/Semi-structured data
  3. XML Query Over Relational Data
  4. Streaming Data (semi-structured) Management
  5. Active Integration of Information
  6. Semantic Web
  7. Applications
  8. Content Manager Architecture

Tutor

Mukesh Mohania received his Ph.D. in Computer Science & Engineering from the Indian Institute of Technology, Bombay, India, in 1995. He has worked at the University of Melbourne, the University of South Australia, and Western Michigan University, U.S.A. He currently works at the IBM India Research Lab, where he manages a database group. He has worked extensively in the areas of distributed databases, data warehousing, semistructured databases, and autonomic computing. He has published several research papers and organized conferences in these areas. He was awarded a Technical Achievement Award in the area of Web Database Management and Data Warehousing in 2000. He is also a senior member of the IEEE.