Saturday, August 19, 2017

Research

CVLab is dedicated to developing innovative techniques in computer vision, image processing, multimedia systems, and related areas. Our current research covers a broad range of topics, grouped into the areas described below.


We aim to resolve the difficulties of action recognition that arise from large intra-class variations. These variations make it infeasible to represent one action instance by other instances of the same action. We therefore propose to extract both instance-specific and class-consistent features to facilitate action recognition. Specifically, the instance-specific features explore the self-similarities among the frames of each video instance, while the class-consistent features summarize within-class similarities. We introduce a generative formulation to combine the two types of features. Experimental results demonstrate the effectiveness of our approach.
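As a rough illustration of what "self-similarities among frames" can mean, the sketch below builds a frame-by-frame similarity matrix for one video. It is not the lab's formulation: the per-frame feature vectors, the choice of cosine similarity, and the function names are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero descriptor as dissimilar to everything
    return dot / (norm_u * norm_v)

def self_similarity_matrix(frames):
    """Entry [i][j] compares frame i with frame j of the same video."""
    return [[cosine(f, g) for g in frames] for f in frames]

# Hypothetical per-frame descriptors: frames 0 and 2 show the same pose.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
ssm = self_similarity_matrix(frames)
```

Repeated poses within a video appear as high off-diagonal entries, so the matrix describes the structure of that one instance without referring to other videos of the same class.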


Image querying with content-based image retrieval (CBIR) techniques has attracted great attention owing to its wide applicability. We are particularly interested in how the user's semantics can be integrated into a CBIR system. The user's semantics for CBIR involve two sources of information: the similarity relations entailed by the content-based features, and the relevance relations specified in the user's feedback. We also look into selecting a good feature set to improve retrieval performance.


Data, including observations, measurements, or images, are quantized or characterized with certain feature representations in the digital world for further processing. However, there is no universal way to depict all instances well. In particular, the optimal data descriptors often vary from class to class. We are thus motivated to incorporate multiple kernel learning (MKL) into the training procedure and to build a class-specific feature selection framework, which significantly facilitates related tasks such as clustering and classification.
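The idea of combining several feature representations as kernels can be sketched as a weighted sum of kernel matrices. This is only the combination step under stated assumptions: in MKL the weights are learned during training, whereas here they are fixed by hand, and the two kernels, the toy data, and all names are hypothetical.

```python
import math

def linear_kernel(X):
    """Gram matrix of inner products between all pairs of samples."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of Gaussian (RBF) similarities between all pairs."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]

def combine_kernels(kernels, weights):
    """Convex combination of same-sized kernel matrices.

    Weights must be non-negative; they are normalized to sum to one.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    n = len(kernels[0])
    return [[sum(w / total * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]

# Toy data: two one-dimensional samples, two candidate descriptors.
X = [[0.0], [1.0]]
K = combine_kernels([linear_kernel(X), rbf_kernel(X)], weights=[1.0, 1.0])
```

A class-specific variant would keep one weight vector per class, so that each class can emphasize the descriptors that suit it best.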


We aim to design a general learning framework for face detection that handles the problems caused by a variety of variations in images, including profile views, rotation, occlusion, lighting conditions, varied expressions, and multiple faces and scales. We formulate the task as a classification problem over data of multiple classes. Our approach uses a multi-class boosting algorithm, MBHboost, integrated with a cascade structure to perform face detection effectively. As a result, it is flexible in that a single boosted cascade suffices, with no need to select the most appropriate cascade for the detection.
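For readers unfamiliar with cascade structures, the sketch below shows the generic early-rejection idea: a window must pass every stage, and most non-face windows are discarded by the first cheap stages. It does not implement MBHboost or its multi-class handling; the two stage scores and their thresholds are hypothetical stand-ins for boosted classifiers.

```python
def mean_intensity(window):
    """Hypothetical stage score: average pixel value of the window."""
    values = [p for row in window for p in row]
    return sum(values) / len(values)

def contrast(window):
    """Hypothetical stage score: spread between brightest and darkest pixel."""
    values = [p for row in window for p in row]
    return max(values) - min(values)

# Each stage is (score function, threshold); values are illustrative only.
STAGES = [(mean_intensity, 0.2), (contrast, 0.3)]

def cascade_accepts(window, stages=STAGES):
    """Return True only if the window passes every stage in order."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected early; later stages are never evaluated
    return True

flat_dark = [[0.0, 0.1], [0.1, 0.0]]    # fails the first stage
flat_bright = [[0.5, 0.5], [0.5, 0.5]]  # passes stage one, fails stage two
textured = [[0.1, 0.9], [0.8, 0.2]]     # passes both stages
```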


Feature matching, or feature correspondence, serves as a core technique for image analysis and understanding. It is closely related to a wide range of applications, such as object recognition, image retrieval, 3D reconstruction, and image enhancement. Correspondence problems involve cluttered backgrounds, a significant number of outliers, and occlusion. Moreover, multiple translations, orientations, and deformations also degrade feature matching in terms of precision, recall, and efficiency. We look into these problems and propose robust frameworks to resolve them.
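To make the outlier problem concrete, here is a minimal baseline sketch: mutual nearest-neighbor matching between two sets of descriptors, which discards one-sided matches. This is a common textbook filter, not the robust frameworks referred to above; the descriptors and function names are invented for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)),
               key=lambda k: euclidean(query, candidates[k]))

def mutual_matches(desc_a, desc_b):
    """Keep pair (i, j) only if j is i's nearest neighbor and vice versa."""
    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Hypothetical descriptors; the third one in image A has no true partner.
desc_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
desc_b = [[10.1, 10.0], [0.2, 0.0]]
matches = mutual_matches(desc_a, desc_b)
```

The unmatched descriptor still has a nearest neighbor in the other image, but that neighbor prefers a different partner, so the pair is dropped. Filters of this kind trade recall for precision, which is the tension the paragraph above describes.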


Object recognition is one of the vision applications that deal with data of multiple classes. In such a case, not only discriminative features but also class-specific features should be considered, because the goodness of a feature representation for recognition is often category-dependent, and can even be object-dependent when intra-class variation is large. Thus, we aim to improve recognition accuracy by focusing on feature representation in terms of image features and similarity measures, where various feature representations are fused in the domain of kernel matrices to alleviate the difficulties caused by their diverse forms.



 


The goal of people counting is to estimate the number of people or the density of crowds in a monitored environment. Both long-term and short-term statistics of people counts in an environment provide useful information for strategy planning or event detection. However, detecting crowds or estimating their density is a challenging task because of difficulties such as partial occlusions, low-quality images, and cluttered backgrounds. We therefore focus on the setting where multiple cameras with different angles of view are available. We treat the visual cues captured by each camera as a knowledge source and carry out cross-camera knowledge transfer to alleviate these difficulties.


We focus on semantic segmentation, in particular class-based image segmentation: the task of labeling each pixel in an image with one of several predefined object classes or background. Distinct from image-driven segmentation, class-based image segmentation aims not only to identify the object classes of interest, but also to determine the shapes or boundaries of these objects. It in fact involves resolving two of the most fundamental problems in vision research: recognition and segmentation. It therefore plays an essential role in many high-level computer vision applications, such as image and scene understanding.


Labeling data for image recognition or classification is often expensive. To reduce the labeling effort, transfer learning has been shown to be a promising technique for object recognition with few training samples. It transfers useful knowledge from the source domain to improve learning of the target model. We particularly focus on transferring knowledge from multiple classes to multiple classes: given two multi-class recognition tasks, one in the source domain and the other in the target domain, we leverage the extra source knowledge to learn a more robust multi-class classifier, rather than a set of binary classifiers, in the target domain.
