Addressing spatial uncertainty during remote sensing data analysis

Friday, October 26, 2018 - 9:00am - 9:50am
Keller 3-180
Alina Zare (University of Florida)
Most supervised machine learning algorithms assume that each training data point is paired with an accurate training label (for classification) or value (for regression). However, obtaining accurate training labels is often time-consuming and expensive, making it infeasible for large data sets, or may simply be impossible given the physics of the problem. Furthermore, human annotators may label a data set inconsistently, providing inherently imprecise label information. As a result, in many applications one has access only to inaccurately labeled training data. For example, consider single-pixel or sub-pixel target detection in remotely sensed imagery: often only GPS coordinates for the targets of interest are available, with an accuracy ranging across several pixels. Thus, the specific pixels that correspond to a target are unknown, even with the GPS ground-truth information. Training an accurate classifier or learning a representative target signature from this sort of uncertainly labeled training data is extremely difficult in practice. Similarly, consider pixel-level fusion of misaligned hyperspectral and LiDAR data: because of the misalignment, we may not know which specific pixels in each data set correspond to the same region on the ground, but we can more easily identify small sets of pixels that contain those corresponding points. In both of these examples, accurately labeled training data is unavailable, and an approach that can learn from uncertain training labels, such as Multiple Instance Learning (MIL), is required. Once we learn to spot it, we find that this challenge of learning from weakly labeled data or uncertain training labels plagues many potential machine learning and pattern recognition applications.

MIL is a variation on supervised learning for problems with imprecise label information. In particular, training data is grouped into positively and negatively labeled bags. In the case of target characterization, the multiple instance learning problem requires that each positive bag contain at least one instance from the target class and that each negative bag be composed entirely of non-target data. Given training data of this form, the overall goal can be to predict either unknown instance-level or unknown bag-level labels on test data. MIL methods are effective for developing classifiers when accurate instance-level labeled training data is unavailable. Since the introduction of the MIL framework, many methods have been proposed and developed in the literature. The majority of MIL approaches focus on learning a classification decision boundary that distinguishes between positive and negative instances or bags from the ambiguously labeled data. Although these approaches are effective at training classifiers given imprecise labels, they generally do not provide an intuitive description or representative target concept that characterizes the salient and discriminative features of the target class. The Multiple Instance Adaptive Cosine Estimator (MI-ACE) approach is one of the few MIL methods that can estimate a representative target concept. In this presentation, an introduction to the MI-ACE approach will be provided along with a description of several MIL-based algorithms and results on a variety of data types and applications.
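
The bag structure described above can be illustrated with a minimal Python sketch. It is not the MI-ACE algorithm or any implementation from the talk. The function names (`bag_around_gps`, `bag_label`, `bag_score`), the square pixel window, and the max rule for turning instance scores into a bag score are all assumptions made for illustration. The sketch shows how an imprecise GPS location can be turned into a bag of candidate pixels, and how a bag-level label follows from hidden instance-level labels under the "at least one target instance" rule.

```python
import numpy as np

def bag_around_gps(image, row, col, radius):
    """Collect every pixel spectrum within `radius` pixels of a GPS-derived
    location (hypothetical helper; a square window is an assumption)."""
    n_rows, n_cols, n_bands = image.shape
    r0, r1 = max(row - radius, 0), min(row + radius + 1, n_rows)
    c0, c1 = max(col - radius, 0), min(col + radius + 1, n_cols)
    # Each row of the result is one instance (one pixel spectrum) in the bag.
    return image[r0:r1, c0:c1].reshape(-1, n_bands)

def bag_label(instance_labels):
    """MIL rule: a bag is positive if at least one instance is a target."""
    return int(np.any(instance_labels))

def bag_score(instance_scores):
    """One common way (assumed here) to score a bag: its most target-like instance."""
    return float(np.max(instance_scores))

# Usage: a 10 x 10 image with 3 spectral bands and a GPS fix near the top edge.
image = np.zeros((10, 10, 3))
positive_bag = bag_around_gps(image, row=0, col=5, radius=2)
print(positive_bag.shape)          # instances x bands
print(bag_label([0, 0, 1]))        # positive: one target instance is enough
print(bag_label([0, 0, 0]))        # negative: no target instances
```

The window radius plays the role of the GPS accuracy: a larger radius makes it more likely that the true target pixel is inside the bag, at the cost of more non-target instances sharing the positive label.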