LMC:基于单应性矩阵的局部运动一致遥感图像匹配
Homography Matrix-Based Local Motion Consistent Matching for Remote Sensing Images
基于单应性矩阵的局部运动一致遥感图像匹配
摘要
特征匹配是图像处理领域的基础任务,旨在保证两组特征之间具有正确的对应关系。基于描述符相似性构建的假定匹配总会含有大量错误匹配。为了去除错误匹配,我们提出了一种遥感图像特征匹配方法LMC(Local Motion Consistency),局部运动一致性是指正确匹配相邻间具有相同运动的性质。LMC的核心思想是搜索到具有正确运动趋势的邻域,并保留与该邻域具有相同运动的匹配。基于这个思想,我们使用单应性矩阵设计了一个局部几何约束来表示局部运动一致性。该约束具有射影不变性,适用于多种变换类型。为了避免异常值影响到对于具有正确运动的邻域的搜索,我们引入了重采样方法来构建邻域。此外,我们设计了一种跳出机制,可以在不搜索完所有可能的情况下跳出循环,以此来降低运行时间。LMC可以在100毫秒内处理超过1000个错误匹配。我们在不同变换类型的图像数据集上进行了实验。实验结果表明,LMC具有比目前先进方法更优异的匹配性能。
Feature matching is a fundamental task in the field of image processing, aimed at ensuring correct correspondence between two sets of features. Putative matches constructed based on the similarity of descriptors always contain a large number of false matches. To eliminate these false matches, we propose a remote sensing image feature matching method called LMC (Local Motion Consistency), where local motion consistency refers to the property that adjacent correct matches have the same motion. The core idea of LMC is to find neighborhoods with correct motion trends and retain matches with the same motion. To achieve this, we design a local geometric constraint using a homography matrix to represent local motion consistency. This constraint has projective invariance and is applicable to various types of transformations. To avoid outliers affecting the search for neighborhoods with correct motion, we introduce a resampling method to construct neighborhoods. Moreover, we design a jump-out mechanism to exit the loop without searching all possible cases, thereby reducing runtime. LMC can process over 1000 putative matches within 100 milliseconds. We conduct experiments on image datasets with different types of transformations, and the results show that LMC has better matching performance than the current state-of-the-art approaches.
关键词:特征匹配、错误匹配去除、单应性矩阵、局部运动一致性、重投影误差
Index Terms:Feature matching, mismatch removal, homography matrix, local motion consistency, reprojection error
1. 引言
特征匹配是图像处理邻域的基本问题之一[1],旨在为各种变换类型的图像对建立可靠的对应关系。特征匹配方法的匹配性能对于许多任务十分重要,如三维重建[2] [3]、图像配准和融合[4] [5] [6]、图像拼接[7] [8]等。这些任务在鲁棒性、精度和效率等方面对特征匹配方法有着很高的要求。
Feature matching is one of the fundamental problems in the field of image processing [1], which aims to establish reliable correspondences between image pairs for various types of transformations. The matching performance of feature matching methods is crucial for many tasks such as 3D reconstruction [2] [3], image registration and fusion [4] [5] [6], image stitching [7] [8], etc. These tasks have high requirements on the feature matching methods in terms of robustness, accuracy, and efficiency.
特征匹配问题具有组合性质[9]。例如,将一个点集中的N个点匹配到另一个点集中的M个点,将产生种不同的匹配结果,这具有指数级的时间复杂度。为了解决这个问题,现在一种主流的方法是采用两阶段策略的间接特征匹配。在第一阶段,使用特征描述符方法(如SIFT[10]、SURF[11]、ORB[12]等)根据局部补丁描述符的相似性构造假定匹配集合。这种方法可以显著降低特征匹配问题的时间复杂度。但由于局部描述符的模糊性,假定匹配集合中通常存在大量的错误匹配。因此,在第二阶段,需要使用一个几何约束来区分假定匹配集合中的正确匹配(即内点)和错误匹配(即异常值)。
The feature matching problem exhibits a combinatorial property [9]. For example, matching N points from one set to M points in another set produces different matching results, resulting in exponential time complexity. To address this problem, a mainstream approach now is to use an indirect feature matching strategy with a two-stage process. In the first stage, a set of putative matches is constructed based on the similarity of local patch descriptors using feature descriptor methods such as SIFT [10], SURF [11], ORB [12], etc. These methods can significantly reduce the time complexity of feature matching problems. However, due to the ambiguity of local descriptors, there are typically a large number of false matches in the set of putative matches. Therefore, in the second stage, a geometric constraint is needed to distinguish between correct matches (i.e. inliers) and false matches (i.e. outliers) in the set of putative matches.
In the first stage, feature descriptor methods such as SIFT \cite{ref10}, SURF \cite{ref11}, ORB \cite{ref12}, are used to create a set of putative matches based on the similarity of local patch descriptors. These methods greatly reduce the time complexity of feature matching problems. However, the ambiguity of local descriptors leads to a significant number of false matches in the set of putative matches. Thus, a geometric constraint is necessary in the second stage to distinguish between correct matches (i.e. inliers) and false matches (i.e. outliers) in the set of putative matches.
第二阶段,也被称为错误匹配去除,是现在间接特征匹配方法面临的核心问题之一。为了去除错误匹配,许多方法被提出。接下来,我们将错误去除匹配方法分为以下五种并进行简要的回顾。
The second stage, also known as mismatch removal, is one of the key challenges faced by current indirect feature matching methods. To eliminate false matches, numerous methods have been proposed. In the following, we categorize mismatch removal methods into the following five types and provide a brief review.
1.1 基于重采样的方法
RANSAC(随机采样一致)[14=13]是经典的重采样方法,是一种用于估计模型的方法。其主要思想是通过随机选择数据中的一小部分进行模型参数的估计,然后用估计出的模型对剩余数据进行测试,并将符合模型的数据与不符合模型的数据进行区分。在错误匹配方法中通常是用来估计单应性矩阵。现在已经提出许多改进的RNASAC方法,如MLESAC[19=14]、PROSAC[20=15]、MAGSAC++[21=16]、GC-RANSAC[22=17]等。其中PROSAC的基本思路是根据一定的概率分布,逐步增加采样点的数量,以筛选出更加准确的估计参数。MAGSAC++提出了一种-consensus的方法,利用边缘化(Marginalization)的概念,来避免RANSAC中的阈值需要人为设定。GC-RANSAC通过图切割方法选择内点。这些基于重采样的方法可以被理解为对图像间的变换进行了全局建模,然而当图像间的变换过于复杂时,全局建模不能很好表示这些变换。
Random Sample Consensus (RANSAC) [14] is a classic resampling method used to estimate a model. The main idea of RANSAC is to randomly select a small subset of data to estimate model parameters, test the remaining data with the estimated model, and distinguish between data that fits the model and data that does not fit the model. RANSAC is typically used in mismatch removal methods to estimate homography matrices. Many improved RANSAC methods have been proposed, such as MLESAC [19], PROSAC [20], MAGSAC++ [21], GC-RANSAC [22], and others. The basic idea of PROSAC is to gradually increase the number of sampled points according to a certain probability distribution to select more accurate estimated parameters. MAGSAC++ proposes a -consensus method that uses the concept of marginalization to avoid the need for manually setting the threshold in RANSAC. GC-RANSAC selects inliers using graph-cutting methods. These resampling-based methods can be understood as globally modeling the transformations between images. However, when the transformations between images are too complex, global modeling may not represent these transformations well.
1.2 基于非参数模型的方法
为了处理图像间更复杂的变换,基于非参数模型的方法被提出。代表性的方法有识别对应函数(ICF)[15=18]、相干点漂移(CPD)[16=19]和向量场共识(VFC)[23=20]。其中VFC认为正确匹配的运动向量具有运动一致性,由正确匹配的运动向量组成的向量场是平滑的。VFC以此来定义能量函数,将向量场恢复成一致,并选取与向量场一致的向量作为内点。但是这些方法定义的模型是全局的,对于局部变化,这些方法可能并不适用。并且由于没有显式地建立模型,因此通常需要更多的计算资源和运算时间。
To handle more complex transformations between images, methods based on non-parametric models have been proposed. Representative methods include identifying correspondence function (ICF) [15], coherent point drift (CPD) [16], and vector field consensus (VFC) [23]. Among them, VFC assumes that motion vectors of correct matches have motion consistency, and the vector field composed of motion vectors of correct matches is smooth. VFC defines an energy function based on this and restores the vector field to consistency, selecting the vectors consistent with the vector field as inliers. However, the models defined by these methods are global and may not be suitable for local transformations. Moreover, since no explicit model is established, these methods typically require more computational resources and time.
1.3 基于图匹配的方法
以往大多数基于图匹配的方法都是属于直接匹配,而不是通过设计相似约束来去除错误匹配[24=21] [25=22] [26=23] 。图匹配方法被表述为二次分配问题(QAP)[27=24],通常具有较高的计算复杂度,且泛用性差。最近出现了用于去除错误匹配的图匹配方法,如LGSC[28=25]和MCDM[29=26]。LGSC根据局部几何信息的一致性,提出了使用局部图结构一致性来去除错误匹配,但它的运行时间不占优势。MCDM引入了局部运动一致性先验,提出了使用概率图模型来去除错误匹配并展现了不错的匹配效果。使用图的方法去除错误匹配是一种比较新颖且可行的思路。
Most of the previous graph matching methods belonged to direct matching, rather than removing false matches by designing similarity constraints [24] [25] [26]. Graph matching methods are formulated as quadratic assignment problems (QAP) [27], which are usually computationally complex and not widely applicable. Recently, graph matching methods have emerged to remove false matches, such as LGSC [28] and MCDM [29]. LGSC proposes using local graph structure consistency to remove false matches based on the consistency of local geometric information, but it does not have an advantage in runtime. MCDM introduces a prior of local motion consistency and proposes using a probabilistic graph model to remove false matches, demonstrating good matching results. Using graph-based methods to remove false matches is a relatively novel and feasible approach.
1.4 基于学习的方法
随着深度学习的兴起,越来越多基于学习的去除错误匹配方法被提出。较早出现的LMR[30=27]和LFGC[31=28]。LMR是一种二分类器,因为其匹配表达式的限制,对于训练集之外的数据集的匹配性能可能并不理想。LFGC是基于深度网络进行错误匹配去除的代表方法,它可以通过深度网络估计两图像间的变换参数并还原相对姿态。但是因为LFGC需要相机参数和特定的参数模型,它在实际应用中受到限制。最近,一些新颖的方法被提出,比如SuperGlue[32=29]、GANet[33=30]和CSRNet[34=31]。SuperGlue提出了使用图神经网络从局部特征生成可靠匹配。GANet在SuperGlue基础上提出了一种多头图注意机制和一种稀疏注意映射(a sparse attention map),有效的使模型轻量化并提高性能。但是GANet受限于特点的参数模型,使得它的泛用性还有待提高。CSRNet提供了另一种思路。它引入了上下文感知的注意(Context-Aware Attention (CAA))机制并且提出了一种置换不变结构表示(STR)学习模块。但是CSRNet忽略了来自源图像的信息并且由于自身的时间复杂度而不能满足一些实时性要求较高的任务。此外还有直接用于图匹配的学习方法,如GLMNet[35=32]、GCAN[36=33]。基于深度学习的方法在错误匹配去除和运动模型估计等研究上表现出了巨大的潜力。
With the rise of deep learning, more and more learning-based methods have been proposed for removing false matches. Earlier methods include LMR [30] and LFGC [31]. LMR is a binary classifier that may not perform well on datasets outside the training set due to the limitations of its matching expression. LFGC is a representative method for removing false matches based on deep networks, which estimates the transformation parameters between two images using a deep network and restores the relative pose. However, LFGC is limited in practical applications due to its need for camera parameters and specific parameter models. Recently, some novel methods have been proposed, such as SuperGlue [32], GANet [33], and CSRNet [34]. SuperGlue proposes to generate reliable matches from local features using graph neural networks. GANet builds on SuperGlue with a multi-head graph attention mechanism and a sparse attention map, effectively making the model lightweight and improving its performance. However, GANet is limited by its specific parameter model, and its generality needs to be improved. CSRNet introduces a context-aware attention mechanism and proposes a permutation-invariant structure tepresentation (STR) learning module. However, CSRNet ignores information from the source image and cannot meet the real-time requirements of some high-speed tasks due to its own time complexity. Additionally, there are learning-based methods that are directly applied to graph matching, such as GLMNet[35] and GCAN[36]. The use of deep learning-based approaches has shown great potential in research areas such as mismatch removal and motion model estimation.
1.5 基于局部几何约束的方法
基于局部几何约束的错误匹配去除是最近比较流行的一类方法。与全局模型不同,基于局部几何约束的方法可以更好的处理场景发生局部变化的情况,并且在不改变自身模型的情况下可以在不同变化类型的特征匹配任务中保持性能。基于局部几何约束的方法在处理各种类型的图像变换时通常可以达到较高的进度或者较快的速度。
Mismatch removal based on local geometric constraints is a currently one of the most popular methods. Unlike global models, local geometric constraint-based methods can better handle situations where the scene undergoes local changes, and can maintain performance in feature matching tasks with different types of transformations without changing their own model. Local geometric constraint-based methods can typically achieve high accuracy or fast speed when processing with various types of image transformations.
经典且具有代表性的方法是LPM[18=34]和GMS[37=35]。LPM基于运动一致性的思想提出了使用余弦相似度来判断特征与其邻域点是否运动一致。GMS将图像划分成网格,将运动平滑性约束转换为统计度量以拒绝错误匹配。这两种方法简单且鲁棒,但是它们的约束能力还有待提高。于是后续提出了一系列试图提高约束能力的方法。局部结构一致性约束(LSCC)[38=36]引入皮尔逊相关系数来衡量特征点邻域结构的一致性。多尺度局部性和秩保持(mTopKRP)[39=37]定义了基于多尺度邻域的排名列表距离测量,来更严格和普遍地保留局部拓扑结构。多邻域引导肯德尔秩相关系数(mGKRCC)[40=38]提出特征点的邻域点具有秩序一致性并使用肯德尔相关系数来度量邻域点的秩序的误差。邻域流形保持一致(NMRC)[41=39]提出了邻域构建的迭代滤波以获得更可靠的邻域点,并使用流形学习来保留具有邻域拓扑一致的内点。这四种方法是通过一些数学手段(如相关性,最小化重构误差等)来寻找特征点与邻域点潜在的关系。除了这类方法外,还有另外一类方法,它们预定义一个表示特征点和邻域点关系的几何模型,以此来寻找内点基于框架的概率局部验证(IPLV)[42=40]使用通过仿射协变检测器(如MSER)检测得到的局部仿射框架计算两特征点邻域的重投影误差,并提出一种概率模型结合重投影误差计算特征点是内点的概率。局部仿射保留(LAP)[43=41]根据内点具有运动一致性的假设去除邻域点中一部分异常值,并定义了一个由中心特征点及其三个邻居组成的最小拓扑单元(MTU),通过MTU的比值衡量邻域拓扑一致性。其中IPLV和LAP都提出了具有仿射不变性的几何约束,考虑到仿射不变性是射影不变性的子集,于是我们使用具有射影不变性的单应性矩阵来设计局部几何约束。这样的局部几何约束具有更严格的约束能力。
Classical and representative methods include LPM[18] and GMS[37]. LPM proposes using cosine similarity to judge whether a feature and its neighboring points have consistent motion based on the idea of motion consistency. GMS divides the image into grids and transforms the smoothness constraint of motion into statistical measurement to reject false matches. These two methods are simple and robust, but their constraint abilities still need to be improved. Subsequently, a series of methods have been proposed to enhance their constraint abilities. Local structure consistency constraint (LSCC)[38] introduces Pearson correlation coefficient to measure the consistency of the structure of feature points’ neighborhoods. Multiscale locality and rank preservation (mTopKRP)[39] defines rank list distance measurement based on multi-scale neighborhoods to more strictly and generally preserve local topological structure. Multi-neighborhood guided Kendall rank correlation coefficient (mGKRCC)[40] proposes that the neighborhood points of feature points have rank consistency and uses the Kendall correlation coefficient to measure the error of the rank order of neighborhood points. Neighborhood manifold representation consensus (NMRC)[41] proposes iterative filtering of neighborhood construction to obtain more reliable neighborhood points and uses manifold learning to preserve inliers with consistent neighborhood topology. These four methods seek the potential relationships between feature points and their neighboring points through mathematical means such as correlation, minimizing reconstruction errors, etc.
Apart from these methods, there are another type of methods that predefine a geometric model representing the relationship between feature points and their neighboring points to find inliers. Affine covariant detectors (such as MSER) are used to calculate the reprojection error of two feature point neighborhoods in the frame-based probabilistic local verification (IPLV)[42], which proposes a probabilistic model that combines reprojection error to calculate the probability of feature points being inliers. Local affine preservation (LAP)[43] removes some outliers in the neighborhood points based on the hypothesis that inliers have motion consistency and defines the minimum topological unit (MTU) consisting of the center feature point and its three neighbors. The consistency of neighborhood topology is measured by the ratio of MTUs. IPLV and LAP propose geometric constraints with affine invariance. Considering that affine invariance is a subset of projective invariance, we use homography matrices with projective invariance to design local geometric constraints. Such local geometric constraints have stronger constraint abilities.
对于遥感图像来说。一方面,由于地势变化、成像视点变化等原因,图像会发生局部失真导致空间关系变得复杂[17=42],全局几何变换模型难以很好地表示图像间的变换。另一方面,如果使用复杂的非刚性变换模型来表示图像间的变换,会导致方法的计算复杂度增加。因此,需要设计出一种具有低时间复杂度并且可以处理复杂的几何变换的错误匹配去除方法。
For remote sensing images, on the one hand, due to local distortions caused by changes in terrain and imaging viewpoints, spatial relationships become complex [17], and global geometric transformation models cannot represent image transformations well. On the other hand, using complex non-rigid transformation models to represent image transformations would increase the computational complexity of the method. Therefore, it is necessary to design a mismatch removal method that has low time complexity and can handle complex geometric transformations.
在本文中,我们提出了一种名为LMC(Local Motion Consistency)的遥感图像特征匹配方法。根据局部运动一致性[18=34],正确的匹配与其相邻的内点具有相同的运动,而错误的匹配则不是。LMC的核心思想是寻找到具有正确运动趋势的邻域,并将邻域作为局部,保留与局部具有相同运动的匹配。我们在多个公开数据集上进行了实验。实验结果表明LMC具有线性时间复杂度,并且可以处理具有复杂的几何变换的图像。
In this paper, we propose a remote sensing image feature matching method named LMC (Local Motion Consistency). Based on local motion consistency [18], correct matches have the same motion as their neighboring inliers, while false matches do not. The core idea of LMC is to find neighborhoods with correct motion trends and treat them as local regions, retaining matches that have the same motion as the local region. We conducted experiments on multiple public datasets, and the results show that LMC has linear time complexity and can handle images with complex geometric transformations.
我们的贡献可以总结为以下三点:
- 我们提出了一种基于单应性矩阵的局部几何约束用于遥感图像特征匹配。与其他基于局部几何约束的方法相比,LMC具有更严格的约束,旨在利用单应性矩阵 的特性来表示局部运动一致性,以此来保留正确的匹配。该约束具有射影不变性,适用于多种刚性或非刚性变形的图像。
- 我们设计了一种跳出机制,可以在不搜索完所有可能的情况下跳出循环,以此来降低运行时间。LMC可以在100毫秒内处理超过1000个假定匹配。
- 为了避免异常值影响到对于具有正确运动的邻域的搜索,我们引入了重采样方法来构建邻域。
此外,我们提出的方法可以为后续的图像配准等工作提供关于表示局部几何变换的单应性矩阵的先验知识。
Our contributions can be summarized as follows:
- We propose a local geometric constraint based on the homography matrix for feature matching in remote sensing images. Compared to other methods based on local geometric constraints, LMC has more strict constraints that aim to utilize the properties of the homography matrix to represent local motion consistency, thereby retaining correct matches. This constraint is projectively invariant and applicable to images with various rigid or non-rigid deformations.
- We design a jump-out mechanism that can exit the loop without searching through all possible cases, thereby reducing the runtime. LMC can process more than 1000 putative matches within 100 milliseconds.
- To avoid outliers affecting the search for neighborhoods with correct motion, we introduce a resampling method to construct neighborhoods.
In addition, our proposed method can provide prior knowledge about the homography matrix representing local geometric transformations for subsequent tasks such as image registration.
本文的其余部分组织如下:在第二节中,我们详细介绍了所提出的LMC方法,其中包括使用RANSAC构造可靠邻域集合、基于单应性矩阵的重投影误差计算和一种跳出机制。在第三节中,我们在不同类型的数据集中将我们的方法和几种先进的方法进行了比较,给出了定性和定量的实验评估,对方法进行了稳健性分析。此外,我们讨论了不同邻域构建方法在不同数据集中对方法性能造成的影响。在第四节中,我们对于实验结果进行了简要的讨论。最后,在第五节给出了一个简要的结论。
The rest of this paper is organized as follows: In Section 2, we provide a detailed description of the proposed LMC method==, including the use of RANSAC to construct reliable neighborhood sets, the calculation of reprojection errors based on homography matrices, and a jump-out mechanism.== In Section 3, we compare our method with several state-of-the-art methods on different types of datasets and present qualitative and quantitative experimental evaluations, as well as a robustness analysis. Furthermore, we discuss the impact of different neighborhood construction methods on method performance in different datasets. In Section 4, we provide a brief discussion. Finally, in Section 5, we present a brief conclusion.
3. 方法
在本节中,我们提出了一种基于单应性矩阵表示局部运动一致性的几何约束,并使用这种几何约束进行错误匹配去除。我们将图像根据特征点的邻域划分成许多局部区域,使用单应性矩阵来表示局部的几何变换,并计算特征点的重投影误差,以此来区分正确匹配和错误匹配。此外,我们使用了RANSAC来构建邻域以提高约束的可靠度,并提出了一种跳出机制来降低运行时间。我们提出的方法的流程图如图1所示。
In this section, we propose a geometric constraint based on homography matrix to represent local motion consistency, and employ this constraint to remove false matches. We partition the images into many local regions based on the neighborhood of feature points, represent local geometric transformation using homography matrix, and calculate the reprojection error of feature points to distinguish correct matches from incorrect ones. Additionally, we use RANSAC to build neighborhoods to improve the reliability of constraints and propose a jump-out mechanism to reduce computation time.
3.1 问题描述
在给定一对遥感图像和后,我们通过SIFT获得一组假定匹配特征点对集合,其中和分别表示图像和的特征点的二维坐标向量,表示共有对特征点。
After obtaining a pair of remote sensing images and , we use SIFT to obtain a set of putative matching feature point pairs , where and represent the 2D coordinate vectors of feature points in images and , respectively, and represents the total number of feature point pairs.
由于根据特征描述符得出的假定匹配集合中总是不可避免的存在许多错误匹配,所以我们的目标是在给定的匹配集合中去除错误匹配(异常值),尽可能多的保留正确匹配(内点),得到最优的内点集。
Due to the inevitable existence of many false matches in the set of putative matches based on feature descriptors, our goal is to remove the false matches (outliers) and retain as many correct matches (inliers) as possible in the given putative matches set , to obtain the optimal inlier set .
我们从多个错误匹配去除方法([18],[23],[28],[29],[41],[43])中总结出以下两点共识。
We have summarized two consensus points from multiple methods ([18],[23],[28],[29],[41],[43]) for removing false matches.
共识一:运动一致性。正确匹配的向量(图1(b)中蓝色向量)相邻间具有相同的运动且总体具有相似的运动趋势,但是错误匹配的向量(图1(d)中红色向量)更倾向于在全图中进行随机运动。我们将正确匹配的向量相邻间具有相同的运动称为局部运动一致性。
Consensus 1: Motion consistency. The vectors of correct matches (blue vectors in Fig. 1(b)) have similar motions overall, and adjacent vectors have consistent motions, while the vectors of false matches (red vectors in Fig. 1(d)) tend to exhibit random motion across the entire image. We refer to the consistent motion between adjacent vectors of correct matches as local motion consistency.
共识二:邻域拓扑结构稳定性。真实物体受制于其物理模型,使得特征点的邻域拓扑结构经过几何变换后会略有变化,但通常是稳定的。特征点的邻域拓扑结构的稳定性通常表现在当特征点和构成邻域的点都是内点的时候,如图1(a)中黄色的拓扑结构 。而当邻域点中存在异常值时,邻域拓扑结构通常不具有稳定性,如图1©中红色的拓扑结构。
Consensus 2: Neighborhood topology stability. Real objects are subject to their physical models, which makes the neighborhood topology of feature points slightly change after geometric transformation, but it is usually stable. The stability of the neighborhood topology of feature points is usually manifested when both the feature point and the points that make up the neighborhood are inliers, as shown in the yellow topology in Fig.1(a). However, when there are outliers among the neighborhood points, the neighborhood topology usually lacks stability, as shown in the red topology in Fig.1©.
图1 运动一致性以及邻域拓扑结构稳定性示意图
(a)和©为图像对和,蓝线连接图像对中正确匹配的特征点和。(a)中黄线表示当邻域点都是内点时,特征点与邻域点的连线。©中红线表示当邻域点中存在异常值时,特征点与邻域点的连线。(b)和(d)为特征点的运动向量,由指向。(b)为正确匹配向量的运动场。 ©为错误匹配向量的运动场。
(a) and © are image pairs and , where the blue lines connect the correctly matched feature points and . The yellow lines in (a) represent the connections between feature points and neighboring points when all the neighboring points are inliers. The red lines in © represent the connections between feature points and neighboring points when there are outliers among the neighboring points. (b) and (d) are motion vectors of matches, pointing from to . (b) is the motion field of correct match vectors, while © is the motion field of false match vectors.
因此,我们可以设计一种局部几何约束利用邻域来衡量匹配的局部运动一致性,以此来判断匹配是否正确。为此,我们将去除错误匹配问题形式化为下式:
Therefore, we can design a local geometric constraint that uses the neighborhood to measure the local motion consistency of the matches, to determine whether a match is correct. To this end, we formalize the problem of mismatch removal as follows:
其中成本函数定义为:
The cost function is defined as follows:
其中表示内点的集合,表示的基数;表示匹配和其邻域的运动一致性的程度,的值越大,说明越不一致,是错误匹配的概率越大;成本函数分为两项,其中第一项惩罚任何大于0的对应关系,第二项用于保留更多的内点,所以要使成本函数达到最优解,就要在内点数尽可能多的情况下,使尽可能的小,参数用于平衡这两项。接下来,我们将介绍如何构建可靠邻域点集和设计局部几何约束来定义。
where represents the set of inliers, represents the cardinality of , and represents the degree of motion consistency between the match and its neighborhood. A larger value of indicates greater inconsistency, and hence a higher probability that is a false match. The cost function consists of two terms: the first penalizes any matches with , while the second is used to retain more inliers. Thus, to obtain the optimal solution of the cost function , we aim to maximize the number of inliers while minimizing as much as possible, with the parameter used to balance these two terms. Next, we will introduce how to construct a reliable neighborhood set and design local geometric constraints to define .
3.2 基于RANSAC构建可靠邻域点集
如果是在假定匹配集合中寻找邻域点,那么邻域点会不可避免的受到异常值的污染。考虑到上述两点共识都是基于特征点和邻域点都是内点的情况,为了保证局部几何约束的结果的可靠程度,我们需要尽可能的去除邻域中的异常值。为此,我们使用RANSAC来构建可靠邻域点集。
If we search for neighborhood points within the set of putative matches S, the neighborhood points will inevitably be contaminated by outliers. Given that the two aforementioned consensuses are based on the case where both feature points and neighborhood points are inliers, in order to ensure the reliability of the results of the local geometric constraints, we need to remove outliers from the neighborhood as much as possible. To this end, we use RANSAC to construct a reliable set of neighborhood points.
RANSAC(随机采样一致)是基于重采样的方法来估计模型参数的方法。在本文中,RANSAC被用于估计单应性矩阵。单应性矩阵可以将一个平面上的点映射到另一个平面上的对应点,用于描述平面到平面的映射关系,因此它可以表示两平面间的几何变换,包括旋转、平移、缩放和投影等。假设和分别为两平面中的点,则它们的映射关系可以表示为:
RANSAC (Random Sample Consensus) is a method based on resampling to estimate model parameters. In this paper, RANSAC is used to estimate the homography matrix. The homography matrix can map a point on one plane to a corresponding point on another plane, and is used to describe the mapping relationship between planes, including rotation, translation, scaling, and projection. Assuming that and are points in the two planes, their mapping relationship can be expressed as:
其中表示单应性矩阵,可以通过SVD或高斯消元法对其进行求解,因为其具有8个参数,所以至少需要四对点才能求解。
Where represents the homography matrix, which can be solved using SVD or Gaussian elimination. Since it has eight parameters, at least four pairs of points are required to compute it.
使用RANSAC估计单应性矩阵的步骤是:首先从匹配点集中随机选择一组点,然后计算这些点之间的单应性矩阵,并使用该矩阵对其他的匹配点进行验证。验证的方法是计算其他匹配点与该单应性矩阵下的投影点之间的重投影误差。如果误差小于阈值,则将该匹配视为内点。然后重新采样重复上述步骤,直到满足停止迭代条件。最终选择能够验证出最多内点的单应性矩阵作为最优解,并将验证出的内点作为可靠邻域点集。作为全局几何变换模型,RANSAC将邻域点集的整体运动趋势约束到与表示的平面运动趋势一致,这使得不包含一些随机分布的异常值。阈值控制约束的强度,越小,约束的强度越大。
The steps to estimate the homography matrix using RANSAC are as follows: First, a random sample of corresponding point pairs is selected from the set of matches S. Using these point pairs, a homography matrix is computed. The homography matrix is then used to project the remaining point pairs, and the reprojection error between the projected points and their actual locations is computed. If the error is less than a pre-defined threshold , the corresponding point pair is considered an inlier. The process is repeated by resampling until the stopping criteria are met. The homography matrix that can validate the most inliers is selected as the optimal solution, and the validated inliers are regarded as the reliable neighborhood point set . As a global geometric transformation model, RANSAC constrains the overall motion trend of the neighborhood point set to be consistent with the planar motion trend represented by , which ensures that does not contain some randomly distributed outliers. The threshold controls the strength of the constraint, with smaller leading to stronger constraints.
3.3 基于单应性矩阵的重投影误差计算
因为受到地势变换、成像视点变换和局部非刚性几何失真等因素的影响,遥感图像不能被视为一个平面[41],所以遥感图像中的复杂非刚性变换通常无法使用全局几何变换模型表示。但是全局的复杂的几何变换可以由许多局部的简单的几何变换近似的表现出来,这种思想经常在图像拼接、图像匹配方法中被使用。考虑到单应性矩阵是一个可以表示简单几何变换(如仿射、投影)的几何模型,我们尝试使用单应性矩阵来表示局部的几何变换。因此,在本节中我们引入单应性矩阵计算重投影误差来衡量特征点与最小单应性单元(MHU)的局部运动一致性。
The complex non-rigid transformations in remote sensing images cannot be represented by a global geometric transformation model due to factors such as terrain changes, imaging viewpoint changes, and local non-rigid geometric distortion [43]. However, global complex geometric transformations can be approximated by many local simple geometric transformations, which is a common idea in image stitching and matching methods. Considering that the homography matrix is a geometric model that can represent simple geometric transformations (such as affine and projection), we attempt to use the homography matrix to represent local geometric transformations. Therefore, in this section, we introduce the use of homography matrix to calculate the reprojection error to measure the local motion consistency between feature points and the minimum homography unit (MHU).
我们在可靠邻域点集中搜索中所有特征点在欧几里得距离下的个最近的邻域点,并构建邻域,同理,构建特征点的邻域。因为邻域点是以欧几里得距离搜索得到的,所以邻域和中的点在假定集中可能不是一一对应的。我们需要将邻域和中相互匹配的点提取出来,生成邻域匹配点对集合,并使用中的点对进行单应性矩阵的计算。的表达式为:
We search for the nearest neighboring points of each feature point in the reliable neighborhood point set under the Euclidean distance, and construct the neighborhood . Similarly, we construct the neighborhood for feature point . Since the neighboring points are obtained by searching under the Euclidean distance, the points in and may not correspond one-to-one in the putative set . We need to extract the matched points from and , generate a set of matched neighborhood point pairs , and use the points in to compute the homography matrix. The expression for is:
单应性矩阵是一个的矩阵,具有8个自由度,因此最少需要4对特征点可以对其进行求解。我们需要在中选取4对不同的特征点构建MHU来对单应性矩阵求解,于是有:
The homography matrix is a matrix with 8 degrees of freedom, thus at least 4 pairs of feature points are required for its estimation. We need to select 4 distinct pairs of feature points from to construct the minimum homography unit (MHU) for homography matrix computation, as follows:
其中表示的基数,即。表示中用于计算单应性矩阵的邻域点排列组合个数。中的所有邻域点都可以找到其在假定集中的索引,我们将这些索引存放在索引矩阵中,则表示第个排列中第个邻域点在中的索引。
Where represents the cardinality of , i.e., . represents the number of possible combinations of neighborhood points in used to calculate the homography matrix. All neighborhood points in can be found by their indices in the putative set , and we store these indices in the index matrix . Thus, denotes the index of the -th neighborhood point in the -th combination in .
图2 MHU结构以及重投影误差计算示意图
图2中表示第个MHU。、、、表示第个排列中的邻域点,该四点围成的的四边形表示由该四点构成的的邻域,将该四点与用虚线连接构成一个邻域拓扑结构,未标注的点是的其他邻域点。、、、表示的邻域点在邻域中对应的匹配点,是的假定匹配点,是根据邻域的几何变换投影到图像中的点,表示重投影误差。
In Figure 2, represents the -th MHU. , , , and denote the neighborhood points of in the -th combination, and the quadrilateral formed by these four points represents the neighborhood of . A dashed line connects these four points and to create a neighborhood topology. Unlabeled points represent other neighborhood points of . , , , and denote the matching points of 's neighborhood points in 's neighborhood, where is the putative matching point of , is the point where is projected onto the image according to the geometric transformation of the neighborhood, and represents the reprojection error.
首先需要计算表示邻域的几何变换的单应性矩阵,于是有:
First, it is necessary to compute the homography matrix that represents the geometric transformation of the neighborhood. Thus, we have:
其中表示根据匹配的第个MHU计算出的单应性矩阵,该矩阵可以通过高斯消元法求解。和分别表示和的第个MHU中第个邻域点的三维齐次坐标。根据可以计算出投影到图像的点:
where represents the homography matrix calculated based on the -th MHU of match , which can be solved by Gaussian elimination. and respectively represent the homogeneous 3D coordinates of the -th neighborhood point in the -th MHU of and . Based on , the projection of onto image can be calculated as follows:
其中表示特征点通过单应性矩阵计算得到的在图像中的映射。表示了MHU的运动趋势,根据局部运动一致性,如果是正确匹配,那么的运动趋势应该与MHU的运动趋势一致,即和的距离应该很小。
where represents the mapping of feature point to the image obtained by using the homography matrix . represents the motion trend of the MHU. According to local motion consistency, if is a correct match, the motion trend of should be consistent with that of the MHU, i.e., the distance between and should be small.
因此使用和的欧几里得距离来衡量与第个MHU的运动一致性:
Therefore, the Euclidean distance between and is used to measure the motion consistency between and the -th MHU:
其中表示和根据计算出的重投影误差,的值越小,说明与第个MHU的运动一致性越高,是正确匹配的概率越高。
Where represents the reprojection error calculated by and using . A smaller value of indicates a higher level of motion consistency between and the -th MHU, indicating a higher probability of being the correct match.
需要注意的是,理论上,局部运动一致性只表现在特征点及其邻域都是内点的情况下,所以如果有一个的值很小,我们就可以判断很可能是正确匹配。但是MHU会受到邻域中的异常值的污染,所以如果有一个的值很大,我们不能直接判断的正确性。为了避免受污染的MHU影响到正确的MHU的计算结果,我们应该避免将多个MHU的计算结果整合在一起。因此,我们选择只使用最小来表示最可能是正确匹配的概率。
It should be noted that, in theory, local motion consistency is only valid when the feature point and its neighborhood are all inliers. Therefore, if one value is very small, we can infer that is likely to be a correct match. However, the MHU may be contaminated by outliers in its neighborhood, so we cannot directly judge the correctness of if one value is very large. To avoid contaminated MHUs affecting the correct calculation of MHUs, we should avoid integrating the results of multiple MHUs. Therefore, we choose to use only the minimum to represent the probability that is the correct match.
成本函数(1)中的局部几何约束可以表示为:
The local geometric constraints in the cost function (1) can be expressed as:
3.4 跳出机制
因为局部几何约束函数每次都需要计算出最小的来作为它的值,这样耗时会很严重,所以在本节中我们设计了一种跳出机制来缩短运行时间并尽量不影响方法精度。
Since the local geometric constraint function needs to calculate the minimum value every time to serve as its output, this can result in significant computation time. Therefore, in this section, we designed a jump-out mechanism to shorten the processing time while minimizing the impact on method accuracy.
定义一个二元变量来表示匹配关系,表示是内点,表示是异常值。将二元变量代入成本函数中,则成本函数(1)变为:
We define a binary variable to represent the matching relationship, where indicates that is an inlier, and indicates that is an outlier. By substituting the binary variable into the cost function (2) becomes:
然后通过合并与相关项来改写上式:
We can rewrite the equation by combining terms that are related to as follows:
式中第二项是常数。为了最小化成本函数,第一项中应该尽可能多的保留负项并去除正项。于是被定义为:
The second term in the equation is a constant. To minimize the cost function, we should retain as many negative terms as possible and eliminate positive terms in the first term. Therefore, we define as:
根据局部运动一致性,只有使用全是内点的MHU对匹配进行判断,其判断结果才是可靠的。我们将这种MHU称为可靠的MHU。可靠的MHU对于一个匹配来说可能存在的个数是0到个,这取决于邻域的内点率。当邻域的内点数超过4个时,可靠的MHU的个数会超过1个并且会随机分布在个MHU中。也就是说在这种情况下如果匹配是正确的,那么匹配的重投影误差足够小的个数可能有多个。而我们只需要使用其中一个重投影误差就可以对匹配的正确性进行判断。因此,我们设定了一个阈值,当存在使得时,第个MHU是可靠的且匹配与该MHU具有局部运动一致性,即是正确的匹配。于是被重新定义为:
Based on local motion consistency, only an MHU that consists entirely of inliers can be used to assess the match, and its assessment result will be reliable. We refer to this kind of MHU as a reliable MHU. The number of reliable MHUs for a match may range from 0 to , depending on the inlier ratio in the neighborhood. When there are more than 4 inliers in the neighborhood, the number of reliable MHUs will exceed 1 and will be randomly distributed among the MHUs. In other words, if a match is correct in this case, there may be multiple reliable MHUs with sufficiently small reprojection errors. Therefore, we set a threshold , and if there exists an such that , then the th MHU is reliable, and the match is locally motion consistent with this MHU, indicating that it is a correct match. As a result, we redefine as:
因此最优内点集可以表示为:
Therefore, the optimal inlier set can be represented as:
只要存在满足式(13)的条件,方法就可以停止对的计算并跳出循环进行下一个的计算。可以看出,理论上,只有当匹配是正确的时候才可能跳出循环。所以随着假定匹配集合的内点率下降,该机制可以缩短的时间会减小。当的取值与相同时,该机制可以在几乎不影响方法精度的情况下显著的减低方法运行时间。该机制的可行性将在消融实验部分进行证明。
If there exists an that satisfies the condition of equation (13), the method can stop calculating and early jump out of the current loop to calculate the next . It can be seen that theoretically, the loop can only be prematurely exited when the match is correct. Therefore, as the inlier rate of the putative matching set decreases, the ability of this mechanism to shorten the running time will diminish. When the value of is the same as , the mechanism can significantly reduce the method’s running time with almost no impact on its accuracy. The feasibility of this mechanism will be demonstrated in the ablation study section.
3.5 时间复杂度
1 |
|
因为本文提出的方法主要是定义了一个基于单应性矩阵表示局部运动一致性的几何约束,所以我们将提出的方法简写为LMC,并将方法总结在alg1中。给定具有个匹配点对的假定匹配集,使用RANSAC构建可靠邻域点集的时间复杂度为,为RANSAC的迭代次数。使用K-D树搜索中每个特征点的近邻的时间复杂度接近。计算邻域单应性重投影误差的时间复杂度最坏为,由式(5)计算得到。提出LMC总的时间复杂度最坏为。在严格控制迭代次数和邻域点数大小的情况下,因为、和时常数且小于,所以总的时间复杂度可以简写为。提出的方法具有线性时间的复杂度,因此适用于处理现实世界的任务。
需要注意的是,使用描述符方法(如SIFT)得到的假定匹配集中可能存在一对多或多对一的情况(即 but , or but )。因此,我们在计算重投影误差后判断邻域点是否重复,如果重复则舍弃这个计算结果。
As the method proposed in this paper mainly defines a local geometric constraint based on homography matrix to represent local motion consistency, we have abbreviated the proposed method as LMC and summarize it in Alg1. Given a putative matches set with point pairs, the time complexity for building a reliable neighborhood set using RANSAC is , where is the number of iterations for RANSAC. Using a K-D tree to search for the nearest neighbors of each feature point in has a time complexity close to . The time complexity for computing the reprojection error of neighborhood homography is at most , where is calculated from equation (5). The total time complexity for the proposed LMC is at most . When the iteration count and neighborhood size are strictly controlled, the total time complexity can be simplified to , as , , and are constants and are smaller than . The proposed method has linear time complexity and is therefore suitable for handling real-world tasks.
It should be noted that there may be one-to-many or many-to-one situations (i.e., but , or but ) in the putative matches set obtained using descriptor-based methods (such as SIFT). Therefore, after computing the reprojection error, we check whether the neighboring points are duplicates. If duplicates are found, the calculation result is discarded.
3. 实验结果
在本节中,我们将提出的方法LMC与现有的几种先进的方法进行比较,包括LPM[18]、RANSAC[14]、mTopKRP[39]、NMRC[41]和LSCC[38]。评价性能的指标有Recall、Precision、F-score和Run time。性能指标定义如下:
In this section, we compare our proposed method LMC with several existing advanced methods, including LPM [18], RANSAC [14], mTopKRP [39], NMRC [41], and LSCC [38]. The performance will be evaluated based on the following metrics: Recall, Precision, F-score, and Runtime. The performance metrics are defined as follows:
其中true positive (TP)表示被判断为内点的内点,false positive(FP)表示被判断为内点的异常值,false negative (FN)表示被判断为异常值的内点,true negative (TN)表示被判断为异常值的异常值。Recall表示方法判断为内点中实际是内点的数量在整个样本内点中的比率,Precision表示方法判断为内点中真实内点的比率,F-score是Recall和Precision的调和均值。通过观察F-score可以综合评价方法的匹配精度。
where true positive (TP) represents the inliers that are correctly identified as inliers, false positive (FP) represents the outliers that are incorrectly identified as inliers, false negative (FN) represents the inliers that are incorrectly identified as outliers, and true negative (TN) represents the outliers that are correctly identified as outliers. Recall represents the ratio of correctly identified inliers to the total number of inliers in the sample. Precision represents the ratio of correctly identified inliers to the total number of identified inliers. F-score is the harmonic mean of Recall and Precision. By observing the F-score, the matching accuracy of the method can be comprehensively evaluated.
在用于比较的方法中,LPM通过其Python源代码实现(https://github.com/jiayi-ma/LPM_Python)。RANSAC通过Opencv4.6.0中的findHomography()函数实现。在Opencv4.6.0 findHomography()函数中,RANSAC会根据当前数据的分布情况动态调整最小子集的大小以及自适应的调整阈值,并且在计算误差时使用了一种称为Sampson距离的方法,这种方法能够处理一些不同于高斯分布的噪声。同时RANSAC还经过了GPU加速处理。实际上,通过findHomography()函数实现的RANSAC经过了许多的改进,与初始版本的RANSAC[14]有很大的不同,但是为了后续更好的称呼这种方法,我们仍然将其称为RANSAC。对于mTopKRP,本文将它的Matlab源代码(https://github.com/StaRainJ/mTopKRP)改写为Python源代码并实现。NMRC和LSCC的Python源代码根据论文自行复现,参数为默认值。所有实验均在具有2.90GHz Intel® Core™ i7-10700 CPU、16GB内存、Python 3.9、Opencv 4.6.0和PyCharm 2021.3.2 (Community Edition)的台式机上进行。
For the compared methods, LPM is implemented through its Python source code (https://github.com/jiayi-ma/LPM_Python). RANSAC is implemented through the findHomography() function in Opencv4.6.0. In the findHomography() function, RANSAC dynamically adjusts the size of the minimum subset and adapts the threshold according to the distribution of the current data. Additionally, RANSAC uses a method called Sampson distance to calculate errors, which can handle some noise distributions different from Gaussian. RANSAC is also GPU-accelerated. In fact, RANSAC implemented through the findHomography() function has undergone many improvements and differs significantly from the initial version of RANSAC[14]. However, for better naming consistency in subsequent discussions, we still refer to it as RANSAC. For mTopKRP, we modified its Matlab source code (https://github.com/StaRainJ/mTopKRP) into Python source code and implemented it. The Python source code for NMRC and LSCC were self-reproduced based on their respective papers. The default parameters were used in both methods for the experiments. All experiments were conducted on a desktop computer with an Intel® Core™ i7-10700 CPU with a clock speed of 2.90GHz, 16GB of RAM, Python 3.9, Opencv 4.6.0, and PyCharm 2021.3.2 (Community Edition).
3.1 数据集
为了评估方法的匹配性能,我们选择了五个数据集SUIRD(https://github.com/yyangynu/SUIRD)、HPatches[44] (https://github.com/hpatches/hpatches-dataset)、RS[1](https://github.com/StaRainJ/Image_matching_Datasets)、DTU[45]和Retina[46]:
To evaluate the matching performance of the proposed method, we selected five datasets: SUIRD (https://github.com/yyangynu/SUIRD), HPatches[44=43] (https://github.com/hpatches/hpatches-dataset), RS[1] (https://github.com/StaRainJ/Image_matching_Datasets), DTU[45=44], and Retina[46=45]:
-
SUIRD:该数据集包含60对由小型无人机拍摄的低空遥感图像。场景包含自然风景以及城市街道。图像对间的几何变换是由无人机拍摄时的视点变换引起的,存在低重合率、图像非刚性变形和极端视角变化等问题。数据集根据无人机的视点变换类型分为了水平旋转(称为Horizontal)、竖直旋转(称为Vertical)、缩放(称为Scaling)、混合视点变化(称为Mixture)和极端视点变换(称为Extreme)五组。为了更好的展示实验数据,我们将数据集中的水平旋转、竖直旋转和缩放整合为简单旋转和缩放(称为R&S)。
SUIRD: This dataset contains 60 pairs of low-altitude remote sensing images captured by small drones, covering natural landscapes and urban streets. The geometric transformations between image pairs are caused by viewpoint changes during drone shooting, resulting in low overlap rate, non-rigid image deformation, and extreme viewpoint changes. The dataset is divided into five groups according to the types of viewpoint changes of the drones: horizontal rotation (termed Horizontal), vertical rotation (termed Vertical), scaling (termed Scaling), mixed viewpoint changes (termed Mixture), and extreme viewpoint changes (termed Extreme). To better demonstrate the experimental data, we integrated horizontal rotation, vertical rotation, and scaling into simple rotation and scaling (termed R&S) in the dataset.
-
HPatches:该数据集的图像选自VGG、DTU、AMOS等多个数据集,用于评估匹配方法对常见图像的匹配性能。数据集包含116组图像序列,每组图像序列有6张图像。在图像序列中,是用第一张图像与剩下五张图像进行匹配。图像涉及旋转、缩放、极端视点变化、光线变化、图像压缩等。数据集根据图像变化类型分为了视点变化(称为HPatches-v)和光照变化(称为HPatches-i)两组。
HPatches: This dataset consists of images selected from multiple datasets such as VGG, DTU, and AMOS, and is used to evaluate the matching performance of matching methods on common images. The dataset contains 116 sets of image sequences, with 6 images in each set. In each image sequence, the first image is matched with the remaining five images. The images involve transformations in rotation, scaling, extreme viewpoint changes, lighting changes, and image compression. The dataset is divided into two groups based on the types of image transformations, viewpoint changes (termed HPatches-v) and lighting changes (termed HPatches-i).
-
RS:该数据集由四个不同变换类型的遥感图像数据集组成,分别为SAR、CIAP、UAV和PAN。SAR的图像对分别由合成孔径雷达和无人机捕获,共18对。CIAP的图像经过校正,但重叠区域很小,共57对。UAV的图像由无人机捕获,存在投影失真,共35对。PAN的图像是在不同时间捕获的,具有投影失真,共18对。
RS: This dataset consists of four different types of remote sensing image datasets, including SAR, CIAP, UAV, and PAN. The image pairs of SAR were captured by synthetic aperture radar and drones, totaling 18 pairs. The images of CIAP were corrected, but with a small overlap area, totaling 57 pairs. The images of UAV were captured by drones with projection distortion, totaling 35 pairs. The images of PAN were captured at different times, with projection distortion, totaling 18 pairs.
-
DTU:该数据集由131对存在大视角变换的图像对组成。由于存在大视角变换,图像对间存在复杂的非刚性变形。
DTU: The dataset consists of 131 pairs of images with large viewpoint changes. Due to the presence of large viewpoint changes, there are complex non-rigid deformations between image pairs.
-
Retinal:该数据集由70对多模态医学视网膜图像组成。图像对间存在非刚性变形。
Retinal: This dataset consists of 70 pairs of multimodal medical retinal images, which exhibit non-rigid deformations between the image pairs.
SUIRD是低空遥感图像,而DTU拍摄的距离相对于SUIRD更近,因此,尽管两种图像都存在大角度视角变换,但DTU图像中的非刚性变换程度要比SUIRD图像更大。HPatches的主要匹配对象是平面图像,但该数据集中存在内点率极低的图像对。RS是遥感图像的混合数据集,可以更全面地衡量方法在遥感图像匹配方面的性能。Retina可以进一步衡量特征匹配方法在更复杂的非刚性变换图像上的匹配性能。
SUIRD consists of low-altitude remote sensing images, while DTU was captured from a closer distance, resulting in a higher degree of non-rigid transformation compared to SUIRD despite both datasets having large-angle viewpoint changes. The main matching objects in HPatches are planar images, but there are image pairs with extremely low inlier rates in the dataset. RS is a mixed dataset of remote sensing images that can provide a more comprehensive evaluation of the performance of matching methods for remote sensing images. Retina can further evaluate the matching performance of methods on more complex non-rigid transformation images.
这些数据集的平均内点率不超过60%,这对错误匹配去除方法是具有挑战性的。SUIRD、RS、DTU和Retina的ground truth由手工标定。HPatches的ground truth由单应性矩阵表示,误差阈值设为5个像素,假定匹配集由SIFT建立。
These datasets have an average inlier rate of no more than 60%, which poses a challenge for mismatch removal methods. The ground truth for SUIRD, RS, DTU, and Retina was manually annotated.The ground truth for HPatches was represented by homography matrices, with an error threshold set to 5 pixels and the putative matches set was established by SIFT.
3.2 参数设置与消融实验
本文提出的LMC的初始参数包含邻域个数、局部运动一致性阈值、RANSAC的误差阈值。为了测试初始参数对LMC性能的影响,我们选择使用SUIRD数据集作为测试集。图3是LMC的Recall®、Precision§、F-score(F)和Runtime随初始参数变化的曲线图。
The initial parameters of LMC proposed in this paper include the number of neighbors , the threshold for local motion consistency, and the error threshold for RANSAC. To test the impact of initial parameters on the performance of LMC, we choose the SUIRD dataset as the test set. Figure 3 shows the curves of Recall ®, Precision §, F-score (F), and Runtime of LMC as the initial parameters change.
图3 算法性能在不同参数设置下的变化曲线图以及消融实验
对于值,理论上值不能太大也不能太小。根据运动一致性,内点通常只与相邻的内点具有运动一致性,而对于更远的内点只是具有相似的运动趋势。如果值太大,说明特征点的邻域很大,从而使得对局部运动一致性的判断的可靠性降低,同时因为邻域点增多导致计算时间显著增高。同样,如果值太小,则邻域可能没有足够的内点来构建正确的MHU,从而导致对局部运动一致性的判断的可靠性降低。
For the parameter , theoretically, it should not be too large or too small. Based on the motion consistency, inliers typically only have motion consistency with their neighboring inliers, while for farther inliers, they only have similar motion trends. If the value is too large, it means that the neighborhood of the feature points is large, which reduces the reliability of judging local motion consistency, and the calculation time increases significantly due to the increase in neighboring points. Similarly, if the value is too small, there may not be enough inliers in the neighborhood to construct the correct MHU, leading to a reduction in the reliability of judging local motion consistency.
对于值,需要注意的是LMC匹配精度指标Recall、Precision、F-score仅在小数点后第二位变化,这说明值对LMC匹配精度影响轻微并且可以区分大多数内点与异常值。单应性矩阵仅能表示局部简单的几何变换。对于更复杂的局部几何变换,计算结果存在误差是不可避免的。而精度指标出现变化的原因之一就是存在少量异常值的位置与特征点实际对应的位置靠的非常近(几个像素值),这使得LMC在一定误差阈值内不能很好的区分这类异常值,当然不是所有这样的异常值都无法区分,这取决于异常值的随机性和邻域的几何变换类型。因此如果一味的降低值会在去除这些异常值的同时去除部分内点,而且会计算更多的排列组合消耗更多的时间。
As for the value, it should be noted that the LMC matching accuracy indicators Recall, Precision, and F-score only vary in the second decimal place, which indicates that the influence of value on LMC matching accuracy is slight and can distinguish most inliers from outliers. Homography matrix can only represent simple local geometric transformations. For more complex local geometric transformations, errors in the calculation results are inevitable. One of the reasons for the change in accuracy indicators is that there are a few outlier positions that are very close to the actual corresponding positions of feature points (several pixel values), which makes LMC unable to distinguish these outliers well within a certain error threshold. Of course, not all such outliers cannot be distinguished, which depends on the randomness of outliers and the type of geometric transformation of the neighborhood. Therefore, blindly reducing the value will remove some inliers while removing these outliers, and it will also calculate more permutations and combinations, consuming more time.
对于值,需要注意的是LMC性能指标仅在小数点后第三位变化,所以尽管看起来曲线波动大,但实际上值对LMC的性能影响非常轻微。指标出现波动是因为RNASAC是重采样方法具有一定的随机性并且邻域点集的变化对LMC性能有轻微影响。运行时间的变化是因为值会改变RANSAC的运行时间进而改变整体LMC的运行时间。
As for the value of , it should be noted that LMC performance indicators only change in the third decimal place, so although the curve seems to fluctuate greatly, the impact of on LMC performance is very slight. The fluctuation of indicators is because RANSAC is a resampling method with certain randomness, and the change of the neighborhood point set has a slight impact on LMC performance. The change in runtime is because the value of will change the runtime of RANSAC and then change the overall runtime of LMC.
从上述分析可以看出,参数和通常是固定不变的,仅通过改变值来权衡Recall、Precision和Runtime。为了体现LMC在数据集中的最好性能,本文将初始参数值设定为、、。
As seen from the above analysis, parameters and are usually fixed and only value is adjusted to balance Recall, Precision, and Runtime. In order to demonstrate the best performance of LMC in the dataset, we set the initial parameter values to , , and .
我们将LMC分为三个模块,分别是使用RANSAC构建可靠邻域集、使用单应性矩阵计算重投影误差、使用跳出机制缩短时间。为了确认这些模块的有效性,我们设计了4个版本(Model_2、Model_12、Model_23、Model_123)的LMC来进行消融实验。消融实验如图3所示 。
We divide LMC into three modules: building a reliable neighborhood set using RANSAC, computing reprojection error using homography matrix, and using a jump-out mechanism to shorten the runtime. To confirm the effectiveness of these modules, we designed four versions of LMC (Model_2, Model_12, Model_23, Model_123) for ablation experiments. The results of these experiments are shown in Figure 3.
其中,各版本的LMC表示的含义如下所示:
The meaning of each version of LMC is as follows:
-
Model_2:既没有使用RANSAC,也没有使用跳出机制。
Model_2: Neither RANSAC nor jump-out mechanism were used.
-
Model_12:使用了RANSAC,但没有使用跳出机制。
Model_12: Used RANSAC but did not use a jump-out mechanism.
-
Model_23:使用了跳出机制,但没有使用RANSAC。
Model_23: Used a jump-out mechanism, but did not use RANSAC.
-
Model_123:既使用了RANSAC,也使用了跳出机制。
Model_123: Both RANSAC and jump-out mechanism were used.
在本实验中,令。从图3中的Ablation study可以看出,构建可靠邻域集可以有效的提高LMC的匹配精度,跳出机制可以有效的降低LMC的运行时间。
In this experiment, we set . From the Ablation study in Figure 3, it can be seen that constructing a reliable neighborhood set can effectively improve the matching accuracy of LMC, while the jump-out mechanism can effectively reduce the runtime of LMC.
3.3 定性分析
为了直观了解LMC的匹配效果,我们从SUIRD、HPatches、RS、DTU和Retina数据集中选择了12对具有代表性的图片进行匹配效果的展示,如图4所示。可以看出LMC对于各种类型的图像变换都可以去除大多数错误匹配。图4中所示的匹配结果是令人满意的,但由于以下四个原因,仍然存在少量不正确的匹配,我们分析如下:第一,如上文所说,存在异常值的位置与特征点实际对应的位置可能非常接近,以至于LMC无法区分;第二,特征点有时与邻域点围成的四边形距离很远,以至于严格来说该四边形并不能称为该特征点的邻域,因此导致LMC判断错误;第三,存在特征点是异常值,邻域点里有异常值,但特征点根据这些邻域点计算出的重投影误差较小的情况,这是因为异常值的随机性导致的小概率事件;第四,特征点可能与某个邻域点的距离非常近甚至重合,使得重投影误差非常小,使得LMC误判。这几种情况可能会混合出现。尽管如此,我们提出的LMC在SUIRD、HPatches、RS、DTU和Retina数据集中表现出的匹配结果是非常出色的,实验结果表明LMC可以胜任各种变化类型的图像特征匹配。
To intuitively understand the matching performance of LMC, we selected 12 representative image pairs from the SUIRD, HPatches, RS, DTU, and Retina datasets to show the matching results, as shown in Figure 4. It can be seen that LMC can remove most of the false matches for various types of image transformations. The matching results shown in Figure 4 are satisfactory, but there are still a small number of false matches due to four reasons, which we analyze as follows: Firstly, as mentioned earlier, the position of outliers may be very close to the actual corresponding position of the feature points, making it difficult for LMC to distinguish between them. Secondly, the distance between a feature point and the quadrilateral formed by its neighboring points may be so far that strictly speaking, the quadrilateral cannot be considered as the neighborhood of the feature point, leading to LMC misjudgment. Thirdly, there may be cases where a feature point is an outlier and there are outliers in its neighborhood, but the feature point has a small reprojection error calculated based on these neighborhood points, which is a rare event caused by the randomness of the outliers. Fourthly, a feature point may be very close to or even coincide with a certain neighboring point, making the reprojection error very small and causing LMC misjudgment. These cases may occur in combination. Nevertheless, the LMC proposed by us shows excellent matching results in the SUIRD, HPatches, RS, DTU, and Retina datasets, and the experimental results show that LMC can handle various types of image feature matching.
图4 定性分析
LMC在12对图像中的匹配效果展示。从上到下,第一列为SUIRD(Horizontal)、SUIRD(Vertical)、SUIRD(Scaling)、HParches-i,第二列为SUIRD(Extreme)、SUIRD(Mixture)、DTU、RS(UAV),第三列为RS(SAR)、RS(CIAP)、Retina、RS(PAN)。匹配效果展示分为图像对匹配直观效果和匹配向量的运动场,蓝色表示真阳性,红色表示假阳性,绿色表示假阴性,黑色表示真阴性,黄色表示真阳性匹配点与其邻域点的连线,紫色表示假阳性匹配点与其邻域点的连线。为了展示效果,仅显示一百个随机匹配项,并且不在图像对中显示真阴性匹配。
LMC Matching Results on 12 Image Pairs. From top to bottom, the first column shows SUIRD (Horizontal), SUIRD (Vertical), SUIRD (Scaling), and HPatches-i. The second column shows SUIRD (Extreme), SUIRD (Mixture), DTU, and RS (UAV). The third column shows RS (SAR), RS (CIAP), Retina, and RS (PAN). The matching results are presented in the form of image pair matching visualization and motion field of match vectors. True positives are shown in blue, false positives in red, false negatives in green, true negatives in black.The yellow lines indicate the connections between true positive feature points and their neighboring points.The purple lines indicate the connections between false positive feature points and their neighboring points.To demonstrate the results, only 100 randomly selected matches are displayed, and true negative matches are not shown in the image pairs.
3.4 定量分析
为了评估LMC方法的匹配性能,我们将SUIRD数据集分为Extreme、Mixture和R&S三个子集,并与先进的五种特征匹配方法(LPM[18]、RANSAC[14]、mTopKRP[39]、NMRC[41]和LSCC[38])进行定量对比。我们在四个性能指标(Recall、Precision、F-score和Runtime)上对这六种方法进行了比较,结果如图5所示。图5中从上到下依次是在Extreme、Mixture和R&S数据集中表现出的性能,从左到右依次是数据集平均内点率、Recall、Precision、F-score和Runtime。
To evaluate the matching performance of the LMC method, we divided the SUIRD dataset into three subsets, Extreme, Mixture, and R&S, and compared it quantitatively with five advanced feature matching methods (LPM[18], RANSAC[14], mTopKRP[39], NMRC[41], and LSCC[38]). We compared these six methods in four performance metrics (Recall, Precision, F-score, and Runtime), as shown in Figure 5. From top to bottom in Figure 5 are the performances exhibited in the Extreme, Mixture, and R&S datasets, while from left to right are the dataset average inlier ratio, Recall, Precision, F-score, and Runtime.
从图5可以看出,在三个数据集中,LMC方法的Recall和F-score在三个数据集中均排名第一,而Precision比RANSAC略低,排名第二,运行时间排名第三。综合来看,LMC的匹配性能优于其他五种先进方法。RANSAC的Precision和运行时间中排名第一,说明RANSAC可以在SUIRD数据集中最大程度地保证输出结果的内点率。然而,RANSAC是基于全局几何变换模型的,尽管SUIRD中的图像非刚性变换程度较小,但也会有部分内点是不满足全局几何变换模型的运动趋势。因此,RANSAC在保证Precision的同时,总是会漏掉一些内点,所以Recall排名第三。
From Figure 5, it can be seen that the Recall and F-score of the LMC method rank first among all six methods on all three datasets, while its Precision ranks second, slightly lower than that of RANSAC. The running time of LMC ranks third. Overall, LMC outperforms the other five advanced methods in terms of matching performance. RANSAC ranks first in Precision and running time, indicating that RANSAC can maximize the inlier ratio of the output results on the SUIRD dataset. However, RANSAC is based on a global geometric transformation model. Although the non-rigid transformations in the SUIRD images are small, there may still be some inliers that do not conform to the global geometric transformation model. Therefore, while ensuring Precision, RANSAC always misses some inliers, leading to its third ranking in Recall.
LSCC、LPM、mTopKRP、NMRC和LMC都属于基于局部几何约束的匹配方法。LSCC方法的Precision排名第四,而Recall排名最低,这表明基于皮尔逊相关系数的方法不能很好的表现邻域点和特征点的相关性。LPM的综合性能低于RANSAC,这是因为LPM更擅长处理非刚性变换的场景,但是SUIRD数据集中的局部失真并不明显,LPM没有发挥出它的优势。mTopKRP和LPM的匹配精度相似,说明它们的约束能力相似,但LPM的运行时间显著低于mTopKRP。LSCC、mTopKRP和NMRC都不是基于几何变换模型的方法,而是从别的数学方法入手挖掘邻域点和特征点的潜在关系。显然,NMRC的基于邻域流形的思想更能表现邻域点和特征点的关系,其约束能力最接近LMC,但仍然比LMC略低。我们认为基于单应性矩阵的局部几何约束因为其射影不变性而具有更严格的约束能力。综上,LMC方法可以有效权衡Recall和Precision,并且具有可接受的运行时间,在三种不同视点变化类型的数据集中均表现出比其他五种先进方法更优秀的性能。
LSCC, LPM, mTopKRP, NMRC, and LMC are all based on local geometric constraints. LSCC’s Precision ranks fourth, while its Recall is the lowest, indicating that Pearson correlation-based methods cannot effectively capture the correlation between neighboring and feature points. LPM’s overall performance is lower than RANSAC’s because LPM is better suited for handling non-rigid transformation scenarios, but the local distortion in the SUIRD dataset is not significant, and LPM did not showcase its advantages. The matching accuracy of mTopKRP and LPM is similar, indicating that their constraint abilities are similar, but LPM’s runtime is significantly lower than mTopKRP. LSCC, mTopKRP, and NMRC are not based on geometric transformation models but instead exploit the potential relationship between neighboring and feature points through other mathematical methods. Clearly, NMRC’s neighborhood manifold-based approach better captures the relationship between neighboring and feature points, with constraint ability closest to LMC, but still slightly lower. We believe that local geometric constraints based on homography matrices have stricter constraint abilities due to their projective invariance. Overall, LMC method effectively balances Recall and Precision and has an acceptable runtime, demonstrating superior performance over other five advanced methods in three different viewpoint change types of datasets.
图5 定量分析
LPM、mTopKRP、RANSAC、NMRC、LSCC和LMC六种方法在三个数据集Extreme、Mixture和R&S中的匹配性能定量比较。从上到下依次是在Extreme、Mixture和R&S中的实验数据,从左到右依次是数据集平均内点率、Recall、Precision、F-score和Run time。图中每条曲线均表示累计分布,例如,Recall图像中LMC曲线上的一点表示该数据集中LMC的Recall小于的图像对比例为。所有图例中的数据均表示平均值,红色排名第一,蓝色排名第二,绿色排名第三。
The matching performance of six methods, LPM, mTopKRP, RANSAC, NMRC, LSCC, and LMC, is quantitatively compared on the Extreme, Mixture, and R&S subsets of the SUIRD dataset. The experimental data in Extreme, Mixture, and R&S are shown from top to bottom, and the data set’s average inlier rate, Recall, Precision, F-score, and Run time are shown from left to right. Each curve in the figure represents a cumulative distribution. For example, a point on the LMC curve in the Recall plot indicates that the proportion of image pairs with Recall less than for LMC in that dataset is . All data in the legends are average values, with red indicating the top-ranked, blue indicating second-ranked, and green indicating third-ranked.
3.5 稳健性分析
为了研究LMC在不同内点率图像对中的匹配性能,我们处理了HPatches-i数据集的每组图像序列,并比较了LMC与其他五种先进方法在HPatches-i数据集中的匹配性能。我们按照变形程度对每组图像序列进行排序,如图6所示。我们将每行的第一个图像与其余五张图像进行匹配,共有五组图像对,从左到右变形程度逐渐增加,假设匹配集中的内点率减少。我们将每列图像对分为一组,共五组。图7使用箱型图展示了这五组的内点率和六种方法在这五组数据集中的F-score。图中出现内点率和F-score接近0的离群点的原因是HPatches中存在少量图像序列中所有图像对的内点率都接近于0,使得方法的性能急剧下降成为离群点。从图中可以看出,随着内点率的降低,所有方法的匹配性能都出现了下降,但与其他五种方法相比,LMC的性能降低得更为缓慢。因此,LMC对内点率变化的稳定性要优于其他五种方法。
To investigate the matching performance of LMC in image pairs with different inlier ratios, we processed each sequence of image pairs in the HPatches-i dataset and compared the matching performance of LMC with five other state-of-the-art methods in the HPatches-i dataset. We sorted each group of image pairs by deformation level, as shown in Figure 6. We matched the first image in each row with the remaining five images, resulting in five sets of image pairs. The degree of deformation gradually increases from left to right, and the inlier ratio in the putative sets decreases. We divided each column of image pairs into a group, resulting in five groups. Figure 7 presents a box plot of the inlier ratio and F-score of these five groups and the six methods. The reason for the appearance of outliers with inlier ratios and F-scores close to 0 in the figure is that there are a few image sequences in HPatches in which the inlier ratio of all image pairs is close to 0, resulting in a sharp drop in method performance and becoming an outlier. As shown in the figure, as the inlier ratio decreases, the matching performance of all methods declines. However, compared with the other five methods, the performance of LMC declines more slowly. Therefore, the stability of LMC with respect to changes in the inlier ratio is superior to that of the other five methods.
图6 对HPatches-i中的两个图像序列根据变形程度进行排序的示意图
图7 不同方法对变形程度增加的稳健性分析
3.6 研究邻域构建方法对性能的影响
为了研究不同邻域构建方法对方法性能的影响,我们分别使用了RANSAC和LPM来构建邻域。为了区分这两种方法,我们根据邻域构建方法的不同,将两种方法命名为LMC_RANSAC和LMC_LPM。我们将这两种方法与另外五种先进的方法进行比较,在不同类型的数据集中,包括DTU、Retina、RS、HPatches、SUIRD。当某个方法无法处理某个图像对时,我们将其Recall、Precision和F-score设为0并将Runtime设为1000ms作为惩罚。实验结果如图8所示,图例中,红色排名第一,蓝色排名第二,绿色排名第三。图8中从上到下依次是在DTU、Retina、RS、HPatches、SUIRD数据集中表现出的性能,从左到右依次是数据集平均内点率、Recall、Precision、F-score和Runtime。
To investigate the impact of different neighborhood construction methods on the performance of the method, we used RANSAC and LPM to construct the neighborhood separately. To distinguish these two methods, we named them LMC_RANSAC and LMC_LPM, respectively. We compared these two methods with five other advanced methods on different types of datasets, including DTU, Retina, RS, HPatches, and SUIRD. When a method could not process a particular image pair, we set its Recall, Precision, and F-score to 0 and set the Runtime to 1000 milliseconds as a penalty. The experimental results are shown in Figure 8, where the legend indicates that red ranks first, blue ranks second, and green ranks third. From top to bottom in Figure 8 are the performances exhibited in the DTU, Retina, RS, HPatches, and SUIRD datasets, while from left to right are the dataset average inlier ratio, Recall, Precision, F-score, and Runtime.
通过图8可以发现,在可接受的运行时间内,LMC_LPM方法在DTU、Retina和RS数据集中表现出了最好的匹配性能,而LMC_RANSAC方法在HPatches和SUIRD数据集中表现出了最好的匹配性能。然而,由于HPatches数据集中的内点率较低,导致七种方法的匹配性能都有不同程度的下降。在第四行的数据中,尽管LMC_RANSAC方法无法处理的图像对最多,但对于可以处理的图像对,它仍表现出了更好的匹配性能。同时,LMC_RANSAC和LMC_LPM方法的运行时间在惩罚项的影响下被拉高了平均值,但对于可以处理的图像对,它们的运行时间仍然在100ms左右。
Through Fig. 8, it can be observed that within an acceptable runtime, the LMC_LPM method exhibits the best matching performance in DTU, Retina, and RS datasets, while the LMC_RANSAC method shows the best matching performance in HPatches and SUIRD datasets. However, due to the low inlier ratio in the HPatches dataset, the matching performance of all seven methods decreased to varying degrees. In the data in the fourth row, although LMC_RANSAC has the highest number of image pairs that it cannot handle, it still shows better matching performance for the pairs that it can handle. Meanwhile, the runtime of LMC_RANSAC and LMC_LPM methods is elevated by the penalty term, but their actual runtime for the image pairs they can handle is still around 100 milliseconds.
对于邻域构造方法,LPM更适合处理具有复杂成像情况的非刚性变换图像,例如多模态、噪声干扰、图像畸变(如广角)等。而RANSAC更适合处理比较简单的情况。然而,观察所有实验数据,在LMC_RANSAC和LMC_LPM的数据中出现惩罚项时,它们的邻域构造方法的Recall或Precision总是处于一个较低的值。与其他方法相比,LMC对邻域的内点率和假设匹配的个数有更严格的要求。如果邻域的内点率不足,会导致LMC的运行时间增加,甚至会降低局部几何约束对局部运动一致性判断的可靠性。如果假定的匹配个数不足,会导致邻域过大,从而降低方法的准确度。
For neighborhood construction methods, LPM is better suited for handling non-rigid transformation images with complex imaging conditions, such as multimodality, noise interference, and image distortion (such as wide-angle). RANSAC is more suitable for handling relatively simple cases. However, by observing all experimental data, when penalty terms appear in the data of LMC_RANSAC and LMC_LPM, their neighborhood construction methods’ Recall or Precision are always at a very low value. Compared with other methods, LMC has stricter requirements for the neighborhood’s inlier rate and the number of putative matches. If the neighborhood’s inlier rate is insufficient, it will cause LMC’s runtime to increase and even reduce the reliability of local geometric constraints in estimating local motion consistency. If the number of putative matches is insufficient, it will cause the neighborhood to be too large, thereby reducing the accuracy of the method.
总的来说,LMC_LPM适用于处理复杂成像情况的非刚性变换图像以及图像的变化类型未知的情况。而LMC_RANSAC适用于处理简单的图像变换。它们在各自适用的数据集中表现出了比目前五种先进方法更优异的匹配性能。
In general, experimental data indicates that LMC_LPM is suitable for handling non-rigid transformation images with complex imaging situations and unknown types of image changes. LMC_RANSAC, on the other hand, is suitable for handling relatively simple image transformations. Both algorithms have demonstrated superior matching performance in their respective applicable datasets compared to the current five advanced methods.
图8 定量分析
4.讨论
在3.3小节中,我们将LMC在多个数据集中的特征匹配效果进行了可视化,并根据可视化的结果讨论了LMC存在误差的原因,为后续改进LMC提供了思路。在3.4小节中,我们在多个视角变化的数据集上进行了实验,将LMC与其他五种先进方法进行对比,并对每个方法进行了简单的分析。在3.5小节中,我们进行了鲁棒性分析。实验数据表明LMC相较于其他的方法对于内点率的变化具有更好的鲁棒性,但是当内点率过低时,LMC仍然会失效。在3.6小节,我们讨论了邻域构建方法对LMC匹配性能的影响。实验结果表明LMC_LPM和LMC_RANSAC在各自适用的几何变换类型的数据集上表现出了最好的综合匹配性能,并且平均运行时间在100毫秒内。
相较于其他基于局部几何约束的方法,LMC使用了更严格的几何约束。这使得LMC可以处理具有更复杂的几何变换的情况。因此,LMC具有比RANSAC等方法更广泛的适用范围。除了在遥感图像领域具有很好的特征匹配效果外,我们提出的方法在其他的图像处理领域,如多模态医学图像的特征匹配,也展现出了不错的潜力。
但是,LMC对邻域的内点率也有严格的要求。假定匹配集合的内点率过低或者特征点过于稀疏都会导致邻域构建时出现异常,进而可能会降低LMC的匹配性能。此外,LMC的运行时间会随着邻域内点率的降低而升高。因为LMC的跳出机制认为邻域中可能存在多个可靠的MHU。当邻域内点率过低时,邻域中存在的可靠的MHU的数量也会降低,从而使跳出机制的性能退化甚至失效。
In Section 3.3, we visualized the feature matching performance of LMC in multiple datasets and discussed the reasons for the errors in LMC based on the visualization results, providing ideas for improving LMC. In section 3.4, we conducted experiments on multiple datasets with various viewpoints, comparing LMC with five other advanced methods and providing a brief analysis of each method. In Section 3.5, we conducted a robustness analysis, which showed that LMC has better robustness to changes in the inlier ratio than other methods, but still fails when the inlier ratio is too low. In Section 3.6, we discussed the impact of neighborhood construction methods on LMC’s matching performance. The experimental results showed that LMC_LPM and LMC_RANSAC performed the best comprehensive matching performance on their respective applicable geometric transformation type datasets, with an average runtime of less than 100 milliseconds.
Compared to other methods based on local geometric constraints, LMC uses more strict geometric constraints, which enables it to handle cases with more complex geometric transformations. Therefore, LMC has a wider range of applicability than methods such as RANSAC. In addition to its good feature matching performance in the remote sensing image field, our proposed method also shows promising potential in other image processing fields, such as feature matching in multimodal medical images.
However, LMC also has strict requirements for the inlier ratio of the neighborhood. If the feature points are too sparse, it may lead to abnormal neighborhood construction, which may in turn reduce the matching performance of LMC. In addition, the runtime of LMC increases as the inlier ratio of the neighborhood decreases. This is because the jump-out mechanism of LMC assumes that there may exist multiple reliable MHUs in the neighborhood. When the inlier ratio of the neighborhood is too low, the number of reliable MHUs in the neighborhood will also decrease, which may degrade or even invalidate the performance of the jump-out mechanism.
5. 结论
本文提出了一种名为LMC的错误匹配去除方法,其基于正确匹配相邻间具有相同运动的性质,并应用于遥感图像特征匹配。该方法利用单应性矩阵表示邻域几何变换,以计算特征点的重投影误差来衡量匹配的局部运动一致性。我们提出了一种跳出机制来显著降低运行时间。此外,我们使用RANSAC和LPM来构建邻域,并将提出的方法分为LMC_RANSAC和LMC_LPM。实验数据表明,LMC_RANSAC和LMC_LPM在各自适用的数据集中表现出比目前先进方法更优异的匹配性能。
This paper proposes an mismatch removal method called LMC, based on the property that adjacent correct matches have the same motion, and applies it to feature matching in remote sensing images. The method uses homography matrices to represent neighborhood geometric transformations and measures the local motion consistency of matches by calculating the reprojection error of feature points.
This paper proposes a new method for removing mismatches in feature matching of remote sensing images, called Local Motion Consistency (LMC). LMC is based on the property that adjacent correct matches have the same motion, and uses homography matrices to represent neighborhood geometric transformations. The method measures the local motion consistency of matches by calculating the reprojection error of feature points, which allows for the identification and removal of false matches.
We propose a jump-out mechanism to significantly reduce the runtime. Furthermore, we use RANSAC and LPM to construct neighborhoods, and divide the proposed method into LMC_RANSAC and LMC_LPM. Experimental data shows that LMC_RANSAC and LMC_LPM exhibit superior matching performance in their respective applicable datasets compared to current advanced methods.
需要注意的是,LMC非常依赖于邻域的初始化。当假定匹配非常稀疏或邻域内点率非常低时,LMC对局部运动一致性的衡量会变得不可靠。幸运的是,目前使用的邻域构造方法通常适用于大多数情况,但设计更快速且通用的邻域构造方法是可以研究的方向。另外,可以尝试使用能够表示更复杂变换的几何变换模型来表示局部运动一致性。
It should be noted that LMC heavily relies on the initialization of the neighborhood. When putative matches are very sparse or the inlier ratio within the neighborhood is very low, the measure of local motion consistency used by LMC can become unreliable. Fortunately, the neighborhood construction methods currently used are generally applicable to most cases, but it is possible to explore the direction of designing faster and more versatile neighborhood construction methods. Additionally, it is worth trying to use geometric transformation models that can represent more complex transformations to represent local motion consistency.
参考文献
[1] Ma J, Jiang X, Fan A, et al. Image matching from handcrafted to deep features: A survey[J]. International Journal of Computer Vision, 2021, 129(1): 23-79.
[2] Li X, Luo X, Wu Y, et al. Research on stereo matching for satellite generalized image pair based on improved SURF and RFM[C]//IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2020: 2739-2742.
[3] Jiang S, Jiang W, Wang L. Unmanned aerial vehicle-based photogrammetric 3d mapping: A survey of techniques, applications, and challenges[J]. IEEE Geoscience and Remote Sensing Magazine, 2021, 10(2): 135-171.
[4] Xiao G, Luo H, Zeng K, et al. Robust feature matching for remote sensing image registration via guided hyperplane fitting[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 60: 1-14.
[5] Quan D, Wang S, Gu Y, et al. Deep feature correlation learning for multi-modal remote sensing image registration[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-16.
[6] Ma J, Tang L, Fan F, et al. SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer[J]. IEEE/CAA Journal of Automatica Sinica, 2022, 9(7): 1200-1217.
[7] Lin C C, Pankanti S U, Natesan Ramamurthy K, et al. Adaptive as-natural-as-possible image stitching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1155-1163.
[8] Wei C, Xia H, Qiao Y. Fast unmanned aerial vehicle image matching combining geometric information and feature similarity[J]. IEEE Geoscience and Remote Sensing Letters, 2020, 18(10): 1731-1735.
[9] Wang C, Wang L, Liu L. Progressive mode-seeking on graphs for sparse feature matching[C]//Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part II 13. Springer International Publishing, 2014: 788-802.
[10] Lowe D G. Distinctive image features from scale-invariant keypoints[J]. International journal of computer vision, 2004, 60: 91-110.
[11] Bay H, Ess A, Tuytelaars T, et al. Speeded-up robust features (SURF)[J]. Computer vision and image understanding, 2008, 110(3): 346-359.
[12] Rublee E, Rabaud V, Konolige K, et al. ORB: An efficient alternative to SIFT or SURF[C]//2011 International conference on computer vision. Ieee, 2011: 2564-2571.
[13] Ammar Abbas S, Zisserman A. A geometric approach to obtain a bird’s eye view from an image[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 2019: 0-0.
[14] Fischler M A, Bolles R C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography[J]. Communications of the ACM, 1981, 24(6): 381-395.
[15] Li X, Hu Z. Rejecting mismatches by correspondence function[J]. International Journal of Computer Vision, 2010, 89: 1-17.
[16] Myronenko A, Song X. Point set registration: Coherent point drift[J]. IEEE transactions on pattern analysis and machine intelligence, 2010, 32(12): 2262-2275.
[17] Sedaghat A, Mokhtarzade M, Ebadi H. Uniform robust scale-invariant feature matching for optical remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(11): 4516-4527.
[18] Ma J, Zhao J, Jiang J, et al. Locality preserving matching[J]. International Journal of Computer Vision, 2019, 127: 512-531.
[19] Torr P H S, Zisserman A. MLESAC: A new robust estimator with application to estimating image geometry[J]. Computer vision and image understanding, 2000, 78(1): 138-156.
[20] Chum O, Matas J. Matching with PROSAC-progressive sample consensus[C]//2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05). IEEE, 2005, 1: 220-226.
[21] Barath D, Noskova J, Ivashechkin M, et al. MAGSAC++, a fast, reliable and accurate robust estimator[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 1304-1312.
[22] Barath D, Matas J. Graph-cut RANSAC: Local optimization on spatially coherent structures[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(9): 4961-4974.
[23] Zhao J, Ma J, Tian J, et al. A robust method for vector field learning with application to mismatch removing[C]//CVPR 2011. IEEE, 2011: 2977-2984.
[24] Liu H, Yan S. Common visual pattern discovery via spatially coherent correspondences[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010: 1609-1616.
[25] Zhou F, De la Torre F. Factorized graph matching[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012: 127-134.
[26] Liu Z Y, Qiao H. Gnccp—graduated nonconvexityand concavity procedure[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 36(6): 1258-1267.
[27] 27:Loiola E M, De Abreu N M M, Boaventura-Netto P O, et al. A survey for the quadratic assignment problem[J]. European journal of operational research, 2007, 176(2): 657-690.
[28] Jiang X, Xia Y, Zhang X P, et al. Robust image matching via local graph structure consensus[J]. Pattern Recognition, 2022, 126: 108588.
[29] Ma J, Fan A, Jiang X, et al. Feature matching via motion-consistency driven probabilistic graphical model[J]. International Journal of Computer Vision, 2022, 130(9): 2249-2264.
[30] Ma J, Jiang X, Jiang J, et al. LMR: Learning a two-class classifier for mismatch removal[J]. IEEE Transactions on Image Processing, 2019, 28(8): 4045-4059.
[31] Yi K M, Trulls E, Ono Y, et al. Learning to find good correspondences[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 2666-2674.
[32] Sarlin P E, DeTone D, Malisiewicz T, et al. Superglue: Learning feature matching with graph neural networks[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 4938-4947.
[33] Jiang X, Wang Y, Fan A, et al. Learning for mismatch removal via graph attention networks[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 181-195.
[34] Chen J, Chen S, Chen X, et al. Csr-net: Learning adaptive context structure representation for robust feature correspondence[J]. IEEE Transactions on Image Processing, 2022, 31: 3197-3210.
[35] Jiang B, Sun P, Luo B. GLMNet: Graph learning-matching convolutional networks for feature matching[J]. Pattern Recognition, 2022, 121: 108167.
[36] Jiang Z, Rahmani H, Angelov P, et al. Graph-context Attention Networks for Size-varied Deep Graph Matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 2343-2352.
[37] Bian J W, Lin W Y, Matsushita Y, et al. Gms: Grid-based motion statistics for fast, ultra-robust feature correspondence[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4181-4190.
[38] Shao F, Liu Z, An J. A discriminative point matching algorithm based on local structure consensus constraint[J]. IEEE Geoscience and Remote Sensing Letters, 2020, 18(8): 1366-1370.
[39] Jiang X, Jiang J, Fan A, et al. Multiscale locality and rank preservation for robust feature matching of remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(9): 6462-6472.
[40] Chen J, Yang M, Gong W, et al. Multi-neighborhood Guided Kendall Rank Correlation Coefficient for Feature Matching[J]. IEEE Transactions on Multimedia, 2022.
[41] Ma J, Li Z, Zhang K, et al. Robust feature matching via neighborhood manifold representation consensus[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 183: 196-209.
[42] Shen L, Xu Z, Zhu J, et al. A frame-based probabilistic local verification method for robust correspondence[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 192: 232-243.
[43] Ye X, Ma J, Xiong H. Local Affine Preservation With Motion Consistency for Feature Matching of Remote Sensing Images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-12.
[44] Balntas V, Lenc K, Vedaldi A, et al. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 5173-5182.
[45] Aanæs H, Jensen R R, Vogiatzis G, et al. Large-scale data for multiple-view stereopsis[J]. International Journal of Computer Vision, 2016, 120: 153-168.
[46] Jiang X, Ma J, Xiao G, et al. A review of multimodal image matching: Methods and applications[J]. Information Fusion, 2021, 73: 22-71.