FRAMEWORK
MAINLY THREE PARTS (pipeline figure in the paper):
- Interest Point Pre-Training
  - use a synthetic dataset (corner points are easy to get, e.g. L/Y/T junctions...)
  - train the base detector, called MagicPoint in the paper (what is the detector's architecture? see NETWORK DESIGN below)
  - transfer it to real images (next steps)
- Interest Point Self-Labeling (sample random homographies, generate pseudo-ground-truth)
  - run the base detector for inference (initial interest points on real images)
  - use Homographic Adaptation to improve the labels (details in the HOMOGRAPHY section)
- Joint Training (interest points and descriptors)
  - loss
  - network architecture
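The self-labeling step needs random homographies sampled within a reasonable range. A minimal NumPy sketch, composing translation, rotation, scale, and perspective into one 3×3 matrix; the parameter ranges here are my own illustrative choices, not the paper's exact sampling scheme:

```python
import numpy as np

def sample_homography(rng, max_translation=0.1, max_angle=0.3,
                      scale_range=(0.8, 1.2), max_perspective=1e-3):
    """Sample a random homography as a composition of simple transforms.

    Parameter ranges are illustrative assumptions, not the paper's
    exact values.
    """
    tx, ty = rng.uniform(-max_translation, max_translation, size=2)
    T = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

    a = rng.uniform(-max_angle, max_angle)           # in-plane rotation
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])

    s = rng.uniform(*scale_range)                    # isotropic scale
    S = np.diag([s, s, 1.0])

    px, py = rng.uniform(-max_perspective, max_perspective, size=2)
    P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [px, py, 1.0]])

    H = T @ R @ S @ P
    return H / H[2, 2]                               # fix H[2, 2] == 1

def warp_points(pts, H):
    """Apply homography H to an (N, 2) array of (x, y) points."""
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homo[:, :2] / homo[:, 2:3]
```

`warp_points` is what turns detections on a warped image back into pseudo-labels on the original image.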
METHOD DETAIL
NETWORK DESIGN

- encoder-decoder architecture
  - shared encoder (advantage: the detector and descriptor heads share most computation and parameters, keeping the model fast)
  - two heads (interest point decoder + descriptor decoder)
- Shared Encoder
  - VGG-style (stacked 3×3 convolutions with max pooling)
  - pixel cells: three 2×2 non-overlapping max-pooling operations in the encoder reduce the resolution by 8 in each dimension, so each output unit corresponds to an 8×8 pixel cell
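To make the "pixel cells" claim concrete: three 2×2 non-overlapping max pools shrink each spatial dimension by 2³ = 8. A quick sanity check (the 240×320 input size is just an example):

```python
def encoder_output_shape(h, w, num_pools=3):
    """Spatial size after `num_pools` 2x2 non-overlapping max pools."""
    factor = 2 ** num_pools                 # 2^3 = 8
    assert h % factor == 0 and w % factor == 0, "dims must be divisible"
    return h // factor, w // factor

# a 240x320 input maps to a 30x40 grid of 8x8 pixel cells
hc, wc = encoder_output_shape(240, 320)
```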
- Interest Point Decoder
  - NO upsampling layers (they are computationally heavy and can introduce unwanted checkerboard artifacts)
  - instead, the detection head outputs 65 channels per 8×8 cell (64 pixel positions plus one "no interest point" dustbin) and uses an explicit decoder
  - this decoder has no parameters; it is known as "sub-pixel convolution" or "depth to space" in TensorFlow, and "pixel shuffle" in PyTorch
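The parameter-free decoder is just a softmax plus a reshape/transpose ("depth to space"). A NumPy sketch, assuming the head outputs 65 channels per cell with the 65th as the dustbin, and assuming row-major ordering of the 64 in-cell positions:

```python
import numpy as np

def depth_to_space(cells):
    """(65, Hc, Wc) detector logits -> (Hc*8, Wc*8) point heatmap.

    Softmax over the 65 channels, drop the dustbin channel, then
    rearrange the remaining 64 channels into the 8x8 pixels of each cell.
    """
    c, hc, wc = cells.shape
    assert c == 65
    e = np.exp(cells - cells.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)
    probs = probs[:64]                      # remove the dustbin channel
    probs = probs.reshape(8, 8, hc, wc)     # (row-in-cell, col-in-cell, Hc, Wc)
    probs = probs.transpose(2, 0, 3, 1)     # (Hc, 8, Wc, 8)
    return probs.reshape(hc * 8, wc * 8)

heatmap = depth_to_space(np.random.default_rng(0).normal(size=(65, 30, 40)))
# heatmap covers the full 240x320 image
```

Since the softmax includes the dustbin, each 8×8 block of the heatmap sums to at most 1.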
- Descriptor Decoder
  - similar to UCN (Universal Correspondence Network): the head outputs a semi-dense grid of descriptors (one per 8×8 cell), which is then upsampled
  - performs bicubic interpolation of the descriptors and then L2-normalizes them to unit length (this decoder is also fixed, i.e. parameter-free)
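The fixed descriptor decoder just upsamples the coarse descriptor grid and L2-normalizes each per-pixel descriptor. A dependency-free NumPy sketch; nearest-neighbor upsampling stands in here for the bicubic interpolation the paper specifies, while the L2 normalization matches the paper:

```python
import numpy as np

def decode_descriptors(coarse, cell=8):
    """(C, Hc, Wc) coarse descriptors -> (C, Hc*cell, Wc*cell), unit-norm.

    Nearest-neighbor upsampling is a stand-in for bicubic interpolation.
    """
    dense = coarse.repeat(cell, axis=1).repeat(cell, axis=2)
    norms = np.linalg.norm(dense, axis=0, keepdims=True)
    return dense / np.maximum(norms, 1e-12)   # L2-normalize per pixel

desc = decode_descriptors(np.random.default_rng(0).normal(size=(256, 30, 40)))
```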
LOSS FUNC
The final loss is the sum of two interest-point losses (one for each image of a homography-related pair) and a descriptor loss weighted by λ:

L(X, X', D, D'; Y, Y', S) = L_p(X, Y) + L_p(X', Y') + λ L_d(D, D', S)

- L_p: per-cell cross-entropy over the 65 channels (64 pixel positions + dustbin)
- L_d: hinge loss over all pairs of cells from the two images, gated by the correspondence mask S induced by the homography
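A compact NumPy sketch of the two loss terms. The detector loss is a 65-way cross-entropy per cell; the descriptor loss is a hinge loss on descriptor dot products gated by the correspondence mask. The margin and weight defaults follow the values reported in the paper (m_p = 1, m_n = 0.2, λ_d = 250), but treat them as reference values rather than a faithful reimplementation:

```python
import numpy as np

def detector_loss(logits, labels):
    """Per-cell 65-way cross-entropy.

    logits: (65, Hc, Wc); labels: (Hc, Wc) ints in [0, 64] (64 = dustbin).
    """
    z = logits - logits.max(axis=0, keepdims=True)     # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    hc, wc = labels.shape
    picked = log_probs[labels, np.arange(hc)[:, None], np.arange(wc)]
    return -picked.mean()

def descriptor_loss(d, d_prime, s, m_pos=1.0, m_neg=0.2, lambda_d=250.0):
    """Hinge loss over all cell pairs.

    d, d_prime: (C, N) unit descriptors for the two images' cells;
    s: (N, N) 0/1 correspondence mask derived from the homography.
    """
    dot = d.T @ d_prime                                # (N, N) similarities
    pos = np.maximum(0.0, m_pos - dot)                 # pull matches together
    neg = np.maximum(0.0, dot - m_neg)                 # push non-matches apart
    return (lambda_d * s * pos + (1 - s) * neg).mean()
```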
HOMOGRAPHY

- formulation
  - I: input image
  - x: resulting interest points
  - f_θ(·): the interest point detector network
  - H: a random homography
  - an ideal detector is covariant with homographies: Hx = f_θ(H(I)), i.e. detecting on a warped image gives the warped detections
- improved super-point detector
  - average the unwarped responses over a sufficiently large sample of random homographies (Homographic Adaptation):
    F̂(I; f_θ) = (1/N_h) · Σ_{i=1..N_h} H_i⁻¹ f_θ(H_i(I))
- choosing homographies
  - sampling arbitrary 3×3 matrices does not work well; instead, decompose the homography into simple, less expressive transformations (translation, scale, in-plane rotation, symmetric perspective distortion), sample each within a reasonable range, and compose them

- Iterative Homographic Adaptation
  - apply the process repeatedly: use the adapted detector to generate better pseudo-labels, retrain, then run Homographic Adaptation again
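Homographic Adaptation itself is just averaging unwarped detector responses over sampled homographies. A self-contained NumPy sketch: the nearest-neighbor warp and translation-only sampler are toy stand-ins (a real implementation would use a proper warp such as OpenCV's `warpPerspective` and a structured homography sampler), and `detector` is any function mapping an image to a same-sized response map:

```python
import numpy as np

def warp_image(img, H):
    """Toy nearest-neighbor inverse warp: out(p) = img(H^-1 p)."""
    h, w = img.shape
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    src = Hinv @ np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)

def homographic_adaptation(img, detector, rng, num_h=10):
    """F(I) = mean_i H_i^-1(detector(H_i(I))), with a border-aware count."""
    h, w = img.shape
    acc, count = np.zeros((h, w)), np.zeros((h, w))
    for i in range(num_h):
        H = np.eye(3)
        if i > 0:                                    # i == 0 keeps the identity
            H[:2, 2] = rng.integers(-4, 5, size=2)   # toy sampler: translation only
        resp = detector(warp_image(img, H))
        Hinv = np.linalg.inv(H)
        acc += warp_image(resp, Hinv)                # unwarp the response
        count += warp_image(np.ones((h, w)), Hinv)   # where it contributed
    return acc / np.maximum(count, 1.0)
```

The iterative variant simply feeds the aggregated responses back as pseudo-labels, retrains the detector, and runs the loop again.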

RESOURCE
rpautrat/SuperPoint: Efficient neural feature detector and descriptor (https://github.com/rpautrat/SuperPoint)