🌱
![[Pasted image 20231211121444.png]]

# End-to-end

1. Training process
	1. First we feed synthetic, labelled data into our network.
	2. Then we fine-tune the network on a combination of synthetic and real images (i.e. pure real images, and real images overlaid with synthetic objects).
2. Adapter model:
	1. This is a series of CNNs.
	2. Here we encode pairs of intermediate feature maps, transforming them into high-dimensional feature maps.
	3. The adapter is trained on both synthetic data overlaid on real scenes and pure real data, allowing it to mix its understanding of both and find features common to the two domains.
3. Maximum Mean Discrepancy (MMD) in RKHS:
	1. MMD measures the distance between the two feature distributions in a Reproducing Kernel Hilbert Space (RKHS), which is central to the model's self-supervised learning approach.
	2. Distance mapping: the intermediate features of the model trained on real data are densely mapped into high-dimensional spaces using convolutional blocks.
	3. Loss back-propagation: the computed MMD distance is treated as the loss and back-propagated through both networks.
4. Vote accumulation and keypoint estimation:
	1. The model takes the input, accumulates votes, detects peaks, and estimates the final keypoints.
	2. It also provides classification labels and confidence scores.
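To make the MMD step concrete: the standard (biased) estimator of squared MMD between two sample sets, under an RBF kernel, is the quantity a sketch like the one below computes. This is a generic illustration of MMD in an RKHS, not the paper's exact implementation; the function names and the choice of a single fixed bandwidth `sigma` are my own assumptions.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of x and rows of y,
    # pushed through a Gaussian (RBF) kernel.
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of squared MMD between samples x and y in the
    # RKHS induced by the RBF kernel:
    #   MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())
```

When the two feature batches come from the same distribution the estimate is near zero; as the distributions drift apart it grows, which is what makes it usable as a back-propagated alignment loss. In practice the bandwidth is often set by the median heuristic rather than fixed.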
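The accumulate-votes-then-detect-peaks idea in the final stage can be illustrated with a minimal Hough-style 2D accumulator. This is only a sketch of the general technique, not the paper's voting scheme (which votes in a learned keypoint space); the function names, the vote threshold, and the 3×3 local-maximum rule are my own choices.

```python
import numpy as np

def accumulate_votes(votes, shape):
    # Build a 2D accumulator: each (row, col) vote increments one cell.
    acc = np.zeros(shape)
    for r, c in votes:
        acc[r, c] += 1
    return acc

def detect_peaks(acc, min_votes=2):
    # A cell is a peak if it meets the vote threshold and is a local
    # maximum within its 3x3 neighbourhood (ties all count as peaks).
    peaks = []
    rows, cols = acc.shape
    for r in range(rows):
        for c in range(cols):
            v = acc[r, c]
            if v < min_votes:
                continue
            nb = acc[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if v == nb.max():
                peaks.append((r, c))
    return peaks
```

Each peak location would then be reported as an estimated keypoint, with the accumulated vote count serving as a natural confidence score.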