🌱
Pasted image 20231211111017.png
Pasted image 20231211113619.png

# End-to-end

1. Input: RGB-D image, processed by a CNN.
2. Feature Extraction:
    1. Run the CNN over the image to extract features. Here we output the predicted foreground object (a binary semantic segmentation mask).
    2. Also predict the unsegmented estimate of the radii for each pixel in the image.
3. Distance Estimation:
    1. Here we generate the predicted radii to each keypoint by taking, for each pixel inside the ground-truth binary segmentation mask, the radius we predicted in the previous step.
4. 3D Accumulator Space:
    1. Recall from Hough Voting that we have an accumulator space: a discrete 2D grid of cells where we increment the value of a cell for every line that passes through it (each line corresponds to a key point in our image).
    2. Here the accumulator is a 3D voxel space. Each pixel's predicted radius defines a sphere centred on that pixel's 3D point, and we increment every voxel that the surface of one of these spheres passes through.
    3. The cells with the highest values (where the most sphere surfaces intersect) correspond to the keypoints we're trying to predict.
5. Loss!
    1. Compare the predicted segmentation with the ground-truth segmentation, and the predicted keypoints with the ground-truth keypoints.
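
The 3D accumulator step can be sketched roughly as below. This is a minimal NumPy illustration, not the original implementation: the function names `vote_spheres` and `peak_voxel`, the `shell` thickness parameter, and the brute-force loop over every voxel are all assumptions made for clarity. It assumes each foreground pixel has already been back-projected to a 3D point using the depth channel.

```python
import numpy as np


def vote_spheres(points, radii, grid_min, voxel_size, grid_shape, shell=None):
    """Accumulate sphere-surface votes in a 3D voxel grid (hypothetical helper).

    points:     (N, 3) 3D positions of the foreground pixels.
    radii:      (N,) predicted distance from each point to one keypoint.
    grid_min:   (3,) coordinates of the grid's minimum corner.
    voxel_size: edge length of a cubic voxel.
    grid_shape: number of voxels along (x, y, z).
    shell:      assumed thickness of the sphere surface; defaults to one voxel.
    """
    if shell is None:
        shell = voxel_size
    grid_min = np.asarray(grid_min, dtype=float)
    acc = np.zeros(grid_shape, dtype=np.int32)

    # Coordinates of every voxel centre, shape (X, Y, Z, 3).
    idx = np.stack(
        np.meshgrid(*[np.arange(n) for n in grid_shape], indexing="ij"),
        axis=-1,
    )
    centres = grid_min + (idx + 0.5) * voxel_size

    for p, r in zip(np.asarray(points, dtype=float), radii):
        # Distance from this point to every voxel centre.
        dist = np.linalg.norm(centres - p, axis=-1)
        # A voxel gets one vote if the sphere surface passes close to its centre.
        acc += np.abs(dist - r) <= shell / 2
    return acc


def peak_voxel(acc):
    """Index (i, j, k) of the voxel with the most votes (first one on ties)."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(acc), acc.shape))


if __name__ == "__main__":
    # Toy scene: six points placed 0.3 units from a keypoint along each axis.
    keypoint = np.array([0.55, 0.55, 0.55])
    offsets = 0.3 * np.vstack([np.eye(3), -np.eye(3)])
    pts = keypoint + offsets
    rad = np.full(len(pts), 0.3)
    votes = vote_spheres(pts, rad, grid_min=(0, 0, 0), voxel_size=0.1,
                         grid_shape=(10, 10, 10))
    print(peak_voxel(votes), votes.max())
```

Testing every voxel against every sphere costs O(N · X · Y · Z), which is fine for a toy grid but would need a smarter surface rasterisation for real images. One accumulator is filled per keypoint, since each keypoint has its own set of radii.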