Deep Hough Voting

End-to-end

Feed the point cloud into PointNet++ to generate our seeds
1. PointNet++ subsamples the input by applying farthest point sampling
2. Reduces number of points from $N$ to $M$
3. For each of the $M$ points, it outputs the $x, y, z$ coordinates along with $C$ features in a $1 \times C$ feature vector
  1. Input: $N\times3$
  2. Output: $M\times(3+C)$
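
The subsampling step above can be sketched in NumPy. This is a minimal illustration, not the PointNet++ implementation: the function name `farthest_point_sampling`, the sizes `N`, `M`, `C`, and the random stand-in for the learned features are all assumptions made for the example.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Return m indices into an (N, 3) array, chosen so the points are spread out."""
    chosen = np.empty(m, dtype=np.int64)
    chosen[0] = start
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, m):
        # The next sample is the point farthest from everything chosen so far.
        chosen[i] = int(np.argmax(min_dist))
        dist_to_new = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen

rng = np.random.default_rng(0)
N, M, C = 1024, 128, 16                      # illustrative sizes (assumption)
cloud = rng.normal(size=(N, 3))              # input: N x 3
seed_idx = farthest_point_sampling(cloud, M)
seed_xyz = cloud[seed_idx]                   # M x 3
# Random placeholder for the C learned features per seed (assumption:
# the real backbone computes these with set-abstraction layers).
seed_feat = rng.normal(size=(M, C))
seeds = np.concatenate([seed_xyz, seed_feat], axis=1)   # output: M x (3 + C)
```

Only the sampling and the input/output shapes are shown here; how PointNet++ actually learns the features is outside the scope of this sketch.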
Take seeds and generate VOTES
1. Each seed’s features $f_{i}$ (the seeds are $\{s_{i}\}_{i=1}^{M}$ with $s_{i}=[x_{i},f_{i}]$) get passed into an MLP that outputs a spatial offset $\Delta x_{i}\in\mathbb{R}^3$ and a feature offset $\Delta f_{i}\in\mathbb{R}^{C}$. Adding these offsets to the seed gives our “vote”, $[x_{i}+\Delta x_{i},\, f_{i}+\Delta f_{i}]$.
2. The votes point the seed points in the direction of the “center” of their object
3. A loss is applied here to the spatial offset (i.e. an $L_{1}$ loss between the predicted $\Delta x_{i}$ and the true one)
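
A rough NumPy sketch of the voting step, under stated assumptions: the MLP is a hypothetical two-layer network with random weights (`w1`, `b1`, `w2`, `b2`), the helper names `vote_mlp` and `l1_vote_loss` are made up for illustration, and every seed is assumed to belong to a single object centered at the origin.

```python
import numpy as np

def vote_mlp(features, w1, b1, w2, b2):
    """Map (M, C) seed features to a spatial offset (M, 3) and a feature offset (M, C)."""
    hidden = np.maximum(features @ w1 + b1, 0.0)   # ReLU
    out = hidden @ w2 + b2                          # (M, 3 + C)
    return out[:, :3], out[:, 3:]

def l1_vote_loss(pred_offset, seed_xyz, centers):
    """Mean L1 distance between predicted offsets and the true seed-to-center offsets."""
    true_offset = centers - seed_xyz
    return float(np.abs(pred_offset - true_offset).sum(axis=1).mean())

rng = np.random.default_rng(0)
M, C, H = 128, 16, 32                               # illustrative sizes (assumption)
seed_xyz = rng.normal(size=(M, 3))
seed_feat = rng.normal(size=(M, C))

# Untrained random weights, for shape-checking only (assumption).
w1, b1 = rng.normal(scale=0.1, size=(C, H)), np.zeros(H)
w2, b2 = rng.normal(scale=0.1, size=(H, 3 + C)), np.zeros(3 + C)

dx, df = vote_mlp(seed_feat, w1, b1, w2, b2)
vote_xyz = seed_xyz + dx                            # seed moved toward its object center
vote_feat = seed_feat + df

centers = np.zeros((M, 3))                          # hypothetical ground-truth centers
loss = l1_vote_loss(dx, seed_xyz, centers)
```

With trained weights, `vote_xyz` would concentrate near the object centers; here the weights are random, so only the shapes and the loss definition are meaningful.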
Vote Clustering
1. Here we subsample again, using farthest point sampling to pick $K$ of the $M$ votes. These $K$ votes serve as cluster centroids, and the $M$ votes are grouped around them.
2. We take the $K$ clusters and pass each one through a shared PointNet, which proposes a bounding box along with a class prediction for that cluster.
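
The clustering and proposal steps could look roughly like the following NumPy sketch. Several pieces are assumptions made for illustration: votes are grouped by a fixed `radius` around each centroid (the notes do not say how the grouping is done), the "PointNet" is reduced to one shared layer followed by max-pooling, and the output layout (3 center values, 3 size values, then class scores) is hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Return k indices of well-spread points from an (M, 3) array."""
    chosen = [start]
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def group_votes(vote_xyz, center_idx, radius):
    """For each centroid, return the indices of votes within `radius` of it."""
    centers = vote_xyz[center_idx]
    dist = np.linalg.norm(vote_xyz[None, :, :] - centers[:, None, :], axis=2)  # (K, M)
    return [np.flatnonzero(row <= radius) for row in dist]

def propose(vote_xyz, vote_feat, center_idx, groups, w_point, w_head):
    """Run the same tiny PointNet-style network on every cluster."""
    proposals = []
    for c, members in zip(center_idx, groups):
        local_xyz = vote_xyz[members] - vote_xyz[c]      # positions relative to centroid
        x = np.concatenate([local_xyz, vote_feat[members]], axis=1)
        per_vote = np.maximum(x @ w_point, 0.0)          # shared layer applied per vote
        pooled = per_vote.max(axis=0)                    # order-invariant max-pool
        proposals.append(pooled @ w_head)                # box parameters + class scores
    return np.stack(proposals)

rng = np.random.default_rng(0)
M, C, K, H, num_classes = 128, 16, 8, 32, 4              # illustrative sizes (assumption)
vote_xyz = rng.normal(size=(M, 3))
vote_feat = rng.normal(size=(M, C))

center_idx = farthest_point_sampling(vote_xyz, K)
groups = group_votes(vote_xyz, center_idx, radius=1.0)

# Untrained random weights, shared across all K clusters (assumption).
w_point = rng.normal(scale=0.1, size=(3 + C, H))
w_head = rng.normal(scale=0.1, size=(H, 6 + num_classes))
proposals = propose(vote_xyz, vote_feat, center_idx, groups, w_point, w_head)  # (K, 6 + num_classes)
```

Because the same `w_point` and `w_head` are used for every cluster, the network is "shared" in the sense the notes describe, and the max-pool makes each proposal independent of the order of votes within its cluster.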