Gregory Dubois-Felsmann helped me understand this issue better today in Slack, I'm pasting most of his explanation here for future reference:
Basically a neighbor table is what you get when you already have spatial tables a and b, and you compute and store back to the database as a new table the result of
SELECT a.id, b.id FROM a, b WHERE scisql_angsep(a.ra,a.decl,b.ra.b.decl) < (something) AND (perhaps some other conditions, like quality flag cuts)
In general this produces an N:M relation.
In some cases there might be a small number of additional columns in the neighbor table reflecting some assessments of the likelihood that objects that are near each other are actually related, or different manifestations of the same astrophysical source.
So, in a Qserv situation, if a and b are sharded, then you have to figure out how to shard the neighbor table itself as well, and how to deal with overlaps. (E.g., do both a and b have overlaps included in their sharding scheme? Or only, say, a?)
We could test this on an all-sky basis by trying to develop a neighbor table connecting Gaia (gaia_source) and AllWISE (allwise_p3as_psd).
so the neighbor table is the result of the join of a and b?
Yes, that’s right.
You could write:
SELECT a.id, b.id FROM a INNER JOIN b ON scisql_angsep(a.ra,a.decl,b.ra.b.decl) < (something) WHERE (perhaps some other conditions, like quality flag cuts)
The output of that is the “neighbor table”.
The UK folks’ point is that, at least in some situations, using such a table is substantially faster than performing all the floating-point computations involved in the angular separation calculation.
The thing about Qserv is that when it sees the scisql_angsep() it knows that the join is spatially restricted and therefore how to assign it to the worker nodes.
If you now try to do a science-user query that’s based on the precomputed neighbor table (called n, columns a_id and b_id), though, it looks like this:
SELECT (stuff from a), (stuff from b) FROM a INNER JOIN n ON a.id = n.a_id INNER JOIN b on b.id = n.b_id WHERE (bla, bla, bla)
This is basically the same operation as
SELECT (stuff from a), (stuff from b) FROM a INNER JOIN b ON scisql_angsep(a.ra,a.decl,b.ra.b.decl) < (something) WHERE (bla, bla, bla)
but without the trigonometry.
The double-join with the n table, however, cannot prima facie be recognized by Qserv as partionable, because in theory any row in a could be related to any row in b.
So there has to be some logic in Qserv to say “I trust that, when n was ingested, it was validated as containing only relationships that don’t cross shards outside their overlap radius”, and then to apply the double join locally at shard level, and then to remove duplicates across shards successfully.
I don’t really know how hard that will be.
We do have feedback from the LSP review last April about the desire for neighbor tables (also known as “match tables” or “bridge tables”).