The QA scripts in pipe_analysis do a lot of column accessing & assigning on afw tables. A recent community post highlighted how inefficient it is to access catalog elements with catalog[fieldName] rather than catalog.get(fieldName) (or, even better, catalog.get(fieldNameKey)). Please update the code for speed according to the findings and recommendations of that community post.
Ah, right... and I did have an instance of row access where I'm now doing key prefetching.
Ok, I think this is finally good to go. It's a lot less churn than previous iterations, but I did find other time-wasting issues that I've fixed along the way, so this was still worthwhile (and a learning experience!). Can you give the official review?
A full visit now runs in ~20min (compared to 100min+ previously).
Okay, that's interesting about the follow-up speed-up on whichever method you run next. So the system is working appropriately: any of these methods is equally good, and behind the scenes the caching seems to work no matter what you do. That's all excellent news: there shouldn't be any significant performance difference based on your exact choice of incantation. All good!
So I 100% agree that for simplicity catalog[fieldName] is the way to go. It's easy to read, pythonic (or at least numpythonic) and performs well.
As to your question on whether pre-fetching the key helps, I came up with the following toy example:
The point here is that we're doing row-wise and column-wise access of a long table, so you save a bunch of time if you prefetch the keys:
But if you're doing column-wise access of the whole table, it definitely won't make any difference, since each field name is then resolved only once per column rather than once per row.
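As an independent illustration (not the toy example referred to above), here is a minimal pure-Python sketch of why prefetching keys matters for row-wise access but not for column-wise access. `ToyCatalog`, `find_key`, `get_value` and `get_column` are hypothetical stand-ins, not the afw table API: the sketch simply assumes that resolving a field name to a key has a per-call cost, and counts how often that happens instead of timing it.

```python
import numpy as np


class ToyCatalog:
    """Hypothetical stand-in for an afw-style catalog (NOT the afw API).

    Columns are stored as NumPy arrays; resolving a field name to a key
    goes through a schema dict and is counted in `name_lookups`.
    """

    def __init__(self, columns):
        # Hypothetical schema: field name -> integer key (column index).
        self._schema = {name: i for i, name in enumerate(columns)}
        self._data = [np.asarray(col, dtype=float) for col in columns.values()]
        self.name_lookups = 0

    def find_key(self, name):
        # The per-call name resolution that key prefetching avoids.
        self.name_lookups += 1
        return self._schema[name]

    def _resolve(self, name_or_key):
        # Strings are resolved through the schema; ints are used as keys directly.
        return self.find_key(name_or_key) if isinstance(name_or_key, str) else name_or_key

    def get_column(self, name_or_key):
        return self._data[self._resolve(name_or_key)]

    def get_value(self, row, name_or_key):
        return self._data[self._resolve(name_or_key)][row]


def row_sum_by_name(cat, names, n_rows):
    # Row-wise access by name: every field name is resolved on every row.
    total = 0.0
    for row in range(n_rows):
        for name in names:
            total += cat.get_value(row, name)
    return total


def row_sum_prefetched(cat, names, n_rows):
    # Row-wise access with prefetched keys: each name is resolved once.
    keys = [cat.find_key(name) for name in names]
    total = 0.0
    for row in range(n_rows):
        for key in keys:
            total += cat.get_value(row, key)
    return total


def column_sum(cat, names):
    # Column-wise access: one lookup per column, independent of table length.
    return sum(float(cat.get_column(name).sum()) for name in names)


n = 10_000
cat = ToyCatalog({"flux": np.ones(n), "fluxErr": np.full(n, 0.5)})
names = ["flux", "fluxErr"]

for func, args in [(row_sum_by_name, (cat, names, n)),
                   (row_sum_prefetched, (cat, names, n)),
                   (column_sum, (cat, names))]:
    cat.name_lookups = 0
    func(*args)
    print(func.__name__, cat.name_lookups)
# row_sum_by_name 20000
# row_sum_prefetched 2
# column_sum 2
```

The by-name row loop resolves names `n_rows × n_fields` times, while both the prefetched loop and the column-wise access resolve each name only once, which is the pattern described above. How much that costs in wall-clock time for a real afw catalog depends on its internal caching and is not modeled here.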