# Optimize QA code for speed in accessing/assigning afw table columns

XMLWordPrintable

#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
2
• Epic Link:
• Sprint:
DRP F17-4
• Team:
Data Release Production

#### Description

The QA scripts in pipe_analysis do a lot of column accessing & assigning of afw tables. A recent community post highlighted the inefficient nature of accessing catalog elements with catalog[fieldName] vs catalog.get(fieldName) (or, even better, catalog.get(filedNameKey)). Please update the code for speed efficiency according to the findings and recommendations of this community post.

#### Activity

Hide
Eli Rykoff added a comment - - edited

Okay, that's interesting on the follow-up speed-up of whichever you do next. So the system is working appropriately, and any of these methods is equally good, and behind-the-scenes no matter what you do the caching seems to work. So that's all excellent news: there shouldn't be any significant performance difference based on your exact choice of incantation. All good!

So I 100% agree that for simplicity catalog[fieldName] is the way to go. It's easy to read, pythonic (or at least numpythonic) and performs well.

As to your question on whether pre-fetching the key helps, I came up with the following toy example:

 import lsst.afw.table as afwTable import numpy as np schema = afwTable.Schema() nCol = 10 for i in xrange(10):  schema.addField('col%05d' % (i), type=np.float32,doc='column')   testCat = afwTable.BaseCatalog(schema) nRow = 1000 testCat.table.preallocate(nRow) for i in xrange(nRow):  rec=testCat.addNew()  for j in xrange(nCol):  rec['col%05d' % (j)] = j   def test1():  for i in xrange(nRow):  for j in xrange(nCol):  test = testCat['col%05d' % (j)][i]   def test2():  keys = []  for j in xrange(nCol):  keys.append(testCat.schema['col%05d' % (j)].asKey())    for i in xrange(nRow):  for j in xrange(nCol):  test = testCat[keys[j]][i]  

The point here is that we're doing row-wise and column-wise access of a long table, so you save a bunch of time if you prefetch the keys:

 In [14]: %timeit test1() 1 loop, best of 3: 296 ms per loop   In [15]: %timeit test2() 10 loops, best of 3: 113 ms per loop 

But if you're doing column-wise access of the whole table, it definitely won't make any difference.

Show
Eli Rykoff added a comment - - edited Okay, that's interesting on the follow-up speed-up of whichever you do next. So the system is working appropriately, and any of these methods is equally good, and behind-the-scenes no matter what you do the caching seems to work. So that's all excellent news: there shouldn't be any significant performance difference based on your exact choice of incantation. All good! So I 100% agree that for simplicity catalog [fieldName] is the way to go. It's easy to read, pythonic (or at least numpythonic) and performs well. As to your question on whether pre-fetching the key helps, I came up with the following toy example: import lsst.afw.table as afwTable import numpy as np schema = afwTable.Schema() nCol = 10 for i in xrange ( 10 ): schema.addField( 'col%05d' % (i), type = np.float32,doc = 'column' )   testCat = afwTable.BaseCatalog(schema) nRow = 1000 testCat.table.preallocate(nRow) for i in xrange (nRow): rec = testCat.addNew() for j in xrange (nCol): rec[ 'col%05d' % (j)] = j   def test1(): for i in xrange (nRow): for j in xrange (nCol): test = testCat[ 'col%05d' % (j)][i]   def test2(): keys = [] for j in xrange (nCol): keys.append(testCat.schema[ 'col%05d' % (j)].asKey())   for i in xrange (nRow): for j in xrange (nCol): test = testCat[keys[j]][i] The point here is that we're doing row-wise and column-wise access of a long table, so you save a bunch of time if you prefetch the keys: In [ 14 ]: % timeit test1() 1 loop, best of 3 : 296 ms per loop   In [ 15 ]: % timeit test2() 10 loops, best of 3 : 113 ms per loop But if you're doing column-wise access of the whole table, it definitely won't make any difference.
Hide
Lauren MacArthur added a comment -

Ah, right...and I did have an instance of row access where I'm now doing key prefetching.

Ok, I think this is finally good-to-go. It's a lot less churn than previous iterations, but I did find other time-wasting issues that I've fixed along the way, so this was still worthwhile (and a learning experience!) Can you give the official review?

Show
Lauren MacArthur added a comment - Ah, right...and I did have an instance of row access where I'm now doing key prefetching. Ok, I think this is finally good-to-go. It's a lot less churn than previous iterations, but I did find other time-wasting issues that I've fixed along the way, so this was still worthwhile (and a learning experience!) Can you give the official review?
Hide
Eli Rykoff added a comment -

I have to go pick up the kids, but it's the top of my list.

Show
Eli Rykoff added a comment - I have to go pick up the kids, but it's the top of my list.
Hide
Lauren MacArthur added a comment -

Approved by Eli Rykoff on the PR

Show
Lauren MacArthur added a comment - Approved by Eli Rykoff on the PR
Hide
Lauren MacArthur added a comment -

A full visit now runs in ~20min (compared to 100min+ previously).

Show
Lauren MacArthur added a comment - A full visit now runs in ~20min (compared to 100min+ previously).

#### People

Assignee:
Lauren MacArthur
Reporter:
Lauren MacArthur
Reviewers:
Eli Rykoff
Watchers:
Eli Rykoff, John Swinbank, Lauren MacArthur
Votes:
0 Vote for this issue
Watchers:
3 Start watching this issue

#### Dates

Created:
Updated:
Resolved:

#### Jenkins

No builds found.