# Optimize QA code for speed in accessing/assigning afw table columns


#### Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 2
• Sprint: DRP F17-4
• Team: Data Release Production

#### Description

The QA scripts in pipe_analysis do a lot of column accessing & assigning of afw tables. A recent community post highlighted the inefficiency of accessing catalog elements with catalog[fieldName] vs. catalog.get(fieldName) (or, better still, catalog.get(fieldNameKey)). Please update the code for speed efficiency according to the findings and recommendations of that community post.
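To make the three access idioms under discussion concrete, here is a minimal mock of the pattern (the `Catalog` and `Key` classes and the field name below are hypothetical stand-ins sketched for illustration; the real classes live in lsst.afw.table and this is not the afw implementation):

```python
import numpy as np

class Key(object):
    """Stand-in for an afw schema key: just records a column index."""
    def __init__(self, index):
        self.index = index

class Catalog(object):
    """Stand-in catalog: a schema mapping names to keys, plus column arrays."""
    def __init__(self, names, columns):
        self.schema = {name: Key(i) for i, name in enumerate(names)}
        self._columns = columns

    def get(self, name_or_key):
        # Accept either a field name (requires a schema lookup) or a Key.
        key = name_or_key if isinstance(name_or_key, Key) else self.schema[name_or_key]
        return self._columns[key.index]

    __getitem__ = get

cat = Catalog(['base_PsfFlux_flux'], [np.arange(5.0)])
key = cat.schema['base_PsfFlux_flux']

a = cat['base_PsfFlux_flux']      # name + brackets
b = cat.get('base_PsfFlux_flux')  # name + .get()
c = cat.get(key)                  # pre-fetched key + .get()
assert a is b is c                # all three idioms return the same column
```

The point of the ticket is that, in the real library, these semantically equivalent spellings had very different costs.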

#### Activity

Eli Rykoff added a comment -

With the quick fix from Jim Bosch for DM-11871 this might be unnecessary come the next weekly.

Lauren MacArthur added a comment -

Yeah, I saw that. I've already done most of the work, and I think it's still worthwhile: I've switched to using catalog.get(fieldNameKey) everywhere (which proved to be faster than catalog.get(fieldName), so it may still be the best bet even with the catalog[fieldName] fix?). I'm also figuring that going with keys is more foolproof against potential (however unlikely) future regressions in the fieldName getting. If I may, I'll bother you (Eli Rykoff, that is) for the review, and you can let me know if I'm mistaken in any of my conjectures?!

Eli Rykoff added a comment - edited

Happy to take a look for the review. I'm curious how much faster you find the key access to be? And if you want to be a bit more terse, then catalog[fieldNameKey] should be the same as catalog.get(fieldNameKey) (after the fix!), and looks more "pythonic" to my eye.

Lauren MacArthur added a comment - edited

A simple timing test on a real catalog does not bode well for your "pythonic" version (last one, oh, but does DM-11871 fix that too?), and the key access (3rd) is ~2x faster than get(fieldName) (2nd):

```
# self.col2 is the field name as a string
col2Key = catalog.schema[self.col2].asKey()

catalog[self.col2]:     0.256013154984
catalog.get(self.col2): 0.000240087509155
catalog.get(col2Key):   0.000104188919067  # 2.3x faster than catalog.get(self.col2)
catalog[col2Key]:       0.255188941956
```

Lauren MacArthur added a comment -

Would you mind having a look? The before and after timings for one of my command lines (on just two CCDs) are as follows:

```
pipe_analysis: master
real    5m4.139s
user    4m3.749s
sys     0m5.680s

pipe_analysis: tickets/DM-11873
real    2m33.943s
user    1m48.800s
sys     0m5.631s
```

Eli Rykoff added a comment -

As for your timing tests above, that makes sense. The slow versions are only fixed on afw master as of this afternoon and will be in the next weekly. Then the .get() and [] versions will be the same, and I think that [key] will be as fast as .get(key); I haven't tested that explicitly yet, but from the code there's no reason there should be any difference. One of the mysteries that led me to look into this was that people had said "oh, if you look up the key first it's much faster," but I wasn't seeing that speedup. It turns out the speedup coincidentally happens if you also change to .get().

You also show that there is still a factor of two speed-up when you pre-lookup the key. But there's a certain point where other parts of the code/io are the bottlenecks. Of course, an easy factor of two won't hurt!

Will get to the review first thing tomorrow morning.

Lauren MacArthur added a comment -

Indeed. Why don't I do that explicit test with master afw and, assuming you are indeed correct, I'll go with the more pythonic version before you officially review?

Eli Rykoff added a comment -

Sounds like a good plan. I'll wait for your notice.

Lauren MacArthur added a comment -

Alright, I've done the test using afw:master. Timings are now as follows:

```
catalog[self.col2]:     0.000131845474243
catalog.get(self.col2): 5.10215759277e-05
catalog.get(col2Key):   2.38418579102e-05
catalog[col2Key]:       1.59740447998e-05
```

So it seems the more pythonic key access wins (cat[key] is ~1.5x faster than cat.get(key), and it is still ~8x faster than cat[fieldName]). I have updated accordingly and this is now ready for review.

Eli Rykoff added a comment -

That's interesting about the relative timing of .get() and the brackets; I'm surprised it's that different, but glad that pythonic wins (because it's more natural).

Though in the current PR I don't know if you do get the advantage of this speedup: I would guess that key = schema['field'].asKey() directly followed by value = cat[key] would be the same as value = cat['field']. The advantage of the pre-lookup of the key is if you're using it multiple times and thus only incurring that overhead once. Can you check this?
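The amortization point above can be sketched without afw at all. `CountingSchema` below is a hypothetical stand-in that just counts how often a field name is resolved to a key:

```python
class CountingSchema(object):
    """Toy schema that counts name->key resolutions."""
    def __init__(self, names):
        self._keys = {name: i for i, name in enumerate(names)}
        self.lookups = 0

    def find(self, name):
        self.lookups += 1          # stands in for the expensive name lookup
        return self._keys[name]

schema = CountingSchema(['col'])
n_rows = 1000

# Name-based access inside the loop: one schema lookup per row.
schema.lookups = 0
for _ in range(n_rows):
    key = schema.find('col')
count_per_row = schema.lookups     # -> 1000

# Pre-fetched key: a single lookup, reused for every row.
schema.lookups = 0
key = schema.find('col')
for _ in range(n_rows):
    pass                           # the cached key is reused here
count_prefetched = schema.lookups  # -> 1
```

Pre-fetching only pays off when the one-time lookup cost is spread over many accesses; a lookup immediately followed by a single access does the same total work.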

Lauren MacArthur added a comment - edited

Good point... I had looked at this, but was thinking on pre-fix timescales, so a re-look is certainly in order. It seems key = schema[field].asKey() takes on the order of 5e-05 s. I've added the asKey key-fetching to the timing, and the results now look like:

```
catalog[self.col2]:           0.000203847885132
catalog.get(self.col2):       5.48362731934e-05
asKey + catalog.get(col2Key): 5.10215759277e-05
asKey + catalog[col2Key]:     4.50611114502e-05
```

So it is still ~4x faster for the key fetch + [key] access.

Eli Rykoff added a comment - edited

Just as a disclaimer, the following comment is far in the weeds, but hopefully not too far. I think we're very far down in bits that probably don't matter much, and the code updates as they are plus (especially) the afw update will get you a huge performance boost and that's a big win. But I would like to understand what's going on here with these timing differences.

That said, where is the code you are using for the timing tests? With my simple tables, I can't seem to replicate them, in that it looks pretty similar no matter what I do. I wonder if there's some sort of caching going on behind the scenes?

As for profiling the different code-paths for asKey + catalog[colKey] vs catalog[field] the difference seems to be that the first uses key = schema[field].asKey() (as you say), and the second the equivalent of key = schema.find(field).key. But testing these two ways of accessing keys on their own I find the same performance (but with the ipython complaint about possible caching).

Lauren MacArthur added a comment -

Not too far at all! I think I've just confirmed your caching suspicion. I was running the timing by simply doing:

```python
from timeit import default_timer as timer

start = timer()
catalog[fieldName]
end = timer()
time1 = end - start
print 'catalog[fieldName]: ', time1
```

within my analysis script, trying the different methods in sequence. I just tried doing them one at a time (editing and then doing a new run) to eliminate any possibility of caching. Indeed, they all seem to have about the same performance when run independently. If I run one then another in sequence, the second (and any subsequent) call, regardless of order, and even if the same method is simply repeated, gets the ~4x speedup.

So, for code simplicity, I'm thinking it would be best to move all calls to just catalog[fieldName]. Let me know if you disagree.

I am also now wondering if there is any use case where looking up the key provides an advantage. Re-use doesn't seem to apply given that, after a first access call (with any key and/or fieldName fetching method), all subsequent calls, whether identical to or different from the first, get the speedup.
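One plausible mechanism for the warm-up effect described above (purely a sketch of how such caching could work, not the actual afw implementation) is a name-to-key cache that is populated on first access, so every later access by any method is cheap:

```python
class CachedSchema(object):
    """Toy schema with a name->key cache populated on first access."""
    def __init__(self, names):
        self._fields = {name: i for i, name in enumerate(names)}
        self._cache = {}
        self.slow_lookups = 0

    def key_for(self, name):
        if name not in self._cache:
            self.slow_lookups += 1       # expensive path, taken once per name
            self._cache[name] = self._fields[name]
        return self._cache[name]

schema = CachedSchema(['col2'])
schema.key_for('col2')   # first access: slow path
schema.key_for('col2')   # repeat access: served from the cache
schema.key_for('col2')
assert schema.slow_lookups == 1
```

Under this model the first access of any flavor warms the cache, and the particular spelling of subsequent accesses stops mattering, which matches the timings above.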

Eli Rykoff added a comment - edited

Okay, that's interesting on the follow-up speed-up of whichever you do next. So the system is working appropriately, and any of these methods is equally good, and behind-the-scenes no matter what you do the caching seems to work. So that's all excellent news: there shouldn't be any significant performance difference based on your exact choice of incantation. All good!

So I 100% agree that for simplicity catalog[fieldName] is the way to go. It's easy to read, pythonic (or at least numpythonic) and performs well.

As to your question on whether pre-fetching the key helps, I came up with the following toy example:

```python
import lsst.afw.table as afwTable
import numpy as np

schema = afwTable.Schema()
nCol = 10
for i in xrange(nCol):
    schema.addField('col%05d' % (i), type=np.float32, doc='column')

testCat = afwTable.BaseCatalog(schema)
nRow = 1000
testCat.table.preallocate(nRow)
for i in xrange(nRow):
    rec = testCat.addNew()
    for j in xrange(nCol):
        rec['col%05d' % (j)] = j

def test1():
    # Name lookup repeated inside the inner loop.
    for i in xrange(nRow):
        for j in xrange(nCol):
            test = testCat['col%05d' % (j)][i]

def test2():
    # Keys pre-fetched once, then reused for every row.
    keys = []
    for j in xrange(nCol):
        keys.append(testCat.schema['col%05d' % (j)].asKey())
    for i in xrange(nRow):
        for j in xrange(nCol):
            test = testCat[keys[j]][i]
```

The point here is that we're doing row-wise and column-wise access of a long table, so you save a bunch of time if you prefetch the keys:

```
In [14]: %timeit test1()
1 loop, best of 3: 296 ms per loop

In [15]: %timeit test2()
10 loops, best of 3: 113 ms per loop
```

But if you're doing column-wise access of the whole table, it definitely won't make any difference.
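The row-wise vs. column-wise distinction can be sketched without afw; here a numpy structured array stands in for the catalog (a hypothetical setup, not the ticket's actual data):

```python
import numpy as np

# Structured array as a stand-in for a single-column catalog.
cat = np.zeros(1000, dtype=[('col00000', 'f4')])
cat['col00000'] = np.arange(1000, dtype=np.float32)

# Column-wise: one name lookup, then vectorized work on the returned array,
# so per-access lookup cost is irrelevant.
total_columnwise = cat['col00000'].sum()

# Row-wise: the name is resolved on every iteration, which is the pattern
# where pre-fetching a key (or the whole column) pays off.
total_rowwise = 0.0
for i in range(len(cat)):
    total_rowwise += cat['col00000'][i]

assert total_columnwise == total_rowwise
```

Both totals agree; the difference is only in how many lookups (and how much per-element Python overhead) each pattern incurs.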

Lauren MacArthur added a comment -

Ah, right...and I did have an instance of row access where I'm now doing key prefetching.

Ok, I think this is finally good to go. It's a lot less churn than previous iterations, but I did find other time-wasting issues that I've fixed along the way, so this was still worthwhile (and a learning experience!). Can you give the official review?

Eli Rykoff added a comment -

I have to go pick up the kids, but it's the top of my list.

Lauren MacArthur added a comment -

Approved by Eli Rykoff on the PR

Lauren MacArthur added a comment -

A full visit now runs in ~20min (compared to 100min+ previously).


#### People

Assignee: Lauren MacArthur
Reporter: Lauren MacArthur
Reviewers: Eli Rykoff
Watchers: Eli Rykoff, John Swinbank, Lauren MacArthur