Create initial subset of timeseries features for DIAObject


Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points: 8
• Sprint: AP S19-4
• Team:

Description

While we wait for a complete list of DIAObject timeseries features (https://jira.lsstcorp.org/browse/DM-11962), it will be useful to have a set of simple statistics to assist with exploring AP processing outputs. The implementation is not expected to be final.

In a brown bag discussion, we suggested the following (all computed on calibrated fluxes):

• max
• min
• mean
• std
• skew
• median
• percentiles ([5,25,50,75,95])
• Stetson J
• Linear trend
• max delta flux/delta t
• average flux error
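
As a rough illustration of the list above, the statistics could be computed for a single light curve along the following lines. This is only a sketch under stated assumptions, not the code delivered on this ticket: the function names and output keys are hypothetical, the Stetson J shown is one common single-band form (consecutive observations paired, unit pair weights), the linear trend is an unweighted least-squares fit, and "max delta flux/delta t" is taken to mean the largest signed slope between consecutive observations.

```python
import numpy as np

PERCENTILES = (5, 25, 50, 75, 95)


def stetson_j(fluxes, flux_errs):
    """One common single-band Stetson J variant (an assumption, see lead-in).

    Consecutive observations are treated as pairs with unit weight.
    """
    f = np.asarray(fluxes, dtype=float)
    e = np.asarray(flux_errs, dtype=float)
    n = f.size
    if n < 2:
        return np.nan
    weights = 1.0 / e**2
    wmean = np.sum(weights * f) / np.sum(weights)
    # Normalized residuals relative to the inverse-variance weighted mean.
    delta = np.sqrt(n / (n - 1.0)) * (f - wmean) / e
    p = delta[:-1] * delta[1:]
    return float(np.mean(np.sign(p) * np.sqrt(np.abs(p))))


def summarize_fluxes(times, fluxes, flux_errs):
    """Return a dict of simple statistics for one light curve.

    Hypothetical helper: names and output keys are illustrative only.
    """
    t = np.asarray(times, dtype=float)
    f = np.asarray(fluxes, dtype=float)
    e = np.asarray(flux_errs, dtype=float)
    if f.size < 2:
        raise ValueError("need at least two observations")

    # Sort by time so that consecutive differences are meaningful.
    order = np.argsort(t)
    t, f, e = t[order], f[order], e[order]

    resid = f - f.mean()
    m2 = np.mean(resid**2)
    m3 = np.mean(resid**3)

    out = {
        "max": float(f.max()),
        "min": float(f.min()),
        "mean": float(f.mean()),
        "std": float(f.std(ddof=1)),          # sample standard deviation
        "skew": float(m3 / m2**1.5) if m2 > 0 else np.nan,  # biased estimator
        "median": float(np.median(f)),
        "mean_err": float(e.mean()),
        "stetson_j": stetson_j(f, e),
    }
    for pct, value in zip(PERCENTILES, np.percentile(f, PERCENTILES)):
        out[f"percentile{pct:02d}"] = float(value)

    # Linear trend: unweighted least-squares fit of flux against time.
    slope, intercept = np.polyfit(t, f, 1)
    out["linear_slope"] = float(slope)
    out["linear_intercept"] = float(intercept)

    # Largest signed slope between consecutive observations.
    dt = np.diff(t)
    good = dt > 0
    out["max_slope"] = float(np.max(np.diff(f)[good] / dt[good])) if good.any() else np.nan
    return out
```

Calling `summarize_fluxes(times, fluxes, flux_errs)` with three equal-length sequences returns one flat dictionary per object, which keeps the sketch easy to compare against whatever columns end up stored for DIAObject.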

Activity

Eric Bellm added a comment -

Packages listed here https://www.kaggle.com/michaelapers/the-plasticc-astronomy-classification-demo may be useful for reference, particularly cesium, FATS or feets, tsfresh, and vartools.

Chris Morrison added a comment -

Decided to split the initial implementation of the time-series features from making a more generalized piece of code. DM-18494 will create a more general and configurable interface for these calculations.

If you would like to look at the current outputs from these metrics, they can be found in the file /project/morriscb/src/ap_verify/test_new_cols/association.db; these data are g band only and were run exclusively on ccdnum=25. I would suggest using the query select [columns that start with gPSFlux or gFPFlux] from DiaObject where validityEnd is NULL and flags = 0. The flags = 0 selection is optional but will give a clear look at the output statistics.

Jenkins run: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29521/pipeline

Just realized that I should check the units in the extra-columns file to make sure they line up with what is calculated/stored.

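
The query suggested in the comment above leaves the column list as a placeholder. Assuming association.db is an SQLite file (an assumption based only on the file extension), the matching columns could be discovered and selected roughly as follows. The helper names are hypothetical; only the table name, the two column prefixes, and the validityEnd and flags conditions come from the comment.

```python
import sqlite3

PREFIXES = ("gPSFlux", "gFPFlux")


def flux_feature_columns(conn, table="DiaObject", prefixes=PREFIXES):
    """List the columns of `table` whose names start with one of `prefixes`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows if row[1].startswith(prefixes)]


def fetch_flux_features(conn, table="DiaObject", clean_only=True):
    """Select the flux feature columns for currently valid objects.

    With clean_only=True the optional `flags = 0` cut is applied as well.
    """
    columns = flux_feature_columns(conn, table)
    if not columns:
        raise ValueError(f"no matching flux columns found in {table}")
    column_sql = ", ".join(f'"{name}"' for name in columns)
    where = "validityEnd IS NULL"
    if clean_only:
        where += " AND flags = 0"
    query = f'SELECT {column_sql} FROM "{table}" WHERE {where}'
    return columns, conn.execute(query).fetchall()
```

A connection would be opened with `sqlite3.connect(...)` on the database path given in the comment, and `fetch_flux_features(conn)` then returns the matched column names together with the selected rows. Passing `clean_only=False` drops the optional flags cut.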
Chris Morrison added a comment -

New Jenkins: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29523/pipeline

Eric Bellm added a comment -

Looks good to me. I put a little poking into a notebook at https://github.com/lsst-dm/ap_pipe-notebooks/blob/u/ebellm/DM-18318_review/DM-18318_review.ipynb

Chris Morrison added a comment -

One last Jenkins after the review: https://ci.lsst.codes/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/29536/pipeline/46

People

• Assignee: Chris Morrison
• Reporter: Eric Bellm
• Reviewers: Eric Bellm
• Watchers: Chris Morrison, Eric Bellm