Detecting Bad Data and Anomalies with the TDDA Library (Part I)

Posted on Fri 04 May 2018 in TDDA • Tagged with tests, anomaly detection, bad data

The test-driven data analysis library, tdda, has two main kinds of functionality:

  • support for testing complex analytical processes with unittest or pytest
  • support for verifying data against constraints, and optionally for discovering such constraints from example data.
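To make the second kind of functionality concrete, here is a minimal sketch of what discovering constraints from example data and then verifying other data against them can look like. It is written in plain Python; the function names `discover_constraints` and `verify`, the dictionary layout, and the sample records are all assumptions made for illustration and are not the tdda API.

```python
# Illustrative only: a tiny stand-in for constraint discovery and verification.
# The names and data below are hypothetical and are not the tdda API.

def discover_constraints(rows):
    """Infer simple per-field constraints (nulls, type, min, max) from example rows."""
    constraints = {}
    fields = {key for row in rows for key in row}
    for field in sorted(fields):
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        c = {"allow_null": len(present) < len(values)}
        if present:
            c["type"] = type(present[0]).__name__
            c["min"] = min(present)
            c["max"] = max(present)
        constraints[field] = c
    return constraints


def verify(rows, constraints):
    """Return the (field, constraint_name) pairs that the data fails."""
    failures = []
    for field, c in constraints.items():
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        if not c["allow_null"] and len(present) < len(values):
            failures.append((field, "allow_null"))
        if "type" in c and any(type(v).__name__ != c["type"] for v in present):
            failures.append((field, "type"))
            continue  # skip min/max: comparisons across types are unsafe
        if present and "min" in c and min(present) < c["min"]:
            failures.append((field, "min"))
        if present and "max" in c and max(present) > c["max"]:
            failures.append((field, "max"))
    return failures


# Example data assumed to be "good", used to discover constraints.
good = [{"age": 31, "visits": 1}, {"age": 58, "visits": 5}]
# New data to check against the discovered constraints.
bad = [{"age": -4, "visits": 3}, {"age": None, "visits": 2}]

constraints = discover_constraints(good)
print(verify(good, constraints))
print(verify(bad, constraints))
```

Note that this sketch only says which constraints failed for each field, not which individual records caused the failure.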

Until now, however, the verification process has only reported which constraints failed to …

Continue reading

Saving Time Running Subsets of Tests with Tagging

Posted on Tue 01 May 2018 in TDDA • Tagged with tests, tagging

It is common, when working with tests for analytical processes, for test suites to take a non-trivial amount of time to run. It is often helpful to have a convenient way to execute a subset of tests, or even a single test.
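As a general illustration of running only a subset of tests, the sketch below uses nothing but the standard library's unittest. The `tag` decorator, the `_tagged` attribute and the `tagged_suite` helper are hypothetical names chosen for this example; they are assumptions, and not necessarily the mechanism this post goes on to describe.

```python
import unittest


def tag(func):
    """Hypothetical decorator: mark a test method as one we want to run."""
    func._tagged = True
    return func


class TestAnalysis(unittest.TestCase):
    def test_slow_full_pipeline(self):
        # Stands in for a long-running test we want to skip for now.
        self.assertEqual(sum(range(1000)), 499500)

    @tag
    def test_quick_check(self):
        # The one test we are currently working on.
        self.assertEqual(1 + 1, 2)


def tagged_suite(test_case_class):
    """Build a suite containing only the test methods marked with @tag."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for name in loader.getTestCaseNames(test_case_class):
        method = getattr(test_case_class, name)
        if getattr(method, "_tagged", False):
            suite.addTest(test_case_class(name))
    return suite


if __name__ == "__main__":
    # Runs only test_quick_check; the untagged test is never executed.
    unittest.TextTestRunner().run(tagged_suite(TestAnalysis))
```

Marking tests in the source, rather than listing test names on the command line, keeps the selection next to the code being worked on and makes it easy to remove once the full suite should run again.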

We have added a simple mechanism for allowing this …

Continue reading

Our Approach to Data Provenance

Posted on Tue 12 December 2017 in TDDA • Tagged with data lineage, data provenance, data governance, tdda, constraints, miro

NEW DATA GOVERNANCE RULES: — We need to track data provenance. — No problem! We do that already! — We do? — We do! — (thinks) Results2017_final_FINAL3-revised.xlsx

Our previous post introduced the idea of data provenance (a.k.a. data lineage), which has been discussed on a couple of podcasts recently. This is an issue that is close to our hearts at Stochastic Solutions. Here, we'll talk about how we handle this issue, both methodologically and in …

Continue reading

Data Provenance and Data Lineage: the View from the Podcasts

Posted on Thu 30 November 2017 in TDDA • Tagged with data lineage, data provenance, data governance, tdda, constraints

In Episode 49 of the Not So Standard Deviations podcast, the final segment (starting at 59:32) discusses data lineage, after Roger Peng listened to the September 3rd (2017) episode of another podcast, Linear Digressions, which discussed that subject.

This is a topic very close to our hearts, and I …

Continue reading

Automatic Constraint Generation and Verification White Paper

Posted on Fri 06 October 2017 in TDDA • Tagged with tdda, constraints, verification, bad data

We have a new White Paper available:

Automatic Constraint Generation and Verification

Abstract

Correctness is a key problem at every stage of data science projects: completing an entire analysis without a serious error at some stage is surprisingly hard. Even errors that reverse or completely invalidate the analysis can be …

Continue reading