Rexpy for Generating Regular Expressions: Postcodes

Posted on Wed 20 February 2019 in TDDA • Tagged with regular expressions, rexpy, tdda

Rexpy is a powerful tool we created that generates regular expressions from examples. It's available online at https://rexpy.herokuapp.com and forms part of our open-source TDDA library.

Miró users can use the built-in rex command.

This post illustrates using Rexpy to find regular expressions for UK postcodes.
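This excerpt does not describe Rexpy's own algorithm. As a rough, hypothetical illustration of the general idea of inferring patterns from examples, the Python sketch below maps each character of a postcode to a coarse class (uppercase letter, lowercase letter, digit, or literal space) and collapses runs of the same class into one quantified token. The names `char_class`, `generalize` and `infer_patterns` are invented for this sketch. It is far cruder than Rexpy and is not its implementation or API.

```python
import re
from itertools import groupby
from typing import List

# Hypothetical coarse character classes used only by this toy sketch.
CLASSES = {"[A-Z]", "[a-z]", r"\d"}


def char_class(ch: str) -> str:
    """Map one character to a class token, or to an escaped literal."""
    if "A" <= ch <= "Z":
        return "[A-Z]"
    if "a" <= ch <= "z":
        return "[a-z]"
    if "0" <= ch <= "9":
        return r"\d"
    if ch == " ":
        return " "  # keep spaces as a plain literal for readability
    return re.escape(ch)


def generalize(example: str) -> str:
    """Turn one example into an anchored pattern by collapsing class runs."""
    tokens = []
    for token, run in groupby(example, key=char_class):
        if token in CLASSES:
            tokens.append(token + "+")  # accept any run length of this class
        else:
            tokens.append(token * len(list(run)))  # repeat literals exactly
    return "^" + "".join(tokens) + "$"


def infer_patterns(examples: List[str]) -> List[str]:
    """Return the distinct generalized patterns, sorted for stable output."""
    return sorted({generalize(e) for e in examples})


if __name__ == "__main__":
    postcodes = ["EH1 1AA", "SW1A 1AA", "M1 1AE"]
    for pattern in infer_patterns(postcodes):
        print(pattern)
```

"EH1 1AA" and "M1 1AE" share a shape, so the three samples yield only two patterns. A real tool also has to decide when to merge patterns and how tightly to bound repetition counts, questions this toy ignores.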

A …


Our Approach to Data Provenance

Posted on Tue 12 December 2017 in TDDA • Tagged with data lineage, data provenance, data governance, tdda, constraints, miro

NEW DATA GOVERNANCE RULES: — We need to track data provenance. — No problem! We do that already! — We do? — We do! — (thinks) Results2017_final_FINAL3-revised.xlsx

Our previous post introduced the idea of data provenance (a.k.a. data lineage), which has been discussed on a couple of podcasts recently. This is an issue that is close to our hearts at Stochastic Solutions. Here, we'll talk about how we handle this issue, both methodologically and in …


Data Provenance and Data Lineage: the View from the Podcasts

Posted on Thu 30 November 2017 in TDDA • Tagged with data lineage, data provenance, data governance, tdda, constraints

In Episode 49 of the Not So Standard Deviations podcast, the final segment (starting at 59:32) discusses data lineage. The discussion came after Roger Peng had listened to the 3 September 2017 episode of another podcast, Linear Digressions, which covered the same subject.

This is a topic very close to our hearts, and I …


Automatic Constraint Generation and Verification White Paper

Posted on Fri 06 October 2017 in TDDA • Tagged with tdda, constraints, verification, bad data

We have a new White Paper available:

Automatic Constraint Generation and Verification

Abstract

Correctness is a key problem at every stage of data science projects: completing an entire analysis without a serious error at some stage is surprisingly hard. Even errors that reverse or completely invalidate the analysis can be …
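The white paper is only excerpted here. To make the idea of generating constraints from data and then verifying data against them concrete, here is a deliberately tiny Python sketch. The functions `discover` and `verify`, and the four constraint kinds they handle (minimum, maximum, whether nulls are allowed, uniqueness), are hypothetical choices for illustration; they are not the TDDA library's API and not necessarily the method the paper describes.

```python
from typing import Any, Dict, List, Optional

# A column is a list of numbers, with None standing in for a missing value.
Column = List[Optional[float]]


def discover(values: Column) -> Dict[str, Any]:
    """Infer simple constraints from a reference column (toy version)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("need at least one non-null value")
    return {
        "min": min(present),
        "max": max(present),
        "allow_null": len(present) < len(values),
        "unique": len(set(present)) == len(present),
    }


def verify(values: Column, constraints: Dict[str, Any]) -> Dict[str, bool]:
    """Check a column against discovered constraints; True means satisfied."""
    present = [v for v in values if v is not None]
    return {
        "min": all(v >= constraints["min"] for v in present),
        "max": all(v <= constraints["max"] for v in present),
        "allow_null": constraints["allow_null"] or len(present) == len(values),
        "unique": not constraints["unique"] or len(set(present)) == len(present),
    }


if __name__ == "__main__":
    constraints = discover([3.0, 7.5, 1.2, 9.9])
    print(verify([2.0, None, 12.0, 2.0], constraints))
    # → {'min': True, 'max': False, 'allow_null': False, 'unique': False}
```

Each failing check in the usage example flags a different kind of badness: an out-of-range value, an unexpected null and a duplicated entry.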


Constraint Generation in the Presence of Bad Data

Posted on Thu 21 September 2017 in TDDA • Tagged with tdda, constraints, discovery, verification, suggestion, cartoon, bad data

Bad data is widespread and pervasive.[1]

Typically, only datasets and analytical processes that have undergone rigorous, sustained quality assurance achieve low or zero error rates. "Badness" can take many forms and have various aspects, including incorrect values, missing values, duplicated entries, misencoded …
