ftisiot ideas about food and life
who talks suggestions soft json-pg

Fix Your Strings!

Presented March 7, 2023 at Pg Day Paris 2023

Slides Website Venue

Strings are one of the most used types in databases; they can store pretty much any data and don’t enforce any rules on the inserted input. This talk showcases the risk of using strings and how to mitigate them to ensure data quality.

Intro image

Abstract

Strings are one of the most used types in databases; they can store pretty much any data and don’t enforce any rules on the inserted input. Yet too much freedom sometimes leads to inconsistencies: is it Aivan or Aiven? Øyvind or Oyvind? Wine or Whine? These seemingly small differences can have bad side-effects, causing lookups to fail and incorrect aggregation results to be returned. Luckily all is not lost: PostgreSQL has some features that can help us make sense of the chaos.

In this talk you will learn what PostgreSQL has to offer: starting with pattern matching, passing by regular expressions, and ending with more advanced functionality exposed by the fuzzystrmatch and unaccent extensions. I’ll demonstrate what tools can help you fixing string inconsistencies and how to avoid making the same mistakes again in the future. This session is recommended for anyone who deeply cares about their (string) data quality.

Useful Links

  • Array Functions
  • Date Functions
  • Pattern Matching
  • Domain
  • Email Domain
  • Unaccent
  • FuzzyStrMatch
  • Aiven
  • GitHub repository with the notebook

Francesco Tisiot

Mastodon Francesco comes from Verona, Italy and works as a Senior Developer Advocate at Aiven. With his many years of experience as a data analyst, he has stories to tell and advice for data-wranglers everywhere. Francesco loves sharing knowledge with others as a speaker and writer, and is on a mission to defend the world from bad Italian food!

Story logo

© 2023