Freelance Data Scientist, PhD Mathematics, Rhodes Scholar. I help you succeed with nearly all things data and AI.
Published Feb 16, 2024
The utility of ChatGPT and its first cousins (e.g. Copilot-like offerings) is relatively clear. Two examples are
A driving question now is how else we can generate value with generative AI (GenAI). Data engineering, and especially data extraction, is one area we will explore in a series of blog posts.
But first, let’s take a step back and consider what a next step for GenAI could look like. When talking about the recent excitement around GenAI and especially large language models, relating it to the common experience of Google search seems to help.
Google search’s goal is to give you a selection of useful answers per question or search task. ChatGPT’s goal is to give you a single useful answer per question or task. The next step in AI, and our focus in this blog post, is to give you a single useful action per task. (Credits to Ben Thompson of Stratechery for this basic progression.)
Getting from a useful answer, whether provided by ChatGPT or another tool, to a useful action involves
In this series of blog posts, we’ll look at how GenAI is changing data engineering, and how data engineering needs to adapt to harness the benefits without wrecking your data quality.
Let’s take the main example of how I’ve used GenAI with customers over the past half year: extracting structured data from semi-structured or unstructured source data, such as PDF reports or invoices. My main takeaway is:
GenAI-powered data extraction means higher dev velocity and higher data risk.
The (working) titles for these posts are
Underlying these recommendations is what can be called a barbell strategy. In the original context of financial investments, Nassim Taleb defines it as follows:
If you know that you are vulnerable to prediction errors, and accept that most risk measures are flawed, then your strategy is to be as hyper-conservative and hyper-aggressive as you can be, instead of being mildly aggressive or conservative.
Source: The Black Swan, Nassim Nicholas Taleb
Coming back to tech and GenAI, the message is: Go wild technically with all that GenAI enables (going wild ethically is still a no-no), but clamp down the rest of your data pipelines.
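To make the "clamp down" half of the barbell a bit more concrete, here is a minimal Python sketch of a strict gate that could sit directly behind a GenAI extraction step. It is an illustration under assumptions, not a description of any particular customer pipeline: the `Invoice` schema, the field names, the currency whitelist, and the `parse_invoice` helper are all hypothetical, and the model is assumed to return a JSON string. The idea is simply that the model can be as creative as it likes upstream, while anything that does not match the schema exactly is rejected before it reaches the rest of the pipeline.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical schema and whitelist, chosen for illustration only.
EXPECTED_FIELDS = {"invoice_number", "issued_on", "total", "currency"}
ALLOWED_CURRENCIES = {"EUR", "USD", "GBP"}


@dataclass(frozen=True)
class Invoice:
    invoice_number: str
    issued_on: date
    total: Decimal
    currency: str


def parse_invoice(raw_json: str) -> Invoice:
    """Validate model output strictly; raise ValueError on anything unexpected."""
    data = json.loads(raw_json)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != EXPECTED_FIELDS:
        # Reject both missing fields and fields the model invented.
        raise ValueError(f"field mismatch: {sorted(set(data) ^ EXPECTED_FIELDS)}")

    number = data["invoice_number"]
    if not isinstance(number, str) or not number.strip():
        raise ValueError("invoice_number must be a non-empty string")

    issued_raw = data["issued_on"]
    if not isinstance(issued_raw, str):
        raise ValueError("issued_on must be an ISO date string")
    issued_on = date.fromisoformat(issued_raw)  # ValueError for impossible dates

    # Amounts must arrive as strings so that no float rounding sneaks in.
    total_raw = data["total"]
    if not isinstance(total_raw, str):
        raise ValueError("total must be a decimal string")
    try:
        total = Decimal(total_raw)
    except InvalidOperation as exc:
        raise ValueError("total is not a decimal number") from exc
    if not total.is_finite() or total < 0:
        raise ValueError("total must be a finite, non-negative amount")

    currency = data["currency"]
    if not isinstance(currency, str) or currency not in ALLOWED_CURRENCIES:
        raise ValueError("currency is not in the allowed list")

    return Invoice(number.strip(), issued_on, total, currency)
```

Calling `parse_invoice` on a well-formed record returns an immutable `Invoice`; a record with an impossible date, a negative total, or an extra field raises `ValueError`, so the caller can route it to manual review instead of loading it. Whether to reject or merely flag such records is a design choice that depends on how costly bad data is downstream.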
Would you like to learn more? Do you have a project that needs data or AI expertise? Then message me on LinkedIn or write to me at paul.larsen.sp@gmail.com.