Thanks for being here!
Announcements:
You can now access 90 West’s consumer transaction data via our Insights Platform (powered by Exabel).
The theme that emerged in this week’s email: data engineering is hard work, but essential work.
Quotes:
“Opinions only exist due to a lack of data” - Thomas DelVecchio
News Articles
Podcasts
Cool Charts
Final Thoughts (the original data engineer)
#1 – Seattle Data Guy published Realities of Being A Data Engineer - Migrations. September 2022.
My Take: Migrations. They will happen as technology advances. Never easy. Bottom line: you can de-risk the move when you attack it in bite-sized chunks rather than wholesale. This ties into themes of previous weeks, including the importance of documentation, making good tech-stack decisions up front, and communicating with end users. Really good article.
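To make the "bite-sized chunks" idea concrete, here is a minimal Python sketch of my own (it is not taken from the article). It assumes the old system's rows fit in a list and that the new system can be written to and read back in batches. The names `migrate_in_chunks`, `write_chunk`, and `read_back` are hypothetical placeholders for whatever your stack actually provides.

```python
import hashlib


def checksum(rows):
    """Order-independent fingerprint of a batch of rows."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    return hashlib.sha256("".join(digests).encode()).hexdigest()


def migrate_in_chunks(source_rows, write_chunk, read_back, chunk_size=1000):
    """Copy rows one chunk at a time, verifying each chunk before moving on.

    write_chunk(chunk) and read_back(start, n) are hypothetical callbacks
    standing in for the real target system.
    """
    migrated = 0
    for start in range(0, len(source_rows), chunk_size):
        chunk = source_rows[start:start + chunk_size]
        write_chunk(chunk)
        landed = read_back(start, len(chunk))
        # Stop at the first bad chunk instead of discovering it at the end.
        if checksum(landed) != checksum(chunk):
            raise RuntimeError(f"chunk starting at row {start} failed verification")
        migrated += len(chunk)
    return migrated


# Made-up data: an in-memory list plays the role of the new system.
legacy = [(i, f"customer-{i}") for i in range(2500)]
target = []
count = migrate_in_chunks(
    legacy,
    write_chunk=target.extend,
    read_back=lambda start, n: target[start:start + n],
    chunk_size=1000,
)
```

The point of verifying per chunk is that a failure only costs you one small batch to investigate and redo, rather than the whole move.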
#2 – Unravel published DataOps Observability White Paper. Alternate link here. September 2022.
My Take: There are a number of firms trying to solve the big problem of data observability. With the massive increase in data created, we are left trying to make sure the data we use to drive decisions is trustworthy and accurate. What is the trusted single source of truth? Effective observability starts with extracting the right kind of details at a granular level from every system in the modern data stack (there is that term again!). Having complete and fine-grained details is the foundation upon which all observability capabilities are built.
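As a rough illustration of what "fine-grained details" can look like in practice, here is a small Python sketch of my own (not from the white paper). It profiles one table for row count, null rate per column, and duplicate keys. The `profile` function and the sample transactions are made up for illustration, and a real observability tool would collect far more than this.

```python
def profile(rows, key):
    """Return granular quality metrics for a list of dict records."""
    total = len(rows)
    columns = sorted({col for row in rows for col in row})
    # Share of records where each column is missing or None.
    null_rate = {
        col: sum(1 for row in rows if row.get(col) is None) / total if total else 0.0
        for col in columns
    }
    # Count of records whose key value repeats an earlier one.
    keys = [row.get(key) for row in rows]
    duplicate_keys = len(keys) - len(set(keys))
    return {
        "row_count": total,
        "null_rate": null_rate,
        "duplicate_keys": duplicate_keys,
    }


# Made-up sample data, purely for illustration.
transactions = [
    {"txn_id": 1, "merchant": "A", "amount": 12.5},
    {"txn_id": 2, "merchant": None, "amount": 40.0},
    {"txn_id": 2, "merchant": "B", "amount": 40.0},
    {"txn_id": 3, "merchant": "C", "amount": None},
]
report = profile(transactions, key="txn_id")
```

Tracking numbers like these per table and per load is one simple way to notice when a source quietly stops being complete or consistent.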
BONUS: Alex Izydorczyk published Some data on data companies. October 2022. This made me recall two of my favorite must-read data business articles: 1- Abraham Thomas’s Economics of Data Businesses, & 2- Auren Hoffman’s Data-As-A-Service Bible.
BONUS 2: Exabel & Eagle Alpha published The end to end challenges of evaluating alternative data. September 2022.
BONUS 3: Neudata published A beginner’s guide to alternative data. October 2022.
#1 – Mark Fleming-Williams’ Alternative Data Podcast published The Thomas DelVecchio episode. October 2022.
My Take: Really enjoyed this interview. I have a similar background in market and investment research but didn’t have the creativity or foresight 10+ years ago to create what Thomas created. Really cool.
Highlights (56-minute run time):
Minute 01:00 – interview starts; Thomas’ background
Minute 07:00 – how Thomas became an entrepreneur within the data space
Minute 10:00 – had to build a research firm to show how/why data mattered; this changed
Minute 15:00 – barriers to entry created; “other side of the moat is fragility”
Minute 20:00 – angel investor background
Minute 26:00 – the move to a consumption-based system for data
Minute 37:30 – long-only vs hedge funds as potential clients
Minute 41:00 – we’ve been marching toward alternative data world … this continues
Minute 44:00 – what is next that you are looking for? How to increase TAM for data sales?
Minute 48:00 – the newest venture; accelerating “Ideation to Series A”
Source: Inside Big Data published The Future of Unstructured Data Processing. July 2022.
“While a small percentage of the data produced every day is structured data – digital information that adheres to a predefined data model or schema – the vast majority (80-90%) is what’s known as unstructured data, data that lacks metadata or any sort of identifiable structure. As such, this “dark data” is unreadable by machines.”
Structured vs Unstructured:
Why are we doing this?
Was St. Jerome the original data engineer?
Jerome of Stridon, 342-420, is recognized as the patron saint of translators, librarians, and encyclopedists. Later known as St. Jerome, he is best known for his life’s work of translating the Bible.
Before he embarked on his life's work, the books of the Bible were found in a variety of formats, languages, and physical locations.
What a daunting task!
He actually moved to Jerusalem for two years to improve his Hebrew language skills to better inform his translation.
In the end, Jerome spent 23 years (382-405) translating into Latin the Bible he pieced together from various versions in Hebrew, Greek, Latin, Aramaic, the Septuagint (Greek from Hebrew), Hellenistic sources, etc.
The Latin version was considered, and became, the single source of truth for the church as it evolved.
This reminded me of the modern data engineer who needs to make sense of data found in disparate sources that may or may not be accurate, complete, or consistent.
And like the modern data engineer, Jerome of Stridon is still being critiqued about the quality of his work (1,600 years later)!
If he had just had access to Snowflake, the job would have been much easier!