Thanks for being here!
Announcement(s):
We are happy to announce that we have worked with a third party to produce a case study using our new gift card dataset. As part of the case study, we nowcasted Roblox Bookings (worldwide) with low lag & ahead of monthly news releases.
Contact me to learn more.
The theme that emerged in this week’s email: there are significant challenges in data orchestration, data observability, & quality assurance ... and there are A LOT of smart, well-funded people working on solutions.
QUOTES
“What do we (as a company) prioritize that doesn't show up on the spreadsheet?”
“Data sharing is the way to optimize higher-relevant data, generating more robust data and analytics to solve business challenges and meet enterprise goals.”
News Articles
Podcasts
Cool Charts
Final Thoughts (The Struggle is Real)
Appendix
#1 – Kyle Kirwan of BigEye published Complete guide to understanding data observability. October 2022.
My Take: This is your one-stop shop for all things data observability. Almost too much good information here to summarize. Most interesting to me was the list of best practices, which aligns with the recent theme of making good decisions up-front. The article also does a good job of clearly defining & delineating data observability, data testing, data quality, and data reliability.
#2 – benn.substack published Data's invisible hand. October 2022.
My Take: benn.substack is a good data-related Substack for those looking for new sources of thought-provoking information. In this article the author discusses how an “analytical economy” might work. It is an interesting thought exercise about ways to solve the prioritization problem (what do you work on & in what order?). A better solution is to focus on organizing the data so that users can query it using natural language and then share those results like a social network (see ThoughtSpot or Seek.ai). Prioritization for data teams would then be driven by the multitude of data requests pouring in from the network of users. For example, ahead of earnings releases, stock analysts are all trying to answer the same questions (what is important about an earnings release changes every quarter). The data team would have a great understanding of where to focus if the users were all asking similar questions.
Think like you are the CEO.
#3 – Gediminas Rickevicius, VP of Global Partnerships at Oxylabs.io published 7 ways eCommerce businesses use alternative data in 2022. October 2022.
My Take: “While its (alternative data) use has typically been associated with hedge funds and investment firms, other businesses are now taking advantage of alternative data to enhance decision-making.” Great list of potential use cases. I still think this is tough for an individual firm to execute, but the time is coming when these types of insights will be more broadly available to all.
Gain Product and Service Insights
Obtain Demographic Information
Track Brand Mentions
Refine SEO and Content Marketing Strategy
Expand Locations and Optimize Operating Hours
Predict Consumer Demand
Monitor Trends
BONUS: Monte Carlo published The Ultimate Data Observability Platform Evaluation Guide. October 2022. Getting started with data observability? Here are the 10 things you need to succeed (see article #1 above).
BONUS 2: I wanted to include this LobbyingData.com article as yet another example of the exponential growth in the types & amount of data that are available. The amount of data being collected in all areas of life is exploding. The value over the next generation is going to flow towards those firms that can make sense of it all. This is going to take some time, but progress is being made.
BONUS 3: Thalia Barrera published Data Engineering Past, Present, and Future. October 2022.
#1 – Shane Gibson from Agile Data interviews one of my favorite data people, Ashwin Kamath of Spectre Data. Analytical team topologies. October 2022.
My Take: I’ve written about the tools needed to manage the explosion of data. Ashwin has firsthand experience in dealing with this sort of data & understands the challenges at each step of the process. Most interesting to me was the discussion around how reliant many organizations are on institutional knowledge and how to protect against the risk of losing important institutional knowledge if (when) people leave your company.
This also reinforces the recurring theme of making good decisions up-front about how you will manage data.
Highlights (61-minute run time):
Minute 01:00 – Ashwin’s background & the starting of Spectre Data
Minute 07:00 – dev ops & data ops
Minute 09:00 – the move from pods of data people to individuals; “with teams of one, chaos reigns”
Minute 11:00 – the lost skill of data modeling; ETL vs ELT (even TEL)
Minute 17:30 – discussion about feature engineering; importance of simplifying
Minute 22:00 – design of your organization; importance of prioritization
Minute 26:00 – automation can be dangerous; migrate to data mesh only when the time is right
Minute 32:00 – discussion around team design to protect lineage (institutional knowledge)
Minute 42:30 – focus on understanding how data is moving; “data recalls”
Minute 45:00 – discussion around credit scoring decisions
Minute 53:00 – data quality detection & alerting issues
Minute 58:30 – intersection of data orchestration & data quality
Source: The AI Hierarchy of Needs. June 2017.
This 2017 article is still applicable today. The trek from the bottom of the pyramid to the top is a long one. The amount of data being collected at the bottom is growing exponentially. This makes it more difficult to execute as you move up the stack.
The top of the pyramid (AI/Deep Learning nirvana) is a worthy goal, but we still have a long way to go before this is applicable outside narrow use cases.
One more….
Source: The Struggle is Real
Life will not be smooth sailing. At least not for long.
Frankly, if you notice it is too smooth for too long, you might need to shake things up ... create some disruption, otherwise your acceleration will become deceleration.
I get frustrated when things are not going as planned. I get most frustrated when I mistakenly expect something will be easy.
When that happens, I try to step back & recognize that this struggle is part of the journey. What I am doing is hard. It should not be easy. It will not be easy.
Resistance is good. You don’t get better in the gym without resistance. You don’t improve academically without resistance.
The same holds for your life. Resistance will make you stronger.