Thanks for being here!
Announcements:
This is the 100th publication of the Alt Data Weekly. To celebrate, I’ve moved over to Substack … please be patient with any near-term hiccups.
You can now access 90 West’s consumer transaction data via our Insights Platform (powered by Exabel).
The theme that emerged in this week’s email: data engineering is an area ripe for disruption in tooling & automation.
Quote:
“Leaders need to look at data first to succeed in their digital initiatives, rather than treating them as an afterthought to help with ad hoc projects.” - Mike Rollings, Research Vice President at Gartner
News Articles
Podcasts
Cool Charts
Final Thoughts (My Switch to Substack)
Best Regards,
John Farrall
jfarrall@90WestData.com
News Articles
#1 – Riley Predum of Springboard published on KDnuggets: 5 Key Data Science Trends & Analytics Trends. August 2022.
My Take: Of the five trends listed below, the two that stand out to me are 1) the lower barrier to entry (true!) and 2) the focus on cleaning & maintaining data. As a non-data scientist who has learned a ton of Python, AWS, Snowflake, and Tableau (the tools that make working with data manageable), I can attest that these tools have significantly lowered the cost & skill barriers to working with data. Cleaning & maintaining datasets has moved to the forefront as firms start to realize the power of data, and then see how messy data can be.
Fraud – Aiding and Fighting Against It
Tech Stacks are Getting More Streamlined
Lower Technological Barrier to Entry
A Growing Demand for Competent Specialists
More Attention on Cleaning Up & Maintaining Data Sets
#2 – Chad Sanderson published The Production-Grade Data Pipeline. September 2022.
My Take: First, congrats to Chad Sanderson on building a list of 4k subscribers … that is quite an accomplishment. The article highlights a consistent theme in my weekly emails: the importance of making good decisions about the data stack up front. It also lays out the benefits of having a central authority drive thoughtful design. It is tough to future-proof everything, but if your organization has the benefit of starting from scratch, thoughtful decisions up front pay off as it scales. Of most interest to me was the prototype pipeline, used for testing, speed, and flexibility … figure out what works and what the most common use cases are, then move to the more mature production-grade pipeline. The switch from prototype to production should be made when (and only when) data quality is there.
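The article doesn’t prescribe code for the “only when data quality is there” rule, so here is a minimal sketch of what a promotion gate might look like, assuming records arrive as Python dicts with a hashable key column. Everything here is hypothetical and for illustration only: the names profile and promotion_blockers, the three metrics, and the default thresholds (1,000 rows, at most 1% missing values, zero duplicate keys).

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    row_count: int
    null_rate: float       # share of required fields that are missing (None)
    duplicate_rate: float  # share of rows whose key repeats an earlier row


def profile(rows, key, required):
    """Compute basic quality metrics over a list of dict records (hypothetical helper)."""
    n = len(rows)
    if n == 0:
        return QualityReport(0, 0.0, 0.0)
    cells = n * len(required)
    missing = sum(1 for r in rows for col in required if r.get(col) is None)
    distinct_keys = len({r.get(key) for r in rows})
    return QualityReport(
        row_count=n,
        null_rate=missing / cells if cells else 0.0,
        duplicate_rate=(n - distinct_keys) / n,
    )


def promotion_blockers(report, min_rows=1_000, max_null_rate=0.01, max_duplicate_rate=0.0):
    """Return reasons NOT to promote the prototype; an empty list means 'ready'.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    blockers = []
    if report.row_count < min_rows:
        blockers.append(f"only {report.row_count} rows (need {min_rows})")
    if report.null_rate > max_null_rate:
        blockers.append(f"null rate {report.null_rate:.2%} exceeds {max_null_rate:.2%}")
    if report.duplicate_rate > max_duplicate_rate:
        blockers.append(
            f"duplicate rate {report.duplicate_rate:.2%} exceeds {max_duplicate_rate:.2%}"
        )
    return blockers


if __name__ == "__main__":
    rows = [{"id": i, "amount": 10.0} for i in range(1_200)]
    rows.append({"id": 5, "amount": None})  # one repeated key with a missing value
    report = profile(rows, key="id", required=["amount"])
    print(report)
    print(promotion_blockers(report) or "ready for production")
```

Returning a list of reasons rather than a bare True/False keeps the gate explainable: the prototype stays a prototype until the list comes back empty, and the team can see exactly which check is holding it back.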
#3 – Jakob Kristensen published Did Snowflake just hint at warehouse-native SaaS apps? September 2022.
My Take: Can Snowflake disrupt app development by introducing a hub-and-spoke architecture? There are quite a few benefits to moving from a point-to-point model to hub-and-spoke: more control, easier compliance, and lower cost, just to name a few. As data control & data ownership become a bigger deal, it will be interesting to see how the powerful players position themselves to control the most valuable pieces of the data chain.
The idea of control is of most interest to me, as many data-centric companies have data in various silos with various owners. For many reasons, it makes sense to have a single data repository from which access can be permissioned and monitored, particularly when the data is sensitive.
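One back-of-the-envelope way to see the “lower cost” point is to count integrations. As a minimal sketch, assuming the worst case where each of n apps must exchange data with every other app, a point-to-point model needs n(n-1)/2 direct connections, while a hub-and-spoke model needs only n (one spoke per app into the central repository). The function names are hypothetical, and real point-to-point setups are usually sparser than this worst case.

```python
def point_to_point_links(n: int) -> int:
    """Direct connections if each of n apps talks to every other app: n(n-1)/2."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n: int) -> int:
    """Connections if each of n apps talks only to one central hub: n."""
    return n


if __name__ == "__main__":
    print(f"{'apps':>5} {'point-to-point':>15} {'hub-and-spoke':>14}")
    for n in (3, 5, 10, 20):
        print(f"{n:>5} {point_to_point_links(n):>15} {hub_and_spoke_links(n):>14}")
```

Under that assumption, 10 apps means 45 point-to-point links versus 10 spokes, and each spoke is also a single place where access can be permissioned and monitored.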
BONUS: Alyssa Schroer of Built In authored Big Data. What Is Big Data? How Does Big Data Work? September 2022. “The three V’s”: Volume, Velocity, Variety. I thought the history of big data, going back to 1881, was interesting!
BONUS 2: The Center for Data Innovation’s Gillian Diebold produced Introducing the Data Divide. September 2022. It is interesting to me that there is a push in both directions. Per the video, people benefit from sharing more data about themselves, and underrepresented groups often do not see the benefits of having their data in the ecosystem (healthcare, for example). At the other end is a push to share less of our data due to privacy concerns. There is a tricky balance somewhere in the middle. My take: people really don’t care about privacy (despite the “creep factor”). The benefit of having your data in the ecosystem outweighs the risk of having it “out there.”
BONUS 3: Tesla, automaker or data company? Due diligence on data assets should be standard practice for all new investments.
Podcasts
#1 – Arpit Choudhury from Databeats interviewed David Jayatillake of Metaplane about Building and Using Data Infrastructure. July 2022.
My Take: Important decisions made up front can make a big difference down the road. This was an interesting conversation that compared and contrasted various tools in the modern data stack. My takeaway is that some consolidation is likely coming among the players in the data stack. The data engineering space is ripe for good tooling and automation, and big companies want “one throat to choke” when dealing with vendors. The core tools are ELT, the warehouse, and BI, supplemented with observability tools and CDPs (customer data platforms): basically, tools that ensure quality & reliability.
Highlights (14-minute run time):
Minute 00:20 – Interview starts; what it means to build infrastructure
Minute 01:00 – What are the core components of a well-executed data infrastructure?
Minute 03:00 – Why the explosion in data infrastructure tooling in the past couple of years?
Minute 05:00 – Communication and broadening access
Minute 08:30 – Modern data stack vs. various tools stitched together (consolidation?)
Minute 10:00 – Pros & cons … building a set of best-of-breed tools vs. an all-in-one solution
Minute 12:00 – Advice for those just getting started
Cool Charts
Source: Gartner’s Hype Cycle for Data Management 2022.
A 123-page document with more than a couple of interesting charts & tables.
Hype Cycle 2022:
Compared to 2021’s version:
Final Thoughts (My Switch to Substack)
I’ve switched from MailChimp to Substack with this week’s edition of The Alt Data Weekly.
Please let me know if you have any feedback.
I write a personal Substack, and I find its writing tool much easier to use.
My sense is that many MailChimp emails were going straight to spam … I am hoping that changes with Substack.
Substack is geared toward blogs/email newsletters, while MailChimp is a marketing engine. Frankly, I see this weekly email as a blog rather than a marketing tool. Perhaps it is both.
Substack is free, and I was reaching a level where I would have had to pay MailChimp an increasingly significant amount (good problems!).