Thanks for being here!
Alt Data Weekly is Powered by Vertical Knowledge
Announcement(s):
Vertical Knowledge’s new data product “VK Trends” monitors daily web search activity at scale (i.e., tracking as many terms as you want) with 5+ years of history.
Vertical Knowledge will have a team at Neudata’s December 7th conference in NY. Let me know if you plan to be in attendance.
The theme that emerged in this week’s email is … better tools & processes are broadening access to data.
QUOTES
“The more quickly stakeholders have a tangible item to react to, the more quickly you’ll get valuable feedback.” - Caitlin Moorman
News Articles
Podcasts
Cool Charts
Final Thoughts (Data QA)
#1 – Rich Brown published Measure Once, Cut Twice! Part 2 Of How To Turn 6 Days Into $20M Profit. November 2023.
My Take: The second installment of a multi-part series from Rich Brown on how to better manage data sourcing, procurement, and vendor management. The bottom line: you need to set up baseline metrics by which you can measure success, understanding there will be trade-offs to be made. Once those are established, Rich offers some suggestions for implementation.
#2 – Three Data Point Thursday published Data Product Teams As An Organizational Change Summary. November 2023.
My Take: “Are we really creating value?” Domain expertise is back in the limelight. Data is great, but data owners need to answer the “are you creating value?” question. You need to balance effectiveness & efficiency … but above all, communicate effectively with the end customer (both internal & external).
Related Locally Optimistic article suggestion.
#3 – Timothy B Lee published OpenAI's vision: a chatbot in every app. November 2023.
My Take: I’m trying to read as much as I can on the development of AI and ChatGPT. I find ChatGPT very useful in my day-to-day work, and I am starting to see how it can be an interface on top of data products (think: any app you use). This can really broaden the market by making things easier to use … or, I should say, even easier than they already are. There’s still a long way to go, but the progress is interesting to watch. “A lot of people thought they wanted their apps to be inside ChatGPT but what they really wanted was ChatGPT in their apps.”
BONUS 1: Franklin Templeton’s On the verge of transformation: The state of investment & wealth management in 2023-24. November 2023. “The number of activities that don’t leave a digital footprint in some way is becoming vanishingly small, and with the advent of devices and wearables, social media and streaming platforms, an individual user’s likes and even moods are becoming increasingly transparent. Some of this data is already being incorporated into investment models through alternative data providers, but it has not yet been applied to an individual’s investment portfolio in any meaningful way.”
BONUS 2: IMF’s The Third Phase of the G20 Data Gaps Initiative (DGI-3) Starts to Deliver Insights for Action. November 2023. “The main objective of the DGI-3 is to address the critical data gaps that exist in the face of the climate crisis, increasing economic polarization, and large-scale digital transformation. Its 14 recommendations are clustered around four statistical areas (i) climate change; (ii) household distributional information; (iii) Fintech and financial inclusion; and (iv) access to private sources of data and administrative data, and data sharing.”
What else I am reading:
Exabel’s How Discretionary Managers Combine Alternative Datasets to Discover Deeper Insights. November 2023.
CastorDoc’s Data Catalog Tools Benchmark.
Caitlin Moorman published Project Execution Skills Build Your Reputation for Delivering Value. July 2023.
Serge Gershkovich’s Secret Snowflake data modeling features you need to know about. November 2023.
Source: Steve Hamm of The Data Cloud Podcast interviews Torsten Grabs. May 2023.
My Take: Torsten Grabs is the Senior Director of Product Management at Snowflake. The conversation starts with the acknowledgement that this is a disruptive moment for the tech industry because:
The way people interact with computers is changing
Tech is much more approachable for more people
Conversational nature is more appropriate for more users
More people can do more meaningful work
I’ve spent a lot of time thinking and saying that, once everyone has access to all the information available, those who ask the best questions of the data will generate the most value (alpha).
This will start with specific use cases vs general use cases. “Domain specific use cases” are easier to execute with more specific data inputs & more focused outputs.
Rather than having everyone create their own models, Torsten sees users consuming data through existing foundational models. Snowflake can be THE place where your proprietary data interfaces with public data for LLMs. As a biased Snowflake fan, I think they are well-positioned to deliver.
One big challenge is that there are still A LOT of questions about governance and knowing where all of your data is going, but we are at the front end of big changes.
Highlights (31-minute run time):
Minute 01:25 interview starts
Minute 02:00 what is going on with LLMs and AI?
Minute 03:40 how is this different than Ask Siri?
Minute 05:00 what is difference between ML modeling and LLMs?
Minute 09:30 AI makes technical people much better / more efficient
Minute 11:00 proprietary data vs public data
Minute 14:30 Snowflake’s Snowpark (developer framework) & Streamlit (data applications) … is there a role for LLMs in these tools?
Minute 18:30 discussion of further applications for LLMs and Snowflake
Minute 20:00 criteria when choosing vendors
Minute 22:30 will the bots take over?
Minute 26:00 Sci-fi reading and early ideas from those authors; social contract between humans and machines
Source: Franklin Templeton’s On the verge of transformation: The state of investment & wealth management in 2023-24
In red, I’ve highlighted areas where new data & data services can be incorporated.
BONUS: AC Nielsen’s Great Prosperity Through Marketing Research. 1964 speech from AC Nielsen.
Must read for anyone interested in this company or the field of market research & data. H/T Alex Izydorczyk.
Old-timey data products:
Data Quality
In the evolving landscape of data-driven decision-making, the spotlight often falls on cutting-edge analytics, ML algorithms, & the sheer volume of data amassed. However, there's an unsung hero in this narrative, quietly ensuring the reliability and integrity of the data that fuels everything: Quality Assurance (QA).
As organizations increasingly recognize the importance of high-quality data, the role QA plays in the data production pipeline has become more crucial than ever.
This is something about which we are vigilant at Vertical Knowledge … as trust, once lost, is hard to regain.
The Foundations of Reliable Insights
QA serves as the gatekeeper, validating the accuracy, completeness, and consistency of data before it enters the production environment. A flaw in the data can propagate through every layer of analysis, leading to misguided decisions and, ultimately, repercussions for your business.
Mitigating the Risks of "Garbage In, Garbage Out"
The adage, "garbage in, garbage out" holds true. Flawed data, if not identified and rectified at the outset, can amplify errors and compromise the validity of analytics. QA acts as the safeguard against this, preventing bad data from impacting decision-critical processes.
The Path Forward
Robust QA processes at the data vendor level not only safeguard against inaccurate data-driven insights but also fortify an organization’s ability to innovate with confidence.
Let’s go.