Thanks for being here!
This is the Alternative Data Weekly for Friday, November 10, 2023.
Alt Data Weekly is Powered by Vertical Knowledge
Announcement(s):
Vertical Knowledge’s new data product “VK Trends” monitors web search activity at scale with 5+ years of history.
Vertical Knowledge will have a team at Neudata’s December 7th conference in NY. Let me know if you plan to be in attendance.
The theme that emerged in this week’s email: LLMs everywhere!
QUOTES
“The reality is that most companies don't need intense deep learning operations. They need to know, is this group of customers gonna churn? That's a problem that gives me an immediate return on my investment because I can know, okay, I can stick my customer success team on this cohort, which is at risk of churning … you don't need a PhD in order to build a churn model. What we had with this challenge was we had these incredibly smart, intelligent data scientists coming in, but the problems that needed to be solved in most organizations were pretty straightforward.” – Matthew Lynley from his interview on Data Radicals Podcast (my emphasis)
News Articles
Podcasts
Cool Charts
Final Thoughts (Pickleball)
#1 – Matthew Bernath published The Pivotal Role of Alternative Data in Hedge Fund Strategies. November 2023.
My Take: Matthew is a great voice in the data industry, worth following if you are not already. He runs through a couple of case studies & shares the challenges institutional investors need to consider when working with this type of data. Bottom line: funds that have navigated these challenges & are using alternative data have an advantage over those that do not.
#2 – Three Data Point Thursday published 3 Reasons Your Company Shouldn't Invest Into Data. November 2023.
My Take: Here are the three reasons (with my comments):
Do you really understand the cost? (not many do … and it is generally more $$ than you think)
What are your opportunity costs? (especially in a tight market for talent)
Can your business pull it off? (with the right help, yes)
The big reason most companies are tempted to go down this path is FOMO … Fear Of Missing Out. Doing data right offers you a huge advantage. Gotta at least try!
#3 – Andre Retterath’s Data-Driven VC published 5 Perspectives on the State of Digitization in Venture Capital from Pietro Casella (EQT). July 2023.
My Take: These insights come from Pietro Casella of EQT. Of most interest to me were Pietro’s thoughts around boiling the ocean vs being focused. “My current approach is to start with less data but higher quality, focus on niche datasets targeting the thesis you are pursuing, and expand from there.”
The five headlines:
What’s the status of the VC industry in terms of data-driven initiatives and AI?
Why should VCs become more data-driven?
What’s your perspective on buy versus build? Is it “either-or”? Combination?
What are your major challenges or bottlenecks when looking at data-driven initiatives?
What would you recommend to a VC firm that just started out to become more data-driven?
What else I am reading:
Alex Izydorczyk’s A Market Research Colossus. November 2023.
Blackstone Leaders on AI. September 2023.
Data Driven VC’s How to Not Miss the AI Train: Essentials You Need to Know. October 2023.
Arcesium’s Going the Distance to Build a Robust Data and Analytics Infrastructure. Part 3. November 2023. I highlighted Part 1 a few weeks ago.
Liberty Source’s Data, Data Everywhere And Not A Byte To Use. November 2023.
Source: Data Radicals published Everything You Wanted to Know about LLMs, but Were Too Afraid to Ask with Matthew Lynley, Founding Writer of Supervised.
My Take: Bottom line: there is a lot of innovation happening right now. I thought this was a very good overview. It stresses the importance of domain knowledge (asking the right questions of the data), as well as the importance of capturing the low-hanging fruit. Most problems are straightforward, so it is important not to let “expertise be overapplied.” I took this to mean that we should not overthink things and should simply find some easy wins with the data.
The full transcript is available if you follow the above link.
Highlights (49-minute run time):
Minute 01:30 - new venture Supervised & background
Minute 06:00 - data scientists doing the work of data analysts
Minute 07:30 - 80-20 rule applies to analytics and tooling
Minute 12:30 - LLM learning curve; what are the basics for new people
Minute 14:00 - LLMs are easy to get into; tough when you dig down a level
Minute 15:30 - hyper-personalized models; prompt engineering
Minute 16:45 - discussion of prompting
Minute 18:45 - importance of domain knowledge
Minute 19:10 - “everything comes back to the data”
Minute 20:00 - RAG discussion
Minute 22:30 - Databricks vs Snowflake (more at 42:00)
Minute 28:00 - how to build an LLM business. Race to zero?
Minute 29:00 - The whole ability to differentiate or get an edge is to have better data inputs
Minute 32:00 - open-source vs proprietary
Minute 33:45 - predictions for 1-year out (compare to iPhone)
Minute 40:00 - still some errors
Minute 42:00 - more on Databricks vs Snowflake
Minute 45:00 - data stack getting very expensive
BONUS: Very entertaining … Step Inside the Data Drama: Unveiling NinjaCat's 'Big Data Day' Musical. October 2023.
Source: Data Boutique’s The Impact of ChatGPT on the Web Scraping Industry. November 2023.
“Training models is a data-devouring activity, which for many models means scraping, scraping, and again, scraping.”
The Pickleball Craze
The one thing I hear more about than LLMs is pickleball.
I thought this chart from VK’s Bestsellers web collection product was interesting.
Source: Vertical Knowledge’s daily collection of the Top 100 Bestseller products/brands across 1,700+ different categories.
Regarding the pickleball chart: it looks like seasonality. Dips happen each fall, and interest ramps back up in the spring.
Thanks for sharing the chart, John!