Thanks for being here!
Alt Data Weekly is Powered by Vertical Knowledge
Announcement(s):
Check out VK’s updated Core Collection List. Our Core Collection List contains hundreds of actionable, targeted datasets covering a broad range of industries, sectors, and geographies.
Vertical Knowledge will have a presence at two upcoming conferences:
- Eagle Alpha’s January 18th data conference in NY
- BattleFin’s January 24-26th data conference in Miami
Let me know if you plan to be in attendance.
The theme that emerged in this week’s email is … publicly available data is valuable & everywhere. The key is to collect at scale, organize at scale, market, sell, & ultimately improve decision-making.
QUOTES
“These results highlight that the emergence of alternative data … can reduce the value-added by managers who rely on traditional methods to generate their signals. They suggest that, as in other industries, the big data revolution has the potential to displace high-skilled workers in the finance industry, who lack the skills to leverage information contained in big data.” - Maxime Bonelli & Thierry Foucault, Displaced by Big Data: Evidence from Active Fund Managers (p 34)
News Articles
Podcasts
Cool Charts
Final Thoughts (VK & EVs … IYKYK)
#1 – Jonathan Chin published The Quagmire of Backtesting Alternative Data. December 2023.
My Take: Jonathan highlights a bottleneck in the data evaluation process. From the data vendor perspective, back-testing is opaque, unique to each prospective customer, and an overall inefficient process.
I can vouch for Jonathan’s commentary that funds & prospective customers are going deeper with the questions they ask of the data. My feeling is that as people get more comfortable with the data, there is no shortage of interesting questions to ask.
The good news is the tools are getting better, the data is getting better … and the potential value to be unearthed is becoming more widely understood.
#2 – Andrea Squatrito published Generic data marketplaces are broken. November 2023.
My Take: Loyal readers of ADW will know I’ve been critical of data marketplaces. Andrea does a good job identifying issues with current marketplaces: there is a lack of trust, interests are not aligned, and pricing is not transparent. I continue to like the saying “data is sold, not bought” (quote attributed to Dan Entrup as far as I can tell) … we are a long way from a fully functioning broad data marketplace.
#3 – Clark Wright published in the Airbnb Tech Blog Data Quality Score: The next chapter of data quality at Airbnb. November 2023.
My Take: Data quality is an important issue. I like that Airbnb wanted to take the idea of quality beyond simply “certified” vs “uncertified”. They assign an easily understood 0-100 DQ score, so everyone knows the relative level of quality they are dealing with when engaging with the data. They then went a step further and defined quality differently depending on the use case (reliability vs accuracy vs usability). After reading this, I have a greater appreciation for how challenging this problem can be.
BONUS: VK’s Brian O’Keefe published More Data Is Public. Why That Matters to Governments and Corporations. December 2023. “In theory, PAI (Publicly Available Information) can be found by anybody. In reality, the sheer amount of PAI means that it can only be usefully gathered and analyzed by organizations that have invested in technology that allows them to gather the data at scale; infrastructure that enables them to store and access it en masse; and data engineering that enables them to take unstructured data and make it digestible and understandable to users.”
BONUS 2: Philip N Jefferson, Vice Chair of the Board of Governors of the Federal Reserve System, November 2023 speech. “These alternative data have transformed the way economists forecast future outcomes and measure the effectiveness of monetary policy.”
What else I am reading:
Maxime Bonelli & Thierry Foucault published Displaced by Big Data: Evidence from Active Fund Managers. November 2023.
AC Nielsen’s Greater Prosperity Through Marketing Research. June 1964.
Adam Braff’s Spoiling The Fun(d). December 2023.
Michael Lewis’ new book Going Infinite.
Source: Ternary Data (Joe Reis & Matt Housley) published Automating Analytics with Generative AI w/ Sarah Nagy. November 2023.
My Take: Sarah Nagy is co-founder of Seek.ai, a company focused on automating analytics with generative AI. Of most interest to me was her commentary around the challenge of working with structured data. Getting the correct answer is really important and really hard. 99.9% accuracy is not good enough when the decision is high leverage (i.e. you’ll lose your job if you are wrong). Bottom line: you’ve got to have a human-in-the-loop.
Self-serve analytics is always just around the corner, but we have never quite gotten it right. The goal is moving the role of data analyst from being the “human Siri” to more high-value, interesting work.
Lastly, for those looking to work as a data person at a hedge fund, Sarah gives a few great real world examples of what the day-to-day life is like.
A great data team will be doing work that reflects the greatest needs of the overall business. AI will eventually allow the data researcher to do more interesting, higher value work.
Highlights (60-minute run time):
Minute 01:00 – Sarah intro & quick discussion of the recent OpenAI drama
Minute 05:00 – background on Sarah & Seek.ai
Minute 09:30 – started Seek to solve a problem … automating analytics
Minute 14:20 – discussion of data products
Minute 17:10 – preparing for the role as a startup CEO
Minute 20:00 – structuring data & choosing the problem to solve
Minute 23:00 – why is working with structured data hard?
Minute 30:00 – align revenue and AI safety
Minute 33:00 – audience questions
Minute 45:00 – interaction of software & AI
Minute 48:30 – will analytics ever be fully automated?
Minute 55:00 – what led Sarah to try to solve this particular (hard) problem
Josh Howarth of Exploding Topics published How Hedge Funds Use Alternative Data. December 2023.
Interesting related article from Italo Mendonca, How to Use Alternative Data for Investment Decisions. November 2023.
Source: Deloitte
Source: AIMA
This is perhaps a bit random, but I think this particular VK collection is interesting (I personally enjoy following the progress being made with driverless cars & electric vehicles).
I Visited Over 120 EV Chargers: Three Reasons Why So Many Were Broken – WSJ Nov 15, 2023.
As it relates to the above WSJ article, VK collects publicly available data from various EV charging station merchants at intervals as frequent as every few minutes. Each station has 1 of 4 availability types:
“Active”
“Decommissioned”
“Needs Service”
“Under Construction”
Below is the % that are listed as “Needs Service”:
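For anyone curious how a share like this could be computed from station-level snapshots, here is a minimal Python sketch. The record layout (a station ID paired with a status), the function name, and the sample values are all hypothetical and for illustration only; they are not VK’s actual schema or data.

```python
from collections import Counter

# The four availability types listed above.
STATUSES = ("Active", "Decommissioned", "Needs Service", "Under Construction")


def needs_service_share(records):
    """Return the percentage of stations whose status is 'Needs Service'.

    `records` is an iterable of (station_id, status) pairs. This layout is
    an assumption made for the sketch, not a documented format.
    """
    counts = Counter(status for _, status in records)

    # Fail loudly on a status outside the four known types.
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")

    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts["Needs Service"] / total


# Hypothetical snapshot: made-up station IDs and statuses.
snapshot = [
    ("st-001", "Active"),
    ("st-002", "Needs Service"),
    ("st-003", "Active"),
    ("st-004", "Under Construction"),
    ("st-005", "Decommissioned"),
]

share = needs_service_share(snapshot)
print(f"{share:.1f}% of stations are listed as Needs Service")
```

Running the same calculation on each collection timestamp would give a time series of the “Needs Service” share. Whether decommissioned or under-construction stations belong in the denominator is a judgment call; this sketch counts all four types.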