Thanks for being here!
Announcements:
Check out Goodbrand’s Data Nerds Antisocial 2024. June 13, 2024. NYC. Space is limited.
The theme that emerged in this week’s email: storytelling and explainability are key in the business of data.
QUOTES
“Great ML is game changing, while mediocre ML is a whole other beast — and it’s downright dangerous.” Duncan Gilchrest & Jeremy Hermann: The Danger Zone in Data Science
News Articles
Podcasts
Cool Charts
Final Thoughts (“You simply have to generate impactful outcomes consistently.”)
#1 – Eagle Alpha’s Mikheil Shengelia published The CFPB’s Proposed Rule on Open Banking and its Potential Impact on Alternative Data. June 2024.
My Take: This was published after a recent Eagle Alpha webinar on the same topic. The proposed rule says “third parties authorized to access consumer data must use it only for the purposes explicitly allowed by the consumer and are prohibited from using it for targeted advertising or other commercial purposes without consent.”
I have been consistent in saying that, while people do not want to be robbed or doxxed, they don’t really care about data privacy. It is not a major concern, particularly among young people. I think there will be A LOT of deep-pocketed interests willing to lobby hard to keep access to this type of data … and the CFPB will be protecting the interests of people who aren’t engaged in the fight.
#2 – LSEG / Trading Tech Insights published a white paper titled Interoperability and AI in Trading Workflows. May 2024.
My Take: A 25-page white paper. There is a lot here, but of most interest to me is the idea that, while massive investments have been made in systems, data, and processes … there is still a lot of room to generate more value from the data, analysis, and systems at a trader’s disposal. Getting the right data to the right person at the right time is a structural advantage for those who do it well.
Another item worth highlighting is the set of three key considerations when deploying AI:
There must be model explainability and control.
There needs to be trust in the data and the documentation exposed to the system.
And the firm must possess the necessary domain expertise to ensure confidence with the output of the AI system itself – it’s about asking the ‘right’ questions.
“Financial services as an industry maintains an extraordinary wealth of information that isn’t easily discoverable or used.” (page 14)
BONUS: Traders Magazine published Assessing the Present and Future of AI in Markets. May 2024. “Being able to explain both where information comes from and why it’s relevant is critical to ensuring generative AI is trustworthy and effective.”
What else I am reading:
Auren Hoffman’s One Decade After Selling LiveRamp – What did we learn?. May 2024.
Storm King Analytics published Balancing Brilliance and Practicality: The Sustainability Challenge for Analysts & Data Scientists. June 2024.
System2 published How to Win Friends and Influence Economists: Thoughts on Nowcasting with Random Forests. May 2024.
Three Data Point Thursday’s In Case You Missed It May 2024. May 2024.
Matthew Bernath published What Is a Collaborative Data Ecosystem?. May 2024.
Alex Izydorczyk published Data Dividends. May 2024.
Datos / Bob Knopp’s Google’s New AI-Focused Search Gains Massive Attention, Shows Early Promise. May 2024.
Random Walk’s Insurance premiums are going up (reprise). May 2024. (not necessarily data related, just interesting)
Source: Anastasia Diakaki of the CFA Institute published Unstructured Data and AI in Investments. May 2024.
My Take: Interesting recording that is worth a listen. I particularly value the historical context provided around the technology & frank discussion of challenges and pitfalls.
“AI techniques are a force multiplier for human analysts” - Andy Moniz
Five NLP Techniques:
Bag of words
Bayesian modeling
Semantic Web
Deep Learning
LLMs
Highlights (60-minute run time):
Minute 01:30 – Andy Moniz (Acadian); Generative NLP & ESG investing
Minute 04:30 – history of NLP; discussion of five (5) NLP techniques
Minute 21:00 – identifying greenwashing
Minute 24:00 – Brian Pisaneschi, Senior Investment Data Scientist, CFA Institute presentation begins
Minute 27:00 – good overview of LLMs and how they are prepared for investment use cases
Minute 34:00 – why ESG is ripe for AI
Minute 43:00 – results of sample ESG test
Minute 50:00 – how job roles are changing with the advent of these new technologies
Minute 55:00 – Q&A on costs, compute power, pitfalls, skills needed
Source: Duncan Gilchrest & Jeremy Hermann published The Danger Zone in Data Science. May 2024.
My Take: The authors do a great job of describing how small mistakes can lead to major output errors.
Excerpt from the article:
In our experience, success requires having crisp, affirmative (and correct) answers to each of the following:
Do you know what you are trying to solve for?
Do you have high quality data?
Have you built features that capture the key patterns in our data?
Have you chosen appropriate models – sufficiently flexible to capture the patterns, but not so flexible as to overfit?
Do you have rigorous training, testing, and validation procedures?
Do you run careful, long-term experiments to measure impacts?
Do you have production monitoring in place?
Do you have robust and scalable pipelines and infrastructure to train and serve your models?
Have you revisited all of this recently, since what worked a year ago might be broken now?
Source: Category Pirates published 3 Steps to Find (and Communicate) Your Value. April 2022.
My Take: This is a new Substack I’ve come across. I feel like “alternative data” is a new(ish) category that struggles to define its value and communicate that value to the world.
AI is largely a new category that is using new tech to solve existing problems. Breaking through the AI noise is nearly impossible.
Storytelling is an undervalued skill.
Category Pirates provides a nice framework:
Only 3 outcomes matter:
Outcomes that drive revenue
Outcomes that drive category potential
Outcomes that drive market cap
How do you contribute to those outcomes?
How do I directly or indirectly support these outcomes?
Which outcome do I have the greatest agency and ability to impact?
Which outcome do I enjoy the most?
Tell your story.
“You simply have to generate impactful outcomes consistently.”