Thanks for being here!
Announcement(s):
Excited to let everyone know that I am joining the team at ModuleQ. See more in this week’s Final Thoughts below.
The theme that emerged in this week’s email: data has value, and data rights are coming to the forefront.
QUOTE
Data is the fuel of modern technological revolutions. Analogous to how vehicles require quality fuel to run efficiently, AI models need high-quality data to function optimally. - Nazirjon Ismoiljonov
News Articles
Podcasts
Cool Charts
Final Thoughts (ModuleQ)
#1 – Bright Data published The State of Public Web Data Report 2024. April 2024.
My Take: Wow! 90+ pages of important content. I will have new findings to highlight for weeks! This report presents the results of a survey of 500 decision-makers across a broad range of industries. There is a lot to highlight, but I was initially surprised that 92% of respondents had some sort of public web data strategy. This is far more than I expected. There is of course the call for more data (see next article). Slides 30+ are industry specific.
#2 – Nazirjon Ismoiljonov published Betting on the Doppelgänger: The Role of Synthetic Data in the AI Privacy Question. April 2024.
My Take: Good overview of synthetic data, including the advantages (privacy, rights) & disadvantages (bias, not “real”). Synthetic data is one solution to the privacy and rights issues associated with data. It remains imperfect, but it is a needed answer to the massive amount of data required to train AI models.
#3 – Alex Izydorczyk’s AI Data Licensing Deals. April 2024.
My Take: We are so early in this evolution. Data has value, but prices diverge widely ($2.5m for access to X data seems wildly different from $60m for Reddit). Alex puts forth a good effort to create a list of current public AI data deals. I imagine within 1-2 years this list will expand dramatically. It remains to be seen how data owners maximize value, but my thought is that data processed through AI models will be a requirement. See related articles here:
OpenAI Utilizes YouTube Videos to Train GPT-4 Amidst Data Gathering Challenges.
And expect more TOS changes broadening data rights…like this highlighted by Vin Vashishta: What Is Google Thinking With Its New Terms Of Service?
What else I am reading:
Umesh Patel’s From Data to LLMs: Snowflake Made it Easy. April 2024.
The Hill’s House passes bill requiring warrant to purchase data from third parties. April 2024.
Microsoft and G42 partner to accelerate AI innovation in UAE and beyond. April 2024.
Crosshatch’s Information Games. April 2024.
Gulp Data and Datarade Partner to Empower Enterprises to Monetize Data. April 2024.
Trisha Leigh published Artificial Intelligence Has Read Everything On The Internet But Remains Hungry For More Data. April 2024.
Source: Let’s Talk Data Podcast published Is it Data Product or Data as a Product? April 2024.
Corrie Birkeness – Host
Gebhard Roos – Product Manager for SAP Datasphere Data Marketplace
Maria Villar – Head of North America Data Strategy and Transformation
Tina Rosario – Chief Data Officer, SAP Europe
My Take: Defining the business problem is the key. Data products are great, but to get broad buy-in you need to communicate the business problem the data product will address.
Of most interest to me was the idea that “data as a product” is an operating model, while a “data product” is the result of that operating model.
Another interesting discussion was about data marketplaces & the importance of facilitating internal & external data exchange, monetization, and collaboration.
Highlights (31-minute run time):
Minute 01:00 – Maria starts us off … data product or data as a product
Minute 03:00 – Tina discusses characteristics of data products
Minute 07:30 – Gebhard affirms the importance of clear data product use cases; driven by overall culture
Minute 09:00 – Data assets vs. data products; characteristics that distinguish data products: transparency, alignment with consumer needs, and governance constraints
Minute 12:00 – Discussion of data marketplaces; beginning of search
Minute 15:00 – Models for external data sharing (public domain data, internal data)
Minute 20:30 – Org and technical challenges to creating data products (scale & accountability)
Minute 26:00 – The consumer is at the center of the data product
Source: DataBoutique’s web data ecosystem map (found here).
“Our model operates under straightforward rules: To qualify for inclusion, actors must provide a verifiable service related to web data.
We categorize these services broadly, accommodating a diverse spectrum from scalable one-to-many services—like proxy network providers and scraping tools—to bespoke one-to-one solutions tailored for specific needs, such as custom data extraction or consultancy for business intelligence and analytics. Additionally, our map includes everything from basic, low-level raw data services (data feeds) to more complex, high-level offerings that provide insights (market intelligence) and everything in between.”
BONUS: Bloombury’s I scraped all of ChatGPT’s Enterprise customers – here’s what I learned. April 2024.
Source: ModuleQ
(repeat message from last week)
I am excited to join the team at ModuleQ.
ModuleQ is an Unprompted AI company focused on delivering the right data, to the right person, at the right time, directly inside their workflow.
As someone who has spent their entire career in research & data, I am all too familiar with the issue of ensuring that frontline knowledge workers such as PMs, analysts, bankers, & wealth managers have access to not only relevant data, but the best data, at the right time. While many of us have worked on the data & insights problem, few of us have tackled the delivery problem.
Enter ModuleQ.
ModuleQ has solved the last-mile delivery problem by connecting to all of your data & delivering it to you in a hyper-personalized experience in your collaboration tool of choice (MS Teams, etc.). See video demonstration.
As I have written these ADWs over the past few years, it has become clear how revolutionary AI tools can become. The proactive nature of #unpromptedAI simply makes your life better. You’ll be interfacing with AI without knowing you are interfacing with AI … it will just be how things work.
You’ll be more productive, have more opportunities, and be connected in a much more relevant and personalized manner.
You can expect the ADW weekly cadence to continue. Data & AI are inseparable concepts.
I look forward to sharing more about my experience at ModuleQ in the coming weeks/months.
Can’t wait!
New Email: john.farrall@moduleq.com