Alternative Data Weekly #154
October 6, 2023
Thanks for being here!
Alt Data Weekly is Powered by Vertical Knowledge
VK has a team of people in London for the October 12th Eagle Alpha conference. Let me know if you would like to connect.
I have been engaging with people exploring the use of VK data as training data for LLMs. Seeing some cool real-world use cases emerge. Happy to discuss if this is something your organization is exploring.
“I’ve never met a data professional who felt like they didn’t have enough to do.” – Katie Bauer, Wrong But Useful
“How data is accessed, shared and communicated are now in the process of radical change, causing significant upheaval to how we work.” – Bill Hammond
Final Thoughts (Prompt Engineering)
#1 – Jonas Laeben published Unlocking the Future of Business Decisions with Decision Intelligence. September 2023.
My Take: This article discusses Decision Intelligence (VK calls it Decision Advantage). To give business users the intelligence they need, we need to close the gap between “just data” and actionable insights. This will be an iterative process where organizations will be aided by the abundance of new data companies & tools being created … but in the end, high quality data is needed as the base input to compete in this highly competitive Decision Intelligence world.
#2 – Bill Hammond published How Generative AI is disrupting data practices. September 2023.
My Take: I continue to believe value will flow towards the owners of good data. Generative AI is starting to be applied to real-world use cases. The models built will be trained using data & we are just scratching the surface of what this means.
#3 – Seattle Data Guy published The Challenges You Will Face When Data Modeling. September 2023.
My Take: This stuff is hard. Another article highlighting the importance of domain knowledge. You have to know the right questions to ask. There is now so much data that it is overwhelming to even understand what you’ve collected & how to best put that information to use. See Final Thoughts below about prompt engineering, which is a fancy way of saying that you need to ask good questions to get valuable answers.
As defined by Joe Reis, data modeling is “Organizing and standardizing data to facilitate believable and useful information and knowledge for humans and machines.”
BONUS: McKenzie Funk of the NYT published The Man Who Trapped Us in Databases. September 2023. “Remembered by industry insiders as the “father of data fusion,” Asher reigned over a vast shift in privacy norms. He shifted them himself, scooping up data sets no one else had wanted, monetizing information no one had ever thought valuable, collecting details others had thought too intimate, testing boundaries that more established companies — with their brand names and boards and reputational risks and publicly traded stocks — had yet to ever dare test.”
What else I am reading:
Jason Derise published The Jargonator T-800 Newsletter Entry. October 2023.
The Random Walk’s AI makes more leaps, but doesn’t delight for long (yet). September 2023.
One Useful Thing’s The shape of the shadow of The Thing. October 2023.
Alex Izydorczyk’s Free as in freedom, not as in beer, Pt. 2. October 2023.
Andre Retterath’s Leveraging Data Moves from Nice-to-Have to a Must, with Francesco Corea (Greycroft). September 2023.
Source: The Data Exchange Podcast published Trends in Data Management: From Source to BI and Generative AI, an interview with Sudhir Hasbe, Chief Product Officer at Neo4j, a longtime technical and product leader in the data management space.
My Take: This is a pretty technical conversation, but it was of interest to me because they discussed building data products or data apps that work in the real world. I think the next big thing is going to be data products or data apps built on top of structured data. The two discuss building those basic MVP-type products. We are getting there!
Some terms they use a lot:
Vector Search - Vector search, also known as similarity search or nearest neighbor search, is a technique used to find data points that are most similar or "closest" to a given query vector in a high-dimensional space. (Source: ChatGPT)
Vector Database - A vector database is a type of database designed specifically for the storage, retrieval, and manipulation of vector data. (Source: ChatGPT) In the context of this conversation, "vector data" means embeddings: lists of numbers that represent text, images, or other content so that similar items sit close together and can be found with vector search. This is different from the GIS sense of "vector data" (points, lines, and polygons used for spatial or geographic information).
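To make the vector search definition a bit more concrete, here is a minimal sketch in plain Python. The three "documents" and their three-number vectors are made up for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and cosine similarity is just one common way to measure closeness. Production vector databases use approximate indexes rather than the brute-force loop shown here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k=2):
    """Brute-force search: score every stored vector, keep the k most similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" (assumed values, not from any real model).
documents = {
    "earnings_report": [0.9, 0.1, 0.0],
    "store_locations": [0.1, 0.9, 0.2],
    "job_postings": [0.2, 0.3, 0.9],
}

query = [0.8, 0.2, 0.1]  # hypothetical embedding of the user's question
top = nearest_neighbors(query, documents, k=2)

for name, score in top:
    print(f"{name}: {score:.3f}")
```

The query vector points in nearly the same direction as the "earnings_report" vector, so that document ranks first. That is the whole idea: similar meaning becomes nearby vectors, and search becomes a distance calculation.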
Highlights (48-minute run time):
Minute 02:00 – background on initial AI data products
Minute 07:00 – do we need a single use case or do you need many different databases along with vector searches
Minute 09:00 – vector searches become part of requirements … when we get into real production
Minute 14:30 – what is most cost effective and making good decisions up front about future real world GenAI products
Minute 20:00 – topic #2 … analytics databases
Minute 24:00 – most real world use cases do not need all the data … just need subset of all the data; how should that be powered? DuckDB discussion
Minute 26:30 – real time use cases (Druid)
Minute 28:30 – Sudhir’s take on data lakes
Minute 32:00 – not just cataloging but also lineage
Minute 34:30 – discussion of knowledge graphs
Minute 40:30 – discussion of “multimodal”
Minute 43:20 – make the use case work for the customer
Minute 57:00 - “Looking for breakthrough in language to describe it to people, because I think when most people realize the difference between data/content and training data, there is a massive breakthrough that occur there. The language-ing is still not sufficient but we are getting there.”
Source: Louise de Leyritz published Building a Data Stack Aligned With Your Business Needs. September 2023. Originally published on CastorDoc.
“Overwhelmed by tool choices for your data platform? Cut the confusion — start with what your business actually needs. Value comes in three flavors: analytics, automation, and data products. Let your use case lead the way in picking the right infrastructure.”
Abraham Thomas’ Strong Opinions, Weakly Held. October 2023.
Source: My brain.
One theme of the data-related articles I read is the importance of domain knowledge. The best source of data will do you no good if you do not have any idea what you are trying to accomplish.
I am reminded of the first time I sat in front of a Bloomberg terminal. The world’s financial information was at my fingertips. I didn’t have a clue where to begin. What question do I ask if I can ask any question?
The Bloomberg terminal became a much more useful tool when I had a specific project on which I was working & had some domain experience so I had some idea of the relevant questions to ask. Unbeknownst to me, I was becoming a better prompt engineer (the ask-er of better questions).
The same can be said of alternative data. What question do I ask when I have all the web’s publicly available data organized for me?
Eventually, alternative data like that offered by Vertical Knowledge (and many others) will become “table stakes” for decision-makers … the winners will be those who ask the best questions … and perhaps the data will no longer be considered “alternative”.
Farrall’s Four C’s of Good Data
What is a prompt engineer?
From ChatGPT:
A "prompt engineer" typically refers to a person or role responsible for designing and crafting prompts or instructions for use with natural language processing (NLP) models, chatbots, virtual assistants, and other AI-driven systems. The primary goal of a prompt engineer is to create effective and meaningful prompts that elicit desired responses or actions from these AI systems. Here are some key responsibilities and aspects of the role of a prompt engineer:
Prompt Design: Prompt engineers design prompts or input queries that users will interact with when using NLP models or AI systems. These prompts should be clear, concise, and structured in a way that guides users in providing relevant information or commands.
Intent Definition: They define the user's intent or the desired outcome of a prompt. Understanding user intent is crucial for formulating prompts that lead to accurate and relevant responses from the AI system.
User Experience: Creating prompts that provide a smooth and user-friendly experience is essential. This includes considering the language, tone, and structure of prompts to ensure they are user-centric.
Testing and Optimization: Prompt engineers often conduct testing and optimization to refine prompts based on user feedback and performance metrics. They may iterate on prompts to improve the effectiveness of interactions.
Data Collection: In some cases, prompt engineers may be involved in data collection efforts to gather training data for machine learning models. This may include collecting user-generated prompts and responses for model training.
Bias and Fairness: Prompt engineers need to be aware of bias and fairness concerns in prompt design. They should strive to create prompts that do not inadvertently favor certain groups or produce biased results.
Customization: Depending on the application, prompt engineers may customize prompts for specific industries, domains, or use cases to ensure relevance and accuracy.
Integration: They work closely with developers and AI engineers to integrate prompts effectively into AI systems and ensure seamless user interactions.
Monitoring and Maintenance: After deployment, prompt engineers may be responsible for monitoring prompt performance and making adjustments as needed to maintain optimal results.
Documentation: They often document prompt guidelines and best practices for use within an organization or project team.
In essence, a prompt engineer plays a critical role in shaping user interactions with AI systems. Effective prompt design is essential for ensuring that these systems understand and respond to user input accurately and provide a positive user experience.