
AI Application Concepts

tip

This page explains concepts that are helpful for understanding production Artificial Intelligence (AI) applications and AI Data Platforms. It is not intended to be an introduction to machine learning or AI.

If you are already a domain expert, you can skip to Tecton Concepts.

Fundamentals

AI applications

Production AI applications can be broadly categorized into two main types:

  1. Predictive AI Applications: These make automated decisions based on predictions from models. Examples include fraud detection and customer churn prediction. This category is also sometimes referred to as “Traditional ML” or “Predictive ML”.
  2. Generative AI Applications: These create new content based on patterns learned from training data. Examples include support chatbots and document Q&A (i.e., retrieval-augmented generation).

Models are created by training algorithms on historical examples of the outcomes we want to predict or generate. For instance:

  • To train a predictive model that can detect fraudulent transactions, we need a dataset of examples of fraudulent and non-fraudulent transactions.
  • To train a generative model for text completion, we need a large corpus of text data.

To train models effectively, we also need to extend our training examples with context: features, embeddings, and prompts.

Features are measurable data points that a model uses to make predictions or inform generations. They are created by Data Scientists and AI Engineers, often based on their domain expertise.

For a predictive AI application like fraud detection, we may want to use features such as:

  • How does the given transaction amount compare to a user's historical average transaction amount?
  • How many transactions has the user made in the last day?
  • Where is the user making this transaction?
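
As a minimal sketch (not a prescribed implementation), features like these could be computed from a raw transactions table with pandas. The column names `user_id`, `amount`, and `timestamp` are illustrative assumptions:

```python
import pandas as pd

# Illustrative transactions table; the schema is an assumption for this sketch.
transactions = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "amount": [20.0, 35.0, 500.0, 12.0, 15.0],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-05 14:00", "2024-01-06 10:00",
        "2024-01-06 08:00", "2024-01-06 09:30",
    ]),
})

# Feature: how each amount compares to the user's average transaction amount.
transactions["amount_vs_user_avg"] = (
    transactions["amount"]
    / transactions.groupby("user_id")["amount"].transform("mean")
)

# Feature: number of transactions by the user in the trailing 24 hours.
transactions = transactions.sort_values(["user_id", "timestamp"])
transactions["txn_count_24h"] = (
    transactions.set_index("timestamp")
    .groupby("user_id")["amount"]
    .rolling("24h")
    .count()
    .to_numpy()
)
```

In a production pipeline these values would be computed point-in-time per transaction (see Offline and Online Context Retrieval below) so that training examples never see data from the future.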

For generative AI applications, features might include:

  • User preferences or historical interactions
  • Time of day or other relevant environmental factors

Embeddings are dense numerical representations of data that capture semantic meaning and relationships. They allow AI models to work with high-dimensional or unstructured data in a compact, efficient form, often improving performance and enabling more sophisticated applications.

For a predictive AI application, we may want to use embeddings such as:

  • Customer embeddings: Represent a customer's behavior, preferences, and history in a compact form for personalized recommendations or churn prediction.
  • Product embeddings: Capture product attributes, descriptions, images, and relationships for similarity searches or cross-selling applications.

For a generative AI application, we may want to use embeddings such as:

  • Multimodal embeddings: Represent sentences, documents, or images in a way that captures semantic relationships, enabling tasks like language translation or sentiment analysis.
  • Semantic search embeddings: Enable efficient retrieval of relevant information from large datasets, crucial for applications like chatbots or knowledge base querying.
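
As a hedged sketch of how such embeddings are produced and used, the open-source sentence-transformers library can encode text into vectors for semantic search. The model name and product descriptions below are illustrative choices, not recommendations:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative product descriptions; any text corpus is embedded the same way.
products = [
    "Waterproof hiking boots with ankle support",
    "Lightweight trail running shoes",
    "Insulated winter parka with hood",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # an example open-source model
product_vectors = model.encode(products, normalize_embeddings=True)

# Semantic search: embed a query and rank products by cosine similarity.
# With normalized vectors, the dot product equals cosine similarity.
query_vector = model.encode(["shoes for running on trails"], normalize_embeddings=True)
scores = product_vectors @ query_vector.T
print(products[int(np.argmax(scores))])  # -> "Lightweight trail running shoes"
```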

Prompts are structured inputs used to guide models in generative AI applications. They provide information, instructions, or constraints for a model's output.

Some examples of prompts include:

  • Personalized recommendations: "Given the user's income of $75,000, current savings of $50,000, average monthly spend of $2,350, and goal to buy a house in 5 years, provide a detailed savings and investment plan."
  • Contextual chatbot for credit card customer service: "You are a customer service assistant for a major credit card company. The user has reported an unauthorized transaction. Given their account history [transaction history events provided] and our current fraud detection alerts [fraud alert information provided], guide the user through our dispute process and suggest immediate security measures."
  • Retrieval-Augmented Generation (RAG) for mortgage applications:
    • Query: "What are the eligibility criteria for our new first-time homebuyer mortgage product?"
    • Prompt: "[Relevant sections from product documentation and regulatory guidelines] Using the provided mortgage product documentation and considering current regulatory requirements, summarize the key eligibility criteria for our first-time homebuyer mortgage product. Highlight any special features or flexibility in the criteria."
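
To make the RAG example concrete, here is a minimal sketch of how such a prompt might be assembled in application code. The retrieve_documents function is a hypothetical stand-in for a real retrieval step (e.g., a vector search over product documentation):

```python
# Hypothetical retrieval step; in practice this would embed the query and
# search a vector store for the most relevant document chunks.
def retrieve_documents(query: str) -> list[str]:
    return [
        "First-time homebuyer product: minimum credit score of 620, ...",
        "Regulatory guideline: maximum debt-to-income ratio of 43%, ...",
    ]

PROMPT_TEMPLATE = """\
{context}

Using the provided mortgage product documentation and considering current
regulatory requirements, summarize the key eligibility criteria for our
first-time homebuyer mortgage product. Highlight any special features or
flexibility in the criteria."""

query = (
    "What are the eligibility criteria for our new "
    "first-time homebuyer mortgage product?"
)
prompt = PROMPT_TEMPLATE.format(context="\n".join(retrieve_documents(query)))
# `prompt` is then sent to a generative model to produce the answer.
```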

Types of AI Applications

AI applications can also be categorized by how their predictions or generations are used:

  • Analytical: Predictions are used in non-production environments by analysts creating reports or dashboards. These predictions help drive human decision making.
  • Operational: Predictions are used to automate real-time decisions in production software applications.

Tecton focuses on operational AI applications.

Operational AI applications have some of the strictest and most complex requirements because they affect production applications and direct users. Latency SLAs, uptime, and DevOps best practices (code reviews, CI/CD, etc.) are critical elements of these applications.

Two Environments for Operational AI Applications

There are two environments where operational AI applications can run:

  1. Online: The online environment is where the application that end-users interact with runs. This environment provides the ability to do low-latency real-time predictions or generations at scale.
  2. Offline: The offline environment is where large amounts of historical data are stored and large-scale distributed computing runs. This is where Data Scientists and Machine Learning Engineers design and test features, embeddings, and prompts, as well as train models. Offline environments are also used for large-scale batch predictions or generations.

Model Training Environments

Model training is almost always done in the offline environment, where a model has access to large historical datasets and large-scale compute.

info

There are some advanced operational AI applications that continuously train models in the online environment in real-time. This is known as "online training", "online learning", or "continual learning".

Model Inference Environments

Offline Model Inference: Offline model inference is when model predictions are made in large batches in the offline environment. To use these predictions in an online application, they are written to a database in the online environment where they can be looked up in real-time.
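
A minimal sketch of this pattern, with a plain dict standing in for an online database (e.g., Redis or DynamoDB) and a hypothetical model function:

```python
# Hypothetical trained model; returns a churn probability for one user.
def predict_churn(features: dict) -> float:
    return 0.9 if features["txn_count_30d"] == 0 else 0.1

# Batch of precomputed features for all users, built in the offline environment.
offline_user_features = {
    "user_1": {"txn_count_30d": 14},
    "user_2": {"txn_count_30d": 0},
}

# Score every user in batch, then write predictions to the online store.
online_store = {}  # stands in for a real low-latency key-value database
for user_id, features in offline_user_features.items():
    online_store[f"churn_score:{user_id}"] = predict_churn(features)

# Later, the online application looks up a precomputed prediction in real-time.
print(online_store["churn_score:user_2"])  # -> 0.9
```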

Online Model Inference: Online model inference is when model predictions are made in real-time in the online environment. This may happen as a user makes a transaction or searches for a product. Online model inference is very powerful because it can incorporate fresh feature and embedding data. This allows models to adapt to real-time changes such as ongoing user behavior in an application.

info

There are two main reasons that online model inference is used in an operational AI application:

  1. When it is beneficial or necessary to incorporate fresh feature or embedding data from sources such as streams, operational databases, device data, or user input.
  2. When it is inefficient to precompute all possible predictions. For example, if only 10 percent of a large user base is active, you may not want to repeatedly compute recommendations for every user. Instead, you can compute recommendations in real-time when a user visits the application.
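
The second reason above motivates computing predictions on demand. A minimal sketch of on-demand online inference, where the feature lookup and model are hypothetical placeholders:

```python
# Hypothetical online feature lookup; in production this would read fresh
# values from a low-latency store fed by streams or operational databases.
def get_fresh_features(user_id: str) -> dict:
    return {"txn_count_24h": 4, "amount_vs_user_avg": 8.2}

# Hypothetical model: flag transactions far above the user's average amount.
def score_transaction(user_id: str) -> float:
    features = get_fresh_features(user_id)
    return min(1.0, features["amount_vs_user_avg"] / 10)

# Called synchronously from the application while the user transacts,
# so the prediction reflects the user's behavior up to this moment.
fraud_score = score_transaction("user_1")
print(fraud_score)  # -> ~0.82
```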

Offline and Online Context Retrieval

Offline Context Retrieval: Offline context retrieval is when features, embeddings, or prompts are fetched in batches in the offline environment for offline model training or offline model inference. Model training requires fetching historically accurate context values for a set of training events (e.g. fraudulent and non-fraudulent transactions). Offline model inference requires fetching batches of the latest context values for the set of entities you want to generate predictions for (e.g. the set of users for which to generate product recommendations).

Online Context Retrieval: Online context retrieval is when features, embeddings, or prompts are fetched at low latency in the online environment to run online inference.
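
For model training, offline retrieval must be point-in-time correct: each training event is joined with the feature values that were current as of the event's timestamp, never later ones. A minimal sketch of this join with pandas merge_asof (column names are illustrative):

```python
import pandas as pd

# Training events: labeled transactions with the time at which they occurred.
events = pd.DataFrame({
    "user_id": [1, 1],
    "event_time": pd.to_datetime(["2024-01-03", "2024-01-07"]),
    "is_fraud": [0, 1],
})

# Feature values over time, as materialized by a feature pipeline.
feature_values = pd.DataFrame({
    "user_id": [1, 1, 1],
    "feature_time": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-08"]),
    "avg_txn_amount": [25.0, 40.0, 300.0],
})

# For each event, take the most recent feature value at or before event_time.
training_set = pd.merge_asof(
    events.sort_values("event_time"),
    feature_values.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="user_id",
)
# The 2024-01-07 event gets the 2024-01-05 value (40.0), not the later
# 2024-01-08 value, which would leak future information into training.
```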
