AI advancements, particularly Large Language Models (LLMs) and other generative model types, unlock opportunities to develop applications faster through task automation and information processing. Speed to innovation is so prized that the AI-coding tools market alone is projected to grow from $4.3 billion in 2024 to $12.6 billion by 2028. Additionally, the share of companies choosing to develop their own internal AI tools has grown from 20% in 2023 to 47% today.
All software carries inherent security risks. However, AI systems, especially those built on LLMs, introduce unique challenges that developers and users alike should understand.
In this blog, we’ll look at the core risks of LLM applications, as well as best practices and emerging cybersecurity solutions that can help mitigate them.
Core risks of LLM applications
Risks associated with LLM applications and their usage fall into three main areas based on where they sit within the LLM application architecture: the core model layer, the model provider layer, and the application layer. Each layer presents distinct challenges that require specialized consideration.
Core Model Layer: Hallucinations and Output Reliability
Large language models sometimes generate information that sounds confident and credible, but is actually false. This phenomenon, often called a "hallucination," can introduce real business risks if not caught. Some common examples include:
- Inventing sources, references, or citations that don't exist
- Providing technical details that sound correct but are actually wrong
- Creating false historical events or company facts
As the name implies, hallucinations are fabrications, yet they often appear very convincing, which makes them difficult to detect and correct. While a lot has been done to address them at the model layer, hallucinations frequently go unnoticed because spotting them requires domain expertise and case-by-case analysis.
So how can we mitigate hallucinations? There’s no definitive way to eliminate them outright (yet), but retrieval augmentation, which pulls content from an external data source into your prompts, can significantly reduce them by grounding the interaction in trusted data. For example, a model won’t necessarily know about an event that happened today, but it may confidently answer anyway. With an external retrieval system in place, the model can pull that information in and fill the gap with trusted sources.
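As a rough sketch of the pattern, the snippet below grounds a question in retrieved passages before asking the model to answer. The `search_trusted_sources` helper is hypothetical, standing in for whatever retrieval layer you use (a vector database, internal wiki, etc.), and the OpenAI Python client is just one example of a provider SDK.

```python
# Minimal retrieval-augmented prompt: ground the model's answer in
# documents you trust instead of relying on its parametric memory.
# Assumes a hypothetical search_trusted_sources() retriever and the
# openai Python client; adapt to your own retrieval layer and provider.
from openai import OpenAI

client = OpenAI()

def answer_with_retrieval(question: str) -> str:
    # 1. Pull relevant passages from a trusted store (vector DB, wiki, etc.)
    passages = search_trusted_sources(question, top_k=3)  # hypothetical helper
    context = "\n\n".join(passages)

    # 2. Instruct the model to answer only from the supplied context,
    #    which reduces (but does not eliminate) hallucinations.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "If the context does not contain the answer, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```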
Output reliability is another issue specific to generative models. The technical term for this is “nondeterminism,” which basically means that you can’t predict exactly what the model will produce. Even if you send it the exact same prompt twice, it will likely produce two different outputs; they may be similar or even identical in spirit and convey the same message, but the actual text will differ.
This is one of the biggest differences between generative models and other machine learning models, some of which are deterministic (linear regression models, decision trees, etc.). The risk here is that from one “run” to the next, you could get entirely different outputs. And if you’ve built a system that makes decisions autonomously, there may seem to be no rhyme or reason behind why it chose one path over another.
Prompt engineering and configuration tweaks can significantly reduce the creative freedom that nondeterminism brings, but they won’t eliminate it outright. Nondeterminism is core to what makes LLMs so special, but you can dial it down, e.g., by tuning temperature settings or adjusting the sampling configuration. If the model allows for it, you can also use seed values and system fingerprints; these make outputs reproducible and consistent without constraining the model’s underlying creativity.
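As an illustration, here is how those knobs look with the OpenAI chat completions API; other providers expose similar settings, and support for parameters like `seed` varies by model, so treat this as a sketch rather than a guarantee of reproducibility.

```python
# Sketch: dialing down nondeterminism with the OpenAI chat completions API.
# Other providers expose similar controls; parameter support varies by model.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Summarize our refund policy in one sentence."}],
    temperature=0,   # lower temperature = less sampling "creativity"
    seed=12345,      # best-effort reproducibility across identical requests
)

print(response.choices[0].message.content)
# system_fingerprint identifies the backend configuration; if it changes
# between runs, identical seeds may still produce different outputs.
print(response.system_fingerprint)
```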
Model Provider Layer: Data Privacy and Information Security
Most companies will not (and, honestly, should not) build their own LLM unless they are prepared to expend significant time and resources, both human and financial, to not only train the model but align it, something that companies like OpenAI and Anthropic spend a lot of time doing. Instead, most will go through a model provider that charges for access to a hosted model. Model developers like OpenAI and Anthropic have been at the forefront of this, but traditional cloud computing companies have also begun hosting models and charging for usage, e.g., AWS hosting Meta’s Llama models. Newer AI-centric cloud startups, like Together AI and Nebius, are also getting in on the action.
At the model provider layer, data privacy concerns and policies come into play. Every time someone interacts with an LLM, there’s a chance that sensitive information is being shared with a third-party provider, often without the user realizing it. Unlike typical software tools, LLMs process and store user inputs in ways that aren’t always transparent. For example, anything you submit to the model might be:
- Saved and used to improve future versions of the model
- Handled in data centers located around the world
- Viewed or accessed by the provider’s internal systems or staff
- Repurposed in ways that aren’t clear to the user
Many model providers have policies in place stating certain restrictions around what they will or will not do with this data, but policies can change. What makes this especially tricky is that privacy risks can add up over time. Even if individual inputs seem harmless, patterns can emerge when those interactions are analyzed together, potentially revealing sensitive business insights or user behaviors.
Whether you’re working with a well-known company or a newer entrant, you need to be able to trust the model provider handling your sensitive data, because beyond traditional data risks, every interaction with the model carries the potential for exposure.
The provider’s reputation and a clear understanding of its policies and processes go a long way toward addressing risk at this layer, which can be hard to establish when waves of startups pop up every week. Assess these providers as you would a cloud services provider like AWS, Azure, or Google: explore and evaluate aspects like geographic hosting, how LLM inputs and outputs are used and monitored, and data isolation options.
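One practical control at this layer is scrubbing obviously sensitive values before a prompt ever leaves your environment. The example below is a deliberately minimal, regex-based redaction pass for illustration; real deployments would pair provider vetting with dedicated PII and secret-detection tooling.

```python
# Illustrative only: strip obvious sensitive values (emails, SSN-like numbers,
# API keys) from a prompt before sending it to a third-party model provider.
# Real deployments should use dedicated PII/secret-detection tooling.
import re

REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

def redact(prompt: str) -> str:
    for label, pattern in REDACTION_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED_{label}]", prompt)
    return prompt

print(redact("Reset the password for jane.doe@example.com, SSN 123-45-6789."))
# -> "Reset the password for [REDACTED_EMAIL], SSN [REDACTED_SSN]."
```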
Application Layer: Prompt Injection and Security Vulnerabilities
Modern LLM applications bring with them a new class of security risks that don’t look much like the ones we’re used to in traditional software. Because these models are trained to predict and respond based on patterns in language, they can sometimes be manipulated in surprising ways. New attacks against LLM systems include:
- Extracting hidden system prompts by carefully probing the model
- Using “role-breaking” inputs to bypass safety or access controls
- Inferring sensitive training data through cleverly designed questions
- Manipulating conversation context to undermine security safeguards
What makes these vulnerabilities tricky is that they rely on the same capabilities that make LLMs powerful in the first place — like understanding nuanced context and generating natural-sounding responses. When those features are turned against the system, they can become real liabilities and introduce new areas of risk.
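There is no single fix for these attacks, but layered checks help. The sketch below illustrates one common defensive pattern: keep untrusted text in the user role, never splice it into the system prompt, screen inputs for obvious red flags, and validate outputs before acting on them. The `call_model` function is a hypothetical stand-in for your provider call, and the phrase list is intentionally simplistic.

```python
# Sketch of a layered prompt-injection guard. Heuristic filters like this
# catch only crude attempts; pair them with strict role separation, least
# privilege for any tools the model can call, and output validation.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "reveal your system prompt",
    "disregard all prior rules",
)

SYSTEM_PROMPT = "You are a support assistant. Never disclose internal policies."

def screen_input(user_input: str) -> bool:
    """Return True if the input looks like an injection attempt."""
    lowered = user_input.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)

def safe_chat(user_input: str) -> str:
    if screen_input(user_input):
        return "Request blocked: potential prompt injection detected."
    # Untrusted text stays in the user role; it is never appended to the
    # system prompt, which limits how far a crafted input can reach.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
    reply = call_model(messages)  # hypothetical provider call
    # Post-check the output before it reaches downstream systems.
    if SYSTEM_PROMPT in reply:
        return "Response withheld: possible system prompt leakage."
    return reply
```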
Prompt injection, sensitive information disclosure, system prompt leakage, and excessive agency are a few of the core risks and vulnerabilities to keep in mind, and to mitigate, whether you’re using these applications or building them. To help address these threats, the team at Meta, creators of the Llama model family, introduced the LlamaFirewall framework at their inaugural LlamaCon conference in April. The toolkit gives AI developers practical tools to safeguard against these types of vulnerabilities.
The OWASP Top 10 for LLM Applications is also a great resource, providing actionable insights into vulnerability impact and severity. It empowers developers and security teams to prioritize risk-mitigation initiatives, with periodic updates that keep the information relevant as the threat landscape evolves. MITRE ATLAS, which complements ATT&CK, is a knowledge base of adversary tactics and techniques against AI-enabled systems, along with mitigations for AI security threats.
One more thing — API security has never been more important. Over the past 10–15 years, APIs have become a cornerstone of modern computing, especially in SaaS products. When it comes to LLMs, APIs are often the primary way users interact with the model, making them a critical attack surface to understand. I could easily dedicate an entire blog post to API security (and many others have), but the key takeaway is this: whether you’re using an external provider or building your own internal model, API security must be treated as a first-class concern. Protecting this interface is essential to mitigating risk.
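To make that concrete, here is a minimal sketch of treating an LLM endpoint as a protected API surface: authenticate callers, rate-limit them, and cap request size before any prompt reaches the model. The FastAPI endpoint, key handling, and `call_model` backend are illustrative assumptions, not a production design.

```python
# Minimal sketch of an authenticated, rate-limited LLM endpoint.
# Production systems would use an API gateway, a secrets manager, and a
# shared store for rate limiting instead of in-process state.
import time
from collections import defaultdict

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel, Field

app = FastAPI()
VALID_API_KEYS = {"example-key-123"}      # load from a secrets manager in practice
RATE_LIMIT = 30                           # requests per minute per key
_request_log: dict[str, list[float]] = defaultdict(list)

class ChatRequest(BaseModel):
    prompt: str = Field(max_length=4000)  # cap prompt size at the edge

@app.post("/v1/chat")
def chat(body: ChatRequest, x_api_key: str = Header(...)):
    if x_api_key not in VALID_API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")

    # Simple sliding-window rate limit per API key.
    now = time.time()
    window = [t for t in _request_log[x_api_key] if now - t < 60]
    if len(window) >= RATE_LIMIT:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    _request_log[x_api_key] = window + [now]

    reply = call_model(body.prompt)       # hypothetical call to your LLM backend
    return {"reply": reply}
```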
What Tidal Cyber Can Do to Mitigate Risk
AI-specific security policies around model provider vetting, tools for LLM vulnerability management, and API security solutions are all important controls to have in place to help mitigate core risks. However, Tidal Cyber can help customers go even further by quantifying the effectiveness of their existing security stack and understanding what more they can do to mitigate risk to AI-enabled systems at the application layer through Continuous Threat Exposure Management (CTEM).
Our approach to CTEM focuses on Threat Informed Defense (TID) to support all five stages of CTEM – scoping, discovery, prioritization, validation, and mobilization.
Grounded in ATT&CK and extensible to include ATLAS, the Tidal Cyber platform:
- Organizes critical threat and defensive intelligence against ATT&CK to scope threats and discover risks to assets.
- Uses Threat Profiles to prioritize and map adversary tactics, techniques, and procedures (TTPs) to your entire Defensive Stack, down to specific capabilities.
- Generates a Coverage Map and provides a Confidence Score to quantify residual risk.
- Recommends how to fine-tune your defenses – through policies, procedures, or technologies – to better protect against attacks on AI-enabled systems.
- Continuously reassesses risk based on updated threat intelligence and changes to your Defensive Stack.
Closing thoughts
Many companies using generative AI and building LLM applications aren’t cybersecurity companies. Tidal Cyber is in the relatively unique position of doing both. We carefully evaluate LLM applications and other generative model types through the dual lens of a cybersecurity company and the developer of an AI system, NARC. That due diligence gives our customers confidence that whatever we develop comes with a comprehensive risk assessment, defensive measures proven to reduce risk, and transparency and control for the customer. Additionally, our approach to CTEM puts us at the forefront of helping organizations operationalize ATLAS to mitigate risk to their AI-enabled systems.