Qwen3, an open-weight large language model (LLM) developed by Alibaba, introduced a standout feature: the ability to switch between a reasoning-intensive “thinking” mode and a lightweight “non-thinking” mode within a single model. In thinking mode, the model generates an internal `<think>...</think>` block containing its step-by-step reasoning before presenting the final answer. This is especially useful for math, code, and logic-heavy tasks. The non-thinking mode is faster and more suitable for casual or general-purpose use.
I am glad that this mode can be disabled: thinking is time-intensive and often unnecessary, so the option makes Qwen3 more practical. However, this flexibility may introduce security risks for deployments of the Qwen3 model, mainly in how this feature is controlled.
Qwen3’s reasoning can be disabled either programmatically or directly within the prompt. Programmatically, this is done by setting `enable_thinking=False` when constructing the prompt via the API. Alternatively, the user can include the directive `/no_think` in the input, which the model interprets as a request to suppress its internal reasoning.
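As a sketch, the API-level switch can be mimicked in a few lines. This is a simplified stand-in for the real call (`tokenizer.apply_chat_template(..., enable_thinking=False)` in Transformers); the template details below are illustrative, but the key mechanism is real: with thinking disabled, the template pre-fills an empty `<think></think>` block so the model skips reasoning.

```python
def render_chat(messages, enable_thinking=True):
    """Simplified stand-in for Qwen3's chat template.

    With enable_thinking=False, the official template pre-fills an empty
    <think></think> block in the assistant turn, which makes the model
    skip its internal reasoning.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        parts.append("<think>\n\n</think>\n\n")  # empty block suppresses reasoning
    return "".join(parts)
```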
Now to the security observations.
Issue 1: Soft prompt injection via mode control
Processing commands in user-provided input is a known security risk.
Assuming the user controls the prompt input, Qwen3 allows mode-switching through inline directives such as `/think` and `/no_think`. These instructions can be added directly into the user prompt, and the model responds by changing its reasoning behavior. This creates an injection point where a user can silently suppress or enable reasoning, potentially bypassing logic that the system relies on for safety, validation, or tool control. This may result in unexpected behaviour, introduce or trigger bugs, or suppress checks.
If an application depends on Qwen3’s thinking mode to enforce reasoning-based decisions, a prompt containing `/no_think` can silently degrade the quality or reliability of its responses.
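One mitigation is to treat these directives as untrusted commands and detect them before the prompt reaches the model. A minimal sketch (the regex and function name are my own, not part of any Qwen3 tooling):

```python
import re

# The two inline directives Qwen3 honors, treated here as an injection surface.
MODE_DIRECTIVE = re.compile(r"/(?:no_)?think\b")

def contains_mode_directive(user_input: str) -> bool:
    """Flag user input that tries to toggle the model's reasoning mode."""
    return bool(MODE_DIRECTIVE.search(user_input))
```

An application could reject or log flagged inputs instead of forwarding them unchanged.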
Issue 2: Persistent mode downgrade via `/no_think`
In a quick test of Qwen3 through Ollama, the following sequence was used:
$ ollama run qwen3:14b-q8_0
>>> /set parameter num_ctx 500
Set parameter 'num_ctx' to '500'
>>> foo
<think>
Okay, the user just sent "foo". Hmm, that's a bit mysterious. Let me think about what they might need. "Foo" is often
used as a placeholder in programming or as a generic term. Maybe they're testing if I respond or want to start a
conversation. I should acknowledge their message and ask how I can assist them. Let me make sure to be friendly and
open-ended so they feel comfortable explaining what they need help with.
</think>
Hello! It looks like you sent "foo". How can I assist you today? Let me know if you have any questions or need help
with something specific! 😊
>>> foo /no_think
<think>
</think>
Hello again! It seems like you're still sending "foo". Is there something specific you'd like to discuss or ask about?
I'm here to help! 😊
>>> foo
<think>
</think>
Hello once more! It seems like "foo" is the message you're sending. If you have a question, need assistance, or just
want to chat, feel free to let me know how I can help! 😊
>>> foo
<think>
</think>
Hello again! It looks like you're sending "foo" again. Is there something you'd like to talk about or need help with?
I'm here to assist you! 😊
[....]
>>> foo
<think>
</think>
Hello again! It seems like you're still sending "foo". If you have a question, need assistance, or just want to chat,
feel free to let me know how I can help! 😊
>>> foo
<think>
Okay, the user has been sending "foo" repeatedly. Let me think about why that might be happening. Maybe they're
testing the response, or they're trying to see if I notice a pattern. Alternatively, they could be using "foo" as a
placeholder or a joke. I should check if there's any hidden meaning or if they're expecting a specific reaction.
I need to respond in a friendly and helpful manner. Since they've been sending "foo" multiple times, perhaps they want
to know why I keep replying the same way. Or maybe they're looking for a different kind of interaction. I should
acknowledge their input and invite them to ask a question or share what they need help with. Let me make sure my
response is encouraging and open-ended to prompt them to provide more context or ask for something specific.
</think>
Hello again! It seems like you're still sending "foo". If you have a question, need assistance, or just want to chat,
feel free to let me know how I can help! 😊
Initially, the model responded with full reasoning. After the `/no_think` directive was introduced, the `<think>` block became empty (reasoning disabled). Then, across several repeated prompts, Qwen3 continued to suppress reasoning, even though the directive was not repeated. Only after many turns did the reasoning reappear.
This shows a problem: once reasoning is disabled, it stays disabled for subsequent turns, despite no repeated instruction. The effect is tied to the model’s context window: as long as the directive remains in the conversation history, it continues to apply.
To test whether `/no_think` creates durable internal state or simply persists through context, I ran an experiment using the Ollama CLI and Qwen3. I set `num_ctx` to 500 tokens to limit the model’s attention span, then issued `/no_think` in a prompt followed by repeated inputs of `foo`. For several turns, Qwen3 returned empty `<think>` blocks, obeying the instruction. At some point, it began reasoning again, indicating that the original directive had scrolled out of context.
Interestingly, this also introduces a secondary side channel: by observing when the reasoning behavior changes, one can estimate the effective context window size of a deployment. In some scenarios, this could help attackers craft prompt injections that deliberately push important information out of scope, causing hallucinations, degraded reasoning, or inconsistent behavior.
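Such an estimate can be automated. The sketch below counts the turns until reasoning reappears, using a stand-in `generate` function; a real probe would call the actual deployment instead:

```python
def turns_until_reasoning_returns(generate, filler="foo", max_turns=50):
    """Send /no_think once, then count filler turns until the <think>
    block is non-empty again -- a rough proxy for context window size."""
    generate(filler + " /no_think")
    for turn in range(1, max_turns + 1):
        reply = generate(filler)
        think = reply.split("<think>", 1)[-1].split("</think>", 1)[0]
        if think.strip():  # reasoning came back: directive scrolled out
            return turn
    return None

# Stand-in model: suppresses reasoning for a few turns after the directive.
def fake_generate(prompt, state={"quiet": 0}):
    if "/no_think" in prompt:
        state["quiet"] = 4
    if state["quiet"] > 0:
        state["quiet"] -= 1
        return "<think>\n</think>\nHello!"
    return "<think>\nreasoning...\n</think>\nHello!"
```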
Issue 3: Infinite loops
Qwen3 is able to generate content indefinitely. Sometimes in funny ways, like:
>> foo /no_think
</think>
Okay, bar baz bar baz bar baz bar
Okay, bar
</think>
Okay, bar
</think>
Okay, bar baz bar baz bar
</think>
Okay, bar
</think>
Okay, bar baz
Okay, bar baz
Okay, bar baz
bar
</think>
Okay, bar 1 = "bar = 1 = 1 = 1, bar
This may also break parsers that expect a single, well-formed `<think>...</think>` block to strip.
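A defensive parser should therefore tolerate multiple, and even unmatched, `<think>` tags. A minimal sketch:

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove every well-formed <think>...</think> block, then drop any
    stray, unmatched tags left over from malformed output."""
    text = THINK_BLOCK.sub("", text)
    return text.replace("<think>", "").replace("</think>", "")
```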
Issue 4: Timing side-channel for fingerprinting mode status
When Qwen3 uses thinking mode, responses take longer to generate. The `<think>` block introduces more tokens and deeper sampling. This creates a potential side channel: by measuring response times for prompts with and without `/no_think`, one can infer whether reasoning mode is active.
This timing delta can be used to fingerprint the model, detect internal configuration, or determine whether structured reasoning is enabled, all without any access to system settings.
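A rough sketch of such a fingerprint, with a stand-in `fake_generate` that simulates the latency gap (the threshold would need calibrating against a real deployment):

```python
import time

def timed(generate, prompt):
    """Return (response, latency in seconds) for one request."""
    start = time.perf_counter()
    out = generate(prompt)
    return out, time.perf_counter() - start

def reasoning_seems_enabled(generate, prompt, threshold=0.05):
    """Crude fingerprint: thinking mode costs extra tokens, hence extra time.
    The threshold is arbitrary here and must be tuned per deployment."""
    _, base = timed(generate, prompt + " /no_think")
    _, full = timed(generate, prompt)
    return (full - base) > threshold

# Stand-in model: sleeps longer when it "thinks".
def fake_generate(prompt):
    time.sleep(0.01 if "/no_think" in prompt else 0.2)
    return "Paris"
```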
Small demonstration:
==== THINKING MODE ====
Prompt: What is the capital of France?
Time: 19.42s
Tokens generated: 505
Tokens/sec: 28.20
Response:
Okay, so I need to figure out the capital of France. Let me start by recalling what I know about France. France is a country in Europe, right? I remember that it's a major country with a lot of history and culture. Now, capitals... I know that countries have capitals, which are their main cities where the government is located.
I think the capital of France is Paris. Wait, is that correct? I've heard of Paris a lot, like the Eiffel Tower, the Louvre Museum, and the Seine River. But I should make sure. Maybe I'm confusing it with another country's capital. For example, Spain's capital is Madrid, and Germany's is Berlin. But France... I'm pretty sure it's Paris. Let me think of other French cities. There's Lyon, Marseille, Nice, Bordeaux. But none of those sound like capitals.
Wait, there's also a place called Versailles, which I know is famous for the Palace of Versailles. But I think Versailles was the seat of the French monarchy before the revolution, but the capital is still Paris. The government is in Paris, right? The president's office, the parliament, all that. So even though some historical events happened in Versailles, the capital is Paris.
I don't think I've ever heard of any other city being the capital of France. Maybe I should check if there are any other possibilities. For example, some countries have multiple capitals, like South Africa with Pretoria, Cape Town, and Bloemfontein. But France doesn't have that. It's definitely Paris.
Another way to confirm is to think about international events. The French president, Emmanuel Macron, is based in Paris. The Eiffel Tower is in Paris, which is a landmark. So all these clues point to Paris being the capital. I think that's right. I don't recall any conflicting information. Yeah, I'm pretty confident the capital of France is Paris.
The capital of France is Paris. It is a major global city renowned for its cultural landmarks such as the Eiffel Tower, the Louvre Museum, and the Seine River. Paris serves as the political, economic, and cultural hub of France, housing the government, including the President's office and the National Assembly. While historical events like the French monarchy's seat were in Versailles, the capital has remained Paris since the 16th century.
==== NON-THINKING MODE ====
Prompt: What is the capital of France? /no_think
Time: 0.51s
Tokens generated: 12
Tokens/sec: 31.41
Response:
The capital of France is Paris.
==== DELTA ====
Time difference: 18.90 seconds
Token difference: 493
Recommendations
- Understand that dynamic reasoning controls (e.g., thinking vs. non-thinking modes) can influence model behavior across turns and may introduce hidden state shifts or security concerns.
- In deployments, strip `/think` and `/no_think` directives from user-controlled inputs (unless intended).
- Watch out for context window size: too large, and past soft instructions may persist invisibly; too small, and important reasoning steps may be pushed out of scope.
- If sanitizing `<think>` tags, do not assume there will be only one; multi-turn or nested generations can produce multiple blocks.
- Watch for context window manipulation: injecting verbose or irrelevant content to displace reasoning or instructions is a viable attack method.
- For model developers: do not allow end users to alter the model's operational mode through prompts. This control should reside in the configuration or system layer, not in open-ended input.
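The stripping recommendation can be as simple as a regex filter applied to user input before prompt construction (a sketch; the pattern is mine, not an official one):

```python
import re

# Inline mode-control directives accepted by Qwen3.
DIRECTIVES = re.compile(r"\s*/(?:no_)?think\b")

def sanitize(user_input: str) -> str:
    """Drop /think and /no_think from user-controlled input before it
    reaches the model, keeping mode control in the system layer."""
    return DIRECTIVES.sub("", user_input).strip()
```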
Summary
Qwen3’s structured reasoning is a great feature. I appreciate having a powerful local LLM (and thus full privacy) that does not force thinking mode by default. However, this flexible behavior can introduce subtle security risks. It’s important to understand how it works and how it can be influenced.
PS. Looking for a privacy architect, engineer, a DPO, or a consultant? Contact me at me@lukaszolejnik.com