
The RTX 5090 Setup That Let Me Run Qwen and ComfyUI Together


I wanted one machine to do two jobs at the same time: serve a large local LLM and generate images.

The machine was an RTX 5090. The text model was qwen3.5:27b. The image side was ComfyUI running z_image_turbo_nvfp4.safetensors.

The simple question was:

Can I keep Qwen resident in VRAM and still use ComfyUI without turning the whole setup into a crash-prone science project?

After benchmarking a few combinations, the answer was yes, but only if I stopped chasing the absolute fastest result.
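For readers who want to try the "keep Qwen resident" part, here is a minimal sketch. The post does not name the runtime that served qwen3.5:27b; the tag format looks like Ollama, so the sketch assumes an Ollama server on its default local port. The helper names are hypothetical, and 28 * 1024 tokens is only one reading of the 28k context length mentioned later in the post.

```python
import json
import urllib.request

# Assumption: an Ollama server is listening on its default local endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_preload_payload(model="qwen3.5:27b", num_ctx=28 * 1024):
    """Build a request body that loads the model and keeps it in memory.

    An empty prompt asks Ollama to load the model without generating text,
    and keep_alive=-1 asks it to keep the model loaded indefinitely.
    """
    return {
        "model": model,
        "prompt": "",
        "keep_alive": -1,
        "options": {"num_ctx": num_ctx},  # context window in tokens (assumed value)
        "stream": False,
    }


def build_preload_request(payload, url=OLLAMA_URL):
    """Wrap the payload in a POST request object without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_preload_request(build_preload_payload())
    print(request.get_method(), request.full_url)
    # To actually load the model, send it: urllib.request.urlopen(request)
```

In Ollama, the KV cache format is a server-side setting rather than a per-request option: it is read from the OLLAMA_KV_CACHE_TYPE environment variable, with flash attention enabled via OLLAMA_FLASH_ATTENTION=1.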

The setup I kept

This is the configuration I ended up keeping:

I also kept two operating rules:

That was the best balance between speed and stability.

Why the fastest setup was not the right setup

The fastest ComfyUI profile I tested finished 3 images in 9.972s.

That sounds great until you look at the memory headroom:

That is effectively no safety margin.

It might survive a clean benchmark run, but it leaves almost no room for background overhead, slightly heavier prompts, or a different workload later.

The setup I actually kept was slower:

That tradeoff was easy to accept.

I gave up a bit of raw speed, but in return I got a much larger buffer and a setup I could trust for normal use.
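One way to make the headroom rule concrete is to check free VRAM before starting a heavier job. The sketch below reads used and total memory from nvidia-smi and compares the gap against a minimum margin. The function names and the 2048 MiB threshold are assumptions for illustration, not values from the benchmarks.

```python
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' line (MiB) from nvidia-smi CSV output."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def query_gpu_memory(index=0):
    """Return (used_mib, total_mib) for one GPU by calling nvidia-smi."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_line(result.stdout.strip().splitlines()[0])


def has_safe_margin(used_mib, total_mib, min_free_mib=2048):
    """True if at least min_free_mib of VRAM is still free.

    The 2048 MiB default is a placeholder; pick a margin that fits
    your own workloads.
    """
    return total_mib - used_mib >= min_free_mib
```

Calling has_safe_margin(*query_gpu_memory()) with both workloads loaded gives a quick yes/no answer before queueing a larger batch.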

The small surprise on the Qwen side

I also tested different KV cache formats for Qwen at a 28k context length.

These were the results:

The useful part was this:

q8_0 only used about 448MB more VRAM than q4_0.

That was a smaller penalty than I expected, and it made q8_0 feel like the sensible default for a large context setup.

f16 used about 1.17GB more than q4_0, which felt harder to justify in a shared-GPU workflow.
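A rough way to see why the q8_0 penalty is small is to look at storage cost per cached element. In the ggml block formats, q4_0 packs 32 values into 18 bytes and q8_0 packs 32 values into 34 bytes, while f16 uses 2 bytes per value. The sketch below turns that into a size estimate. It assumes every layer keeps a standard full-length K and V cache, which may not hold for this model, and the architecture numbers in the usage line are placeholders, not the real qwen3.5:27b dimensions.

```python
# Bytes per cached element for each KV cache format.
# q4_0 and q8_0 store one fp16 scale (2 bytes) per block of 32 values.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + 2-byte scale
    "q4_0": 18 / 32,  # 32 4-bit values (16 bytes) + 2-byte scale
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, fmt):
    """Estimate KV cache size in bytes for a plain attention stack.

    The factor of 2 covers the separate K and V tensors. All
    architecture arguments are supplied by the caller.
    """
    elements = 2 * n_layers * n_kv_heads * head_dim * ctx_len
    return elements * BYTES_PER_ELEMENT[fmt]


def extra_over_q4_0(fmt, **dims):
    """Extra bytes a format needs compared with q4_0 for the same dims."""
    return kv_cache_bytes(fmt=fmt, **dims) - kv_cache_bytes(fmt="q4_0", **dims)


if __name__ == "__main__":
    # Placeholder dimensions, NOT the real model architecture.
    dims = dict(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=28 * 1024)
    for fmt in ("q8_0", "f16"):
        print(fmt, round(extra_over_q4_0(fmt, **dims) / 2**20), "MiB over q4_0")
```

Under this simple model q8_0 costs 0.5 bytes per element more than q4_0, while f16 costs about 1.44 bytes more. Measured gaps such as the ones above need not match that ratio exactly, since real VRAM use includes buffers other than the KV cache.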

My practical takeaway

If you want one RTX 5090 to handle both a large local LLM and image generation, the winning strategy is not “push the card until it almost explodes”.

The better strategy is:

For me, that meant:

That setup was not the benchmark winner on paper.

It was the setup that actually made sense to live with.

If you want the full benchmark numbers, I also published the detailed write-up: Benchmarking Qwen 27B and ComfyUI on One RTX 5090.
