However, as it stands, the majority of the tools in place cost a fair bit of money to set up and run.
You can get by with 4 GB of VRAM if all you want is to generate some pictures; put differently, every PC capable of 1080p gaming should do the trick. With good software (ComfyUI) you can run SDXL just fine, and SD1 is practically a breeze.
It’s fine-tuning, and even more so training models from scratch, where things get expensive, but there are other ways to get creative with those models. Training is only ever barely possible on gaming GPUs because those cap out at about 16 GB of VRAM.
(Just for completeness’ sake, for anyone wondering “why don’t I just use my 32 GB worth of CPU RAM to supplement the VRAM?” – that’s already happening anyway. You still need a minimum amount of VRAM, or your box will be busier shuffling data to and from the GPU than it is actually doing calculations: your GPU is going to thrash. If that happens, it’s probably faster to run the AI on the CPU – and, well, a CPU just isn’t built to run that kind of code.)
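A quick back-of-envelope sketch makes the thrashing point concrete. All the numbers below are illustrative assumptions, not measurements: roughly 10 GB of weights that don’t fit in VRAM and have to be re-streamed over a PCIe 4.0 x16 link (ballpark 32 GB/s) every step, versus a GPU that could finish a step in tens of milliseconds if the weights stayed resident.

```python
# Back-of-envelope: time spent moving weights over the bus vs. computing.
# Every number here is an illustrative assumption, not a benchmark.

WEIGHTS_GB = 10    # model weights spilling out of VRAM (assumed)
PCIE_GBPS = 32     # rough PCIe 4.0 x16 bandwidth in GB/s (assumed)
COMPUTE_S = 0.05   # per-step compute time if weights were resident (assumed)

# Seconds spent just re-streaming the weights for each step:
transfer_s = WEIGHTS_GB / PCIE_GBPS
ratio = transfer_s / COMPUTE_S

print(f"transfer per step: {transfer_s:.2f}s vs compute: {COMPUTE_S:.2f}s")
print(f"the bus dominates by about {ratio:.0f}x")
```

With these made-up but plausible numbers, the GPU spends several times longer waiting on the bus than calculating – which is exactly the regime where a plain CPU run can come out ahead.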
Stock images are a whole category of awful of their own.