You are like 2 months out of date. Stable Diffusion now has a massive ecosystem around it (civitai/automatic1111) that, when used well, completely crushes any competitor in terms of the images it produces.
Midjourney is still competitive, but mostly because it's easier to use.
Dalle2 will get you laughed out of the room in any AI art discussion.
For real! This stuff is moving fast. It feels like just last week I was posting about how it's going to change...art. And now there are hilarious deepfake memes of past and current presidents shit-talking video games.
There are a handful of ML art subs that have pretty amazing stuff daily. Especially the NSFW ones, which shouldn't be surprising: if you've studied any history of media (VHS/DVD/Blu-ray/the internet), porn is a major innovation driver because humans are thirsty creatures.
Yeah, that's definitely one thing it'll be great at: fantasy-themed porn. For me that's furry stuff, but for others it's whatever their tastes are.
Atm someone has to model, rig, texture, animate, etc. Hopefully soon we can just chain a bunch of systems together to generate video straight from a prompt.
Useful for non-porn stuff as well, but the OP is right; lots of innovation occurs when humans are horny (porn) or angry (war).
I scan the SD subreddit and am subscribed to 3 big AI art YouTube channels just to stay up to date. With things moving this fast, a lot of info goes stale quickly, and combing through it later for the good stuff can be a real burden. I try to set aside 30 mins twice a week to apply the new techniques, which helps cement them in my mind and shows me their strengths and weaknesses. ControlNet really changed the game, and now offset noise (check out the IlluminatiDiffusion model) is pushing SD past Midjourney for real artistic control of your output.
ControlNet became popular within the last couple of weeks, and LoRA fine-tuning slightly before that; both have completely changed the landscape too. Even a month out of date and you are a dinosaur at the moment.
These things are advancing way faster than they're being taken advantage of fully. Even SD 1.4 with months-old technology can produce far higher quality images than most of what's seen from midjourney or the latest tools. Things like ControlNet are amazing, to be sure, but there's nothing "dinosauric" about the technology without it. We haven't begun to see the limits of what's possible yet with existing tools, though you're right about the rapid pace of innovation.
Make it two weeks. I haven't paid attention for a second and stuff like Controlnet pops up and evolves into Multi-Controlnet and then into MultiDiffusion.
That's what the singularity is all about: a moment in time when being 2 seconds late turns you into a dinosaur. Be grateful it's 2 months, not 2 weeks, 2 days, or 2 minutes.
I started /r/aigamedev as a subreddit to keep up to date on generative AI technologies, with a focus on the tech and workflows for gamedev. It's largely links from my own research for work and personal projects, but it's growing, and fluff-free (so far).
Twitter. Follow your top 10 or so ML/AI news summarizers. There is enough new information every day to keep you busy reading new papers, APIs, and technologies.
Honestly the "This happened in the last week" is more information than anybody can fully wrap their heads around, so you just have to surf the headlines and dig into the few things that interest you.
The great thing about the AI world - is everything diffuses out quickly on the "For You" timeline - and then you can add people that you are interested in (which reinforces your interest in AI).
Some bootstrapping accounts might be @rosstaylor90, @rasbt, @karpathy, @ID_AA_Carmack, @DrJimFan, @YiTayML, @JeffDean, @dustinvtran, @tunguz, @fchollet, @ylecun, @miramurati, @nonmayorpete, @pmarca, @sama.
This is definitely not an authoritative list - just some of the AI names I follow - but, honestly, if any relevant news breaks, your timeline picks it up within minutes, so you just need a good random sample. Your interests will diverge and you'll pick up your own follows pretty quickly.
Agriculture reduced the doubling time of the global human economy/resource production from 100,000s of years to 1000s of years. The Industrial Revolution dropped it from 1000s to 10s or even 1s. If AI follows the same path, it becomes 0.1-0.01 years.
Your 401k wouldn't need 40 years to build a comfortable retirement, only 4 weeks.
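To make the doubling-time arithmetic concrete, here's a hedged back-of-the-envelope sketch (the doubling times are the only inputs from the comment above; the 7%/yr retirement assumption is illustrative). Time to grow by a factor F with doubling time T is T * log2(F):

```python
import math

def time_to_grow(factor: float, doubling_time_years: float) -> float:
    """Years needed to grow by `factor` at the given doubling time."""
    return doubling_time_years * math.log2(factor)

# A 40-year fund at ~7%/yr doubles roughly every 10 years -> ~16x total.
target = 2 ** (40 / 10)  # 16x

for T in (10, 0.1, 0.01):  # economy doubling times in years
    weeks = time_to_grow(target, T) * 52
    print(f"doubling time {T:>5} yr: 16x takes ~{weeks:.1f} weeks")
```

Under these assumptions, 16x takes ~2080 weeks at a 10-year doubling time, ~21 weeks at 0.1 years, and ~2 weeks at 0.01 years - so the "4 weeks" figure lands between the two AI-era endpoints.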
I just watched a video that convincingly argued that it is energy, and energy alone, that determines the production growth of humanity. Until the day AI can "generate" stuff (you know, something out of nothing), it can at best streamline existing production, which is entirely capped by energy limits.
We may drown in oceans of audio, video, novels, poems, films, porn, blue prints, chemical formulas, etc. dreamed up by AI, but to realize these designs, blueprints, formulas, drugs, etc. ("production") we need to actually resource the materials, and have the necessary energy to make it happen.
It will not be AI that catapults humanity. It can definitely mutate human society (for +/-) but it will not (and can not) result in any utopian outcomes, alone. But something like cold fusion, if it actually becomes a practical matter, would result in productivity that would dwarf anything that came before (modulo material resource requirements).
Stable Diffusion might have a reasonable ecosystem around it, but automatic1111 was always around, and "completely crushes any competitors" is rather rich; Midjourney is still considered the standard as far as I was aware.
I used both again recently and the difference was very clear: Midjourney is leaps and bounds above anything else.
Sure, Stable Diffusion gives you more control over the output, but the images are usually average at best, whereas Midjourney is pretty stunning almost always.
I thought Midjourney was better as well, until I saw some recent videos from Corridor Crew on Youtube. For those who don't know, this is a VFX studio in LA that tries to keep at the cutting-edge of video production techniques and posts content to their Youtube channel, and they have a massive number of followers and several viral videos.
They recently created a full 7-minute anime using Stable Diffusion with their own models and their existing video production gear. I'll post the links and let the results speak for themselves.
The benefits of such fine-grained control aren't a trick; it's why they were able to cobble together frames that don't jump all over the place (mostly).
The other benefit of such a broadly hacked-upon model is that it grows in leaps and bounds.
All due respect to Midjourney, but the Stable Diffusion hype is not just hype.
I agree it's not just hype, and that level of control is useful, but for outright image quality, and for most use cases, Midjourney is better.
I still don't like the look of most Stable Diffusion images; they just look slightly off/amateurish to me, whereas Midjourney produces images that make you go "wow".
If you wanted to use these tools, Midjourney would be my go-to, with Stable Diffusion as a backup for when the additional features are needed - perhaps inpainting on a Midjourney image and using ControlNet if required. But if you just want a pure image, Midjourney is what you want.
It doesn’t really matter. He’s right - Midjourney is leagues ahead in actually following your prompt and producing something aesthetically pleasing. I say this as someone who has made several Dreambooth and fine-tuned models and has started to use Stable Diffusion in my work.
Now, if you happen to find or make an SD model that’s exactly what you’re looking for, you’re in luck. I have no interest in it, but it seems like all of the anime models work pretty well.
You obviously have a ton more control in SD, especially now with ControlNet. But if you want to see the Ninja Turtles surfing on Titan in the style of Rembrandt or something Midjourney will probably kick out something pretty good. Stable Diffusion won’t.
In Midjourney you get fantastic results just by using their discord and a text prompt.
To get similar results in Stable Diffusion you need to set it up, download the models, understand how the various moving parts work together, fiddle with the parameters, pick specific models out of the hundreds (thousands?) available, and iterate, iterate, iterate...
Setting up the environment and tooling around in the code is not a burden, it's a nice change of pace from the boring code I have to deal with normally. Likewise, playing around to build intuition about how prompts and parameters correspond to neighborhoods in latent space is quite fun.
Beyond that, being able to go to sleep with my computer doing a massive batch job state space exploration and wake up with a bunch of cool stuff to look at gives me Christmas vibes daily.
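The overnight batch exploration described above is easy to script: enumerate the prompt/seed/guidance grid up front, then feed each job to whatever text-to-image backend you run locally. This is only a sketch of the grid-building half; `run_job` is a hypothetical stand-in for your actual pipeline call (e.g. a diffusers text-to-image invocation), not a real API.

```python
import itertools
from typing import Iterable, Iterator

def sweep_jobs(prompts: Iterable[str], seeds: Iterable[int],
               cfg_scales: Iterable[float],
               steps_options: Iterable[int]) -> Iterator[dict]:
    """Yield one job spec per point in the parameter grid."""
    for prompt, seed, cfg, steps in itertools.product(
            prompts, seeds, cfg_scales, steps_options):
        yield {"prompt": prompt, "seed": seed,
               "cfg_scale": cfg, "steps": steps}

jobs = list(sweep_jobs(
    prompts=["a misty forest, oil painting", "a misty forest, photograph"],
    seeds=range(4),
    cfg_scales=[5.0, 7.5, 11.0],
    steps_options=[20, 50],
))
print(len(jobs))  # 2 * 4 * 3 * 2 = 48 images queued for the night
# for job in jobs: run_job(**job)  # hypothetical backend call
```

Kick it off before bed and sort the output folder by seed/cfg in the morning to see which neighborhoods of latent space were worth exploring.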
Sure, but if Midjourney outputs a low-quality result for your prompt, it's going to be much more difficult to improve. It's a black box at that point.
Whereas with SD there can be multiple solutions to a single problem - but yeah, you have to develop your own workflow (which will inevitably break with new updates).
Ridiculous. Stable Diffusion might have a massive ecosystem around it, but Midjourney is making money hand over fist. Most people don't even have the discrete GPU necessary to run SD, and the vast majority of artists I know are using Midjourney and then doing touchups afterwards.
Even with all the different models you can load in Stable Diffusion, MJ is 1000 times better at natural language parsing and understanding, and requires significantly less prompt crafting to get aesthetically pleasing results.
Having used automatic1111 heavily with an RTX 2070, the only area where I'll concede SD does a better job is closeup headshots and character generation. MJ blows SD out of the water where complex prompts involving nuanced actions are concerned.
Once Midjourney adds ControlNet and inpainting to their website, that's pretty much game over.
Depending on what you want, you can actually get images that are pretty nice.
I'm using it to generate abstract art, and I've seen worse in the real world.
I still think Midjourney is hamstringing themselves by being Discord-only. And their keyword nannying is pretty bad. It's a testament to their overall quality that they're still as popular as they are, but I really don't think they are doing themselves any favors, especially as the Stable Diffusion ecosystem continues to grow.
This isn’t as true as it sounds; e.g. Stable Diffusion can do better, but it requires in-depth practice and experience.
For your average user, DallE is easy, MJ is fairly disorienting, and SD requires a technical background. I agree with you completely: no one serious is doing art with DallE.
I would have said the same until I tried integrating the SD vs. DallE APIs. I desperately want SD because it’s easily 1/10th the cost, but it misses the point much more often. Probably gonna ship it anyway :X
You don't need a technical background at all really. We've also got something cooking that does prompt tuning in the background so there's less prompting needed from the user.
Do you have any recently updated examples, blog posts, whatever, showing that DALLE is worse than modern Stable Diffusion? I was still under the impression that DALLE was better (with "better" meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing), with the downside of it being locked away and somewhat expensive. And my understanding is that Stable Diffusion 2.0+ is actually a step backwards in terms of quality, especially for anything involving images of humans. But as this thread acknowledges, this area is moving very quickly and my knowledge might be out of date, so I'm definitely happy to see some updated comparisons if you have any to suggest. It feels like ever since ChatGPT came out, there haven't been many posts about Stable Diffusion and image generation; they got crowded out of the spotlight.
If you want an example, go check out the DALLE2 subreddit vs the SD subreddit.
The former is a wasteland; the latter is more popular than r/art (despite having 1% of the subscribers, it has more active users at any given moment).
If you want something ready to use for a newbie, Midjourney v4 crushes DALLE2 on prompt comprehension, and the images look far more beautiful.
If you are already into art, then StableDiffusion has a massive ecosystem of alternate stylized models (many which look incredible) and LORA plugins for any concept the base model doesn't understand.
DALLE2 is just a prototype that was abandoned by OpenAI; their main business is GPTs, and DALLE was just a side hustle.
Dall-E is more likely to generate an image that to some degree contains what you asked for. It also tends to produce less attractive images, and it's closed, so you can't really tune it much. People mostly don't do whole-cloth text-to-image generation with Stable Diffusion; for anything involved, they do image-to-image with a sketch or photobashed source. With ControlNet and a decently photobashed base image you can get pretty much anything you want, in pretty much any style, and it's fast.
> I was still under the impression that DALLE was better (with better meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing),
“Artistically pleasing” is often what people ask for.
> with the downside of it being locked away and somewhat expensive.
Those are enormous downsides. Even if DALL-E were better in some broadly relevant ways in the base model, SD’s free (gratis, at least) availability means the SD ecosystem has finetuned models (whether checkpoints or ancillary things like TIs, hypernetworks, LoRAs, etc.) adapted to... lots of different purposes, and you can mix and match these to create your own models for your own specific purposes.
A web interface backed by only the base SD model (of any version) might lose to the equivalent DALL-E interface in use cases where the full set of tools in the SD ecosystem would not.
I don’t disagree about the downside of DALL-E being locked away and expensive. It’s been exciting to see the Cambrian explosion of improvement to stable diffusion since its initial release. This is how AI research should be done and it’s sad that “Open AI” is not actually open.
That being said, for a business use cases, where I want to give it a simple prompt and have a high chance of getting a good usable result, it’s not clear to me that stable diffusion is there yet. Many of the most exciting SD community results seem to be in anime and porn, which can be a bit hard to follow. I guess the use cases that I’m excited about are things like logo generators, blog post image generators, product image thumbnail generators for e-commerce, industrial design, etc.
But please prove me wrong! I’m excited for SD to be the state of the art; it’s definitely better in the long term that it’s so accessible. I’m sure a good guide or blog post about what’s new in Stable Diffusion outside of anime generation would be an interesting read.
DALLE2 is underpowered and has never improved since release. The actual quality of the images is very low (literally, in the sense that they have lots of artifacts) because they saved compute by not running enough diffusion passes.
People usually still use SD v1.5 because of the experience the community has with finetuning and merging it. Also, a lot of LoRAs are trained for v1.4/1.5 models and wouldn't work with v2.1. Of course, you also have incredible ability to control the generation with SD, which helps. To see some results: https://youtu.be/AlSCx-4d51U
Dalle 2 was great initially, but SD BLEW past it. I mean way, way past it. Dalle2 is like a Model T Ford and SD is a fighter jet. It's that different. Dalle-2 is dead already.
I love that there are so many options that people disagree about which is best. THAT is probably the worst thing that can happen to OpenAI - not just one competitor, but a whole heap of them.
I must be horribly out of date then - I thought Midjourney was the cut-down DALL-E approximation, created to give something to play with to people who couldn't get on the various waiting lists or can't afford to run SD on their own.
My company has a team of AI-empowered artists who would overwhelmingly disagree with you on the premise that AI art is not art. Maybe you're the only one doing the laughing.
A lot of online "artists" are mad about it. Generally not professionals who actually need the productivity, but semipros who live off one-off commissions, or else people who are just generally mad at tech bros.