
Nvidia releases Paint Me a Picture – A web app for GauGAN2

167 points | 92 comments | 12 days ago | blogs.nvidia.com
by akersten12 days ago

I really don't have anything constructive to say. I think in general we're getting too soft on shitty things, so I'm going to be harsh.

I clicked through to the demo site ( http://gaugan.org/gaugan2/ ) and it was horrible.

The interface is clunky, slow, and confusing. I actually had to zoom out in my browser to see the whole thing. Had to click through a non-HTTPS warning. The onboarding tutorial is pretty bad.

I got a generic picture of the milky way for any prompt I tried ("rocks", "trees"). If you press Enter in the prompt field it refreshes the page.

This feels like a hackathon front-end hooked up to an intro to PyTorch webservice. It's only neat because, unlike the other 20 copies of this same project I've seen, it was the only one that didn't immediately throw its hands up and say "server overloaded, please wait."

If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.

[0] https://imgur.com/BNLDt6A

by onion2k12 days ago

Your comment is a really good example of how startups can fail.

Here we have a web app that does something very clever, built by great devs, that fails because it doesn't work the way users expect it to.

So many highly intelligent, super technical founders believe that their amazing tech will sell itself, so they don't put the necessary time and effort into design, UX, or marketing, and they fail because their UI didn't make it clear what to do. It probably works brilliantly when they demo it to people, with the authors driving it or helping users get the most from it. But when users have to use it without that help... it fails.

The lesson for founders here is simple - test your UX, because you won't get a second chance with most customers.

by phist_mcgee12 days ago

The app I used worked nothing like the slick demo in the video. In fact, the UX and UI are some of the worst I have used in recent memory.

No matter what some backend folks believe, there will always need to be highly skilled front end engineers who can put together web apps in a way where the interface just 'gets out of the way' so you can focus on the actual utility.

by hutzlibu12 days ago

"The app I used worked nothing like the slick demo in the video."

This point is very important, and I hope not to make the same mistake.

I also watched the video, saw what was happening there, and it looked nice - but when I tried it for myself it absolutely did not work as expected.

In other words, if there is a simple video with features shown - then trying it out needs to be as simple as the video, or it causes lots of frustration.

First-time users do not want to deal with setup, configs, etc. first. That is something you want later.

by akersten12 days ago

After a sibling commenter very patiently pointed out that I was holding it wrong - I would encourage taking another look at this project if only to try out the Painting mode.

It does produce some very cool results: https://imgur.com/LQuo4UM

Had the press release/tutorial emphasized this angle instead of the wonky text-to-image thing, my initial impression would have been a lot better. This is genuinely a really neat feature. All my UI and discoverability criticism stands though!

by ma2rten12 days ago

The text-to-image thing is cool as well, as long as you can a) figure out how it works and b) only enter landscape terms.

This comment explains how to use it: https://news.ycombinator.com/item?id=29338213

by FpUser12 days ago

I managed to produce a landscape from "canada fall colors red yellow green and bright blue sunny sky".

No matter what variations I tried, the sky came out cloudy and dark. The trees, however, were majestic. The UI does suck.

by ma2rten12 days ago

Your query worked for me on first try: https://imgur.com/a/iTKwZ3v

by azalemeth12 days ago

Type "cat" into the text box and see a wonderful variety of landscapes that look like they are straight from the surface of some plane of hell. Or furry flowers. Or fruit with eyes. I was expecting "lion in the savannah" type pics, not "a visualisation of my DND group's 'baleful polymorph' spell"...

by pmontra12 days ago

Yes, whatever I type I get some abomination in the general settings I asked for (hills, mountains, ocean.) And I agree that the UX is also horrible. I want the webapp they are showing in the video, not the one they have online.

by geoduck1412 days ago

>I got a generic picture of the milky way for any prompt I tried ("rocks", "trees").

The picture is just really zoomed out. That's how awesome it is - it shows you ALL rocks and trees.

/s

by dom9612 days ago

What if this is done on purpose? If they made the UX too easy, this tech is so impressive that it would spread across the general population and the service would quickly get overloaded. This way, only those who truly have the patience to fiddle with the controls are able to get it working.

by Imnimo12 days ago

It looks like you have selected for it to use a segmentation mask, and not to use text.

by akersten12 days ago

I have no idea what that means, but the fact that it both told me to enter a text prompt and actually let me do so while not being in whatever magical mode it needed to be in to actually use the text prompt is another point to add to my rant above.

Alright, I've uttered the incantation for it to do the thing. I still don't get it. [0] https://imgur.com/4zbaiH0

I also tried another example prompt, which bore a striking similarity to the previous result. I don't know if it's persisting the result (it shouldn't - I didn't click the re-use image button), but the strange life-raft-looking artifact is very persistent. [1] https://imgur.com/KTCM4xH

by Imnimo12 days ago

Yeah, the text-to-image seems to be highly dependent on whether the generator knows how to generate the specific objects the text model thinks should be in the image. I got much more consistent results using the semantic segmentation drawing as input:

https://imgur.com/QC13zml

(and for what it's worth, you're totally right that the UI is just an absolute disaster)
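
For anyone curious why the drawn map works so much better than the text prompt: GauGAN-style generators are built around spatially-adaptive normalization (SPADE), where the segmentation map directly sets a per-pixel scale and shift inside the generator, so the map controls what gets drawn where. A rough, illustrative sketch of that mechanism - the class labels, shapes, and layer sizes here are made up for demonstration, not Nvidia's actual code:

  # Illustrative sketch of SPADE-style conditioning (not Nvidia's code)
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class SPADE(nn.Module):
      """Spatially-adaptive normalization: the segmentation map, not the
      text prompt, decides what gets drawn where."""
      def __init__(self, num_features, num_classes, hidden=128):
          super().__init__()
          self.norm = nn.BatchNorm2d(num_features, affine=False)
          self.shared = nn.Sequential(
              nn.Conv2d(num_classes, hidden, 3, padding=1), nn.ReLU())
          self.gamma = nn.Conv2d(hidden, num_features, 3, padding=1)
          self.beta = nn.Conv2d(hidden, num_features, 3, padding=1)

      def forward(self, x, segmap):
          # Resize the one-hot segmentation map to the feature resolution,
          # then predict a per-pixel scale and shift from it.
          segmap = F.interpolate(segmap, size=x.shape[-2:], mode="nearest")
          h = self.shared(segmap)
          return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

  # A made-up 5-class map: channel 0 = "sky" in the top half,
  # channel 1 = "mountain" in the bottom half.
  segmap = torch.zeros(1, 5, 256, 256)
  segmap[:, 0, :128, :] = 1.0
  segmap[:, 1, 128:, :] = 1.0
  features = torch.randn(1, 64, 64, 64)
  out = SPADE(64, num_classes=5)(features, segmap)
  print(out.shape)  # torch.Size([1, 64, 64, 64])

In the real model there is a stack of layers like this at multiple resolutions, plus a GAN loss - but the key point is that the map, not the caption, is the primary control signal.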

by akersten12 days ago

The picture you drew and had it turn into rocks is actually really cool!

I think I would have been more generous to the project had I known it could do that. Maybe I X'd out of the frustrating tutorial too early? :)

by danielvaughn12 days ago

Wow, you weren't kidding. What an awful interface.

by randyrand12 days ago

Hackernews has never gone soft on anything. Quite the opposite.

by IshKebab12 days ago

Yeah, Hackernews is almost universally critical. Even his comment is needlessly critical. Nvidia isn't positioning this as a polished consumer app. It's a demo! The URL is https://www.nvidia.com/en-us/research/ai-demos/

For a research demo the UI is extremely polished. It even has an interactive tutorial!! Some people...

by speedgoose12 days ago

I worked in research on human-computer interaction; this is a very poor UI even in a research setting. The technology is impressive though.

by belval12 days ago

Your parent comment didn't mean that this was a good UI for human-computer interaction research, he meant that it was a good UI considering it was probably just a quick demo built by the scientists who did the research to showcase their work.

I work in a research group and most of my colleagues would have a hard time making an interface like this one, no matter how confusing it is. They are ML/DL scientists and have usually had absolutely zero exposure to frontend.

by A4ET8a8uTh012 days ago

<<If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.

Let me put my tinfoil on. Maybe the idea is to normalize this tech and make it less scary. Look at the press releases. The general response was not a horrified "omg, this tech has gone berserk", but "oh, it's a benign lil biddy".

Tinfoil off.

The interface is absolutely atrocious.

by EMM_38612 days ago

I bailed on it also, with the same experience.

A barrage of obtrusive prompts. When it wanted me to enter a phrase I typed "sunset over a field" and also got a picture of the Milky Way galaxy, followed by more prompts.

I then closed the tab and came here to read the comments.

by bamboozled12 days ago

It just did absolutely nothing for me, I typed in words and hit enter, just a green screen.

by varelse12 days ago

I'll say something constructive. While the text to image functionality is terrible, the segmentation based on arbitrarily drawn images is kind of fun. They should have just gone with the latter.

by thecleaner12 days ago

This is a demo piece not a paid app. Nvidia has zero interest in selling web apps. This is just a marketing stunt to show off their capabilities so they can sell hardware.

by Rd6n612 days ago

I don't know how to make sense of the ToS for projects like this. They all have clauses that let them modify their terms later, for example.

by martinko12 days ago

> I got a generic picture of the milky way for any prompt I tried

You need to uncheck "segmentation" and check "text".

by iaml10 days ago

> If you press Enter in the prompt field it refreshes the page.

That's just default browser behaviour for forms - pressing Enter in a text field submits the form, and the page reloads unless the submit handler prevents it.

by csomar12 days ago

I'm still clicking on the next button.

by eis12 days ago

I am just getting very weird results that don't look at all like the ones in the demo video. Here, for example, is the image it gave me for "car in front of house": https://i.imgur.com/QdtrtCR.png

Or how about this one for "dog playing with ball" https://i.imgur.com/ldGLdwF.png

I have tried about a dozen different input phrases and every time I get these very strange results.

by thom12 days ago

Yeah, it can draw anything you want, as long as what you want is mountains, trees and lakes.

by codefreakxff12 days ago

Cars and houses don't sound like landscapes. Maybe they should put some filters on non-landscape input.

by eis12 days ago

I missed the part where it said it was trained only on landscapes. So I retried it with just those and got this:

"river flowing through desert": https://i.imgur.com/QSjH5hk.png

"sunset waterfall": https://i.imgur.com/wS3EEci.png

Or how about a lovely "green shoreline": https://i.imgur.com/RkbTV99.png

by CornCobs12 days ago

2nd one is pretty dope. The 3rd one is rather uncanny to me though

by guerrilla12 days ago

Well, you did better than I did. I only get pictures of stars and galaxies regardless of what I input, even restricting it to landscapey words.

by anigbrowl12 days ago

Those are at least weird and interesting. I tried 'dog running by a river' and it just kept giving me astronomical images ¯\(°_o)/¯

by prezjordan12 days ago

Make sure "Input utilization" is set to "Text" if you're entering a text prompt.

by dd444fgdfg12 days ago

what? are you saying they showcased the best results? that's unheard of.

by eis12 days ago

It's not just a matter of showcasing the best results. It's a matter of night and day difference between normal output and the one showcased. They are not even remotely close. I actually wonder why they decided to release this to the public in its current form. I was very impressed by the noise suppression app they released so I expected something that delivers decent results.

by throwoutway12 days ago

The first image would make an excellent /r/writingprompt

by visarga12 days ago

You broke it!!

by nitred12 days ago

I wonder if in a decade or so, large tech companies will unseat Disney, Warner Brothers etc as creators of animation movies.

While the results on Nvidia's website aren't too impressive, if you look at the history of animated movies [1] you can see how trivial and simplistic the art and animation used to be.

Having had some experience doing research on GANs at university, I know them to be very powerful. What's very important to note is that the images generated by the model are truly "novel", i.e. completely fictitious. The images generated may be biased toward some of the training data, such as the color and texture of the water and rocks for example, but every image is a fantasy of the model. The only way the model can generate such realistic images is because it has a very good abstract internal representation of what oceans, waves, and rocks are.
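
For anyone who hasn't seen the setup, the adversarial training loop behind this is small enough to sketch. This is a deliberately tiny toy (2-D points instead of images, made-up network sizes), just to show why every output is sampled from noise rather than copied from the training set:

  # Toy GAN training step (illustrative only; nothing like Nvidia's model)
  import torch
  import torch.nn as nn

  # Toy generator: noise -> 2-D "image"; toy discriminator: 2-D point -> real/fake logit.
  G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
  D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
  opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
  opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
  bce = nn.BCEWithLogitsLoss()

  real = torch.randn(32, 2) + 3.0      # stand-in for a batch of training images
  noise = torch.randn(32, 16)

  # Discriminator step: learn to tell real samples from generated ones.
  fake = G(noise).detach()
  loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
  opt_d.zero_grad(); loss_d.backward(); opt_d.step()

  # Generator step: produce samples the discriminator scores as real.
  loss_g = bce(D(G(noise)), torch.ones(32, 1))
  opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Scaled up with convolutional networks and a lot of landscape data, the same game is what produces the photorealistic images here.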

Back at university, I pitched the idea to my professor of using GANs for generating "novel" images in real time while parents would read bed time stories to children. I didn't get very far. Glad to see some real progress in that direction.

[1] https://www.filmsite.org/animatedfilms.html

by f0e4c2f712 days ago

Close, but it will actually turn out to be a very small company (possibly in less than a decade).

Hollywood has little awareness of just how much danger the legacy version of their industry is in.

ML generated assets are slowly creeping towards reality and at the same time doing 3D dev is 100-1000x easier than it was just a few years ago. It's now possible to do for free in many cases as well.

by pjmlp12 days ago

Hardly. Disney now owns Pixar after the early 3D competition days; they can just as easily buy another concurrent.

by teddyh12 days ago

That’s “competitor”.

by pjmlp12 days ago

Right, thanks.

by stefan_12 days ago

This seems to be the link: http://gaugan.org/gaugan2/

by eis12 days ago

A tip for people who get lost in the interface:

  1. Just close the tutorial
  2. Scroll down to the ToS checkbox and check that
  3. On top in the "input utilization" row make sure only text is checked
  4. Enter your text (use only landscape terms)
  5. Press the button with the right arrow inside a rectangle, located below the text input.
  6. Maybe zoom out a bit, because the result will be the image on the right, which for me was out of view by default. I had to zoom to 50% to see the whole UI.

by mcintyre199412 days ago

Also, definitely don't hit enter like you do in every other form, because that seems to clear your input, swap the "input utilisation" back to segmentation only, and sometimes also uncheck the ToS checkbox.

by saberience12 days ago

I only said lakes, mountains etc, and only got galaxy/star pictures...

https://imgur.com/a/SiQwBA2

by Porygon12 days ago

You have to

1. uncheck "segmentation"

2. check "text"

https://i.imgur.com/n9P8N3c.png

by ma2rten12 days ago

Thanks! This UI is something else...

by carrolldunham12 days ago

I finally got to the demo after clicking through three links, and it's so busted in so many ways for me that I give up. Maybe it's stupid to try on my old netbook, but I get no indication of whether I need a fancy graphics card for it to work or whether it's running on my end. Anyway:

- The screen zooms around disorientingly during the tutorial, and when I get to "congratulations, you made your first image", there's nothing there.

- Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry

- The whole site is a fixed width that is wider than my screen

- A red alert checkbox at the bottom confuses me about whether that's why it's not working, etc.

by webmaven12 days ago

> Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry

That tripped me up too. After typing in your text, don't hit enter or anything similar, just click or tap the button with the right arrow (or anything to the right of that button, effects vary).

by Turing_Machine12 days ago

More or less the same, and I have a nearly-new M1 iMac, so it's not your old netbook.

Chrome appeared to work somewhat better than Brave, but it was still pretty frustrating.

I did manage to get it to work in a half-assed manner eventually, but the UI definitely needs a great deal of work.

by poniko12 days ago

The UI is horrible... Do they not have a single UX person at Nvidia with two hours to spare who could help out?

by blt12 days ago

I entered "kitten" and got typical surreal GAN output with disconnected topology, dozens of eyes, etc.

Edit: Looks like it was only trained on landscape images.

by Baeocystin12 days ago

I like what it did with colorless green ideas sleeping furiously. It fits the mood of the sentence.

https://i.imgur.com/jPd0QLE.png

by saberience12 days ago

I entered "mountains" and "mountains and lake" and only got pictures of what looked like blurred galaxies and stars. Clicking any of the style buttons got me colored/tinted pictures of stars. Is it broken?

https://imgur.com/a/SiQwBA2

by luegen12 days ago

Your segmentation map contains "sky" by default. Try drawing "mountain" color in the lower part of the image first.

by Tade012 days ago

MtG dual-color lands serve as a good source of ideas on what to put in the textbox:

https://www.mtglands.com/coloridentity-dualcolor.html

As for the UI: layout via tables?

by xdfgh111212 days ago

Feeding all magic cards into a GAN would be a great way to generate new ones.

by blackoil12 days ago

I am impressed. Obviously still a tech demo, but I like the future. Text-to-image isn't awesome, but segmentation worked beautifully.

https://imgur.com/a/VSv9ZbA

by greenseagull2112 days ago

The comments are overwhelmingly critical of the user interface, which is undoubtedly the weak part of this release, but I was still able to get some very impressive results.

An AI generated house on a lake: https://imgur.com/a/0wtVKum

I have found the best results come from uploading an image, then using the demo tools to get a segmentation map and sketch lines, then editing those as you desire. Changing the styling at the end also makes a big difference!

by edumucelli12 days ago

I input "a horse" and it gives me this: https://imgur.com/1coGQix

by dt3ft12 days ago

Behold, AI. I think our dev jobs are safe for the next 100 years at least.

by bliss12 days ago

I sit here quite impressed with my pseudo-50s sci-fi book cover.

https://imgur.com/a/NZuGTXU

However... This gallery on imgur gives a better idea of capability https://imgur.com/gallery/coWN44P

by jeroenhd12 days ago

The UI for the demo is atrocious, but that's probably because the text-to-image generation was glued to their existing AI painting tool.

I'd love for just the algorithm generation tool to be available for download. The web UI is clunky and just doesn't seem to work right.

by system212 days ago

After trying for 30 minutes I kind of understood some stuff. But the UI is a legit anxiety inducer. I hope they can fix the UI to make it fun. It currently felt like using 80's DOS graphics software, with so much manual input.

by wodenokoto11 days ago

Chose building -> house, segmentation and text, wrote "skyscraper" as text, and drew some lines of a silhouette of a skyscraper.

Returned an image of stars in outer space.

by King-Aaron12 days ago

Trying to do everything that the tutorial and other commenters have suggested, but when I click the arrow button it just waits for a few moments and nothing is generated. Am I missing something?

by ribit12 days ago

Are there any open-source models that can do a similar type of landscape generation? I would really like to look at the code and try to understand how these things are built...

by quitit12 days ago

This might be a good start: https://thisbeachdoesnotexist.com

There is a whole series of "this<blank>doesnotexist" sites, e.g. landscapes, faces, animals, etc.

by thuccess12912 days ago

Nvidia should visualize a periodic 24-hour 3D landscape sweep of the chatter on social media platforms, for metaverse dive-throughs and interactive engagement.

by monkeydust12 days ago

Wow. I have a good degree of respect for Nvidia, but this should never have been released in the state it's in. Who's the product manager for this?

by AbuAssar12 days ago

Link to actual demo: http://gaugan.org/gaugan2/

by stunt12 days ago

It seems that the web UI was generated by AI too, because it's really hard to make sense of it.

by Havoc12 days ago

What license are the produced images under? I could see this being used for cheap stock photos

by bradhensen12 days ago

Now it's not just imagination if you can create visual art using text.

by savant_penguin12 days ago

To whomever came up with this name, good job

by Porygon12 days ago

This sounds quite ironic to me, since "Super-GAU" in German stands for a disaster beyond all expectations (usually meltdown of a nuclear reactor).

https://de.wikipedia.org/wiki/Super-GAU

Another unfortunate name: "Gauleiter" was a regional leader of the Nazi Party.

https://en.wikipedia.org/wiki/Gauleiter

by Tade012 days ago

"gaugan" sounds like the word for "rag" in Polish.

by togaen12 days ago

why

by h2odragon12 days ago

I hear they have a miraculous new AI tool that magically determines your sexual desires then uses lasers to induce those feelings through your eyeballs with no contact necessary! Coincidentally demonstrated at this very same URL!

Even then, I don't think I'd care enough to fight through the layers of bullshit here.