Why Can’t AI Draw Hands?

AI art has become remarkably good at handling very specific requests. Yet one telltale sign still gives generated images away: the grotesque, deformed hands that AI can’t seem to get right. So, why can’t AI draw hands?

AI can’t draw hands primarily because it doesn’t have an inherent understanding of how hands work. Hands have many moving joints, and the skin folds and wrinkles whenever they touch an object or another hand. Moreover, AI struggles to determine how and where hands should be positioned in an image.

Let’s go over all the different reasons why AI doesn’t understand how to draw hands. I’ll also talk about some possible solutions engineers are working on to teach AI how to draw realistic hands.

What Makes AI So Bad at Drawing Hands?

AI is bad at drawing hands because they’re an incredibly complex body part made up of over a hundred ligaments. Moreover, most of the data that current AI image generators were trained on is poorly annotated, which is a challenge for effective machine learning.

To better understand why AI sucks at drawing hands, let’s briefly go over the biomechanics that make hands so tricky to draw.

Namely, a hand has 27 joints and 34 small muscles, and any tiny movement affects how the hand looks.

Professional artists learn how to draw hands by studying them in depth. They also sketch a simplified version of the hand first, then refine it with texture and shading.

More importantly, an artist has an intuitive understanding of how a hand works and interacts with the world.

AI image generators are pretty good at creating a realistic skin texture. What they really struggle with is understanding how muscles and joints in the hand move and interact with objects.

Let’s go over the main challenges that AI faces when trying to draw hands.

1. Limited & Poorly Annotated Data

Arguably the biggest obstacle that AI image generators face today is the type of data they use.

To put it in simple terms, AI uses deep learning algorithms in conjunction with previously labeled images. These labeled images allow the model to recognize patterns and learn what things look like.

Here are just a few types of labels images need to have for effective deep learning:

  • Transcription
  • Object interpolation
  • Bounding boxes
  • Polygons

For this discussion, I’ll focus on transcription, which matters most with current AI image generators.

For example, you can find thousands of images of people holding coffee cups. Everybody has different hands, and a cup can be held in many different ways.

Most of these images are simply labeled as “man holding green coffee cup” or “old lady drinking from a mug.”

Meanwhile, AI would learn much faster if images had specific descriptions. For example, “man holding green coffee cup with index finger underneath the handle and thumb over the handle…” You get the idea.
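To make the difference concrete, here’s a minimal sketch of what a coarse caption versus a richer, hand-aware annotation record might look like. The field names and structure are purely hypothetical, not taken from any real dataset format:

```python
# Purely illustrative annotation records; the field names are hypothetical
# and not drawn from any real dataset format.

coarse_label = {
    "image": "coffee_cup_001.jpg",
    "caption": "man holding green coffee cup",
}

detailed_label = {
    "image": "coffee_cup_001.jpg",
    "caption": (
        "man holding green coffee cup with index finger underneath the "
        "handle and thumb over the handle"
    ),
    # Extra, hand-specific hints a model could learn from
    "hand": {
        "visible_fingers": ["thumb", "index", "middle"],
        "grip": "pinching the handle",
        "palm_facing": "inward",
    },
}

print(coarse_label["caption"])
print(detailed_label["hand"]["grip"])
```

An annotation like the second one tells a model not just what is in the picture, but how the fingers are actually arranged.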

There aren’t many image repositories with precise labeling that AI would need to learn how to draw hands.

Moreover, deep learning isn’t perfect. A neural network might start associating hands with feet, and then there’s no turning back.

You may have already seen some images where AI mixes and matches toes and fingers for some hideous results.

The way things are now, AI might be able to draw very convincing-looking nails. It’ll know how to draw minor details like the lunula, cuticle, free edge, etc.

However, it won’t understand that nails aren’t visible from the palm side of the hand. Midjourney, for instance, loves to create images of hands holding stuff with rotated and bent fingertips.

There’s also the question of legality. Several AI companies have been accused of training their models on copyrighted images without permission.

2. Computational Limitations

Another issue regarding data has to do with processing power. Machine learning is generally done on GPUs because they can run thousands of small computations in parallel, far faster than a CPU can.
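As a rough illustration, here’s a minimal sketch (assuming PyTorch is installed) that times the same matrix multiplication on the CPU and, if a CUDA-capable GPU is available, on the GPU:

```python
import time
import torch

# Two large random matrices; multiplying them is a typical
# parallel-friendly workload.
x = torch.randn(4096, 4096)
y = torch.randn(4096, 4096)

# Time the multiplication on the CPU.
start = time.time()
_ = x @ y
print(f"CPU: {time.time() - start:.3f}s")

# Time the same multiplication on the GPU, if one is available.
if torch.cuda.is_available():
    x_gpu, y_gpu = x.cuda(), y.cuda()
    torch.cuda.synchronize()  # wait for the copy to finish
    start = time.time()
    _ = x_gpu @ y_gpu
    torch.cuda.synchronize()  # wait for the multiplication to finish
    print(f"GPU: {time.time() - start:.3f}s")
```

On typical hardware, the GPU run finishes many times faster, which is why training image models on CPUs alone isn’t practical.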

We have some relatively fast graphics cards, but there’s a lot of demand from multiple industries.

So, only a tiny portion of graphics cards ends up being used for machine learning.

And since images require significantly more data, storage, and computational power than text, progress is inevitably slower.

Hands are just one small target for these all-purpose AI tools. Ideally, AI would use realistic 3D models of hands in conjunction with 2D images to learn how hands move and what they look like. But such advances are still a long way off.

3. How Hands Act and Interact

Tying into the previous point, hands can do thousands of different things. Shaking hands is a notorious obstacle for AI, for one.

AI knows that fingers wrap around the back of the other person’s hand, but it doesn’t know when to stop adding fingers.

As I mentioned previously, AI especially has trouble understanding fingers and fingertips. The amount of pressure used to hold or press an object also affects how realistic an image looks.

Try using AI to generate an image of somebody using a keyboard or playing the piano. There’s a good chance that the fingertips will be chopped off or bent in a weird and funny way.

When you think about it, there are a lot of subtleties in how hands move and interact with the world. AI simply regurgitates bits and pieces of different images of hands to make new ones.

4. Hands Can’t Be “Good Enough”

This one is partly our own fault. Admittedly, hands with 12 fingers are pretty easy to spot. However, AI can sometimes draw hands that look realistic at first glance.

But maybe the nail tips are missing, there are too many tiny wrinkles, or the hand has too many folds and veins. Sometimes, the wrist is too long, or the gap between the thumb and index finger is too wide.

The point is, it doesn’t take much for AI-generated hands to look bad. This isn’t the case with inanimate objects like plushies and pillows. We look at our hands and use them all the time, so we know precisely what they look like. If just one bit is off, you’ll intuitively know that something is wrong.

You can learn more about these reasons in Vox’s YouTube video on the topic.

Will AI Get Better at Drawing Hands in the Future?

To be fair to AI, it took humans until the Renaissance and Da Vinci’s Study of Hands to learn how to draw hands. So, could AI do the same?

AI will get better at drawing hands in the near future. Tech companies are constantly improving their deep-learning algorithms and image annotation databases. As more images of hands are labeled correctly, AI-generated hands will improve significantly.

One of the most promising results comes from Midjourney 5. It seems to have mostly figured out that an average hand has four fingers and a thumb, no more, no less.

It’s also much better at figuring out how hands wrap around objects.

While researching, I looked at some recent examples of images made using Midjourney. Hands shown out of context looked highly convincing, and the skin texture and small wrinkles were almost photorealistic.

However, AI still has significant difficulty with hands performing any sort of gesture. Shaking hands is still an impossible task, let alone more complex gestures like forming a heart shape.

Still, given how fast AI is learning, we’ll likely see generated hands indistinguishable from real hands within a few years.

Besides, a lot of AI-generated images serve as a good starting point. People skilled in photo editing can easily fix weird-looking hands in Photoshop.

Final Thoughts

As of right now, hands are one of the biggest obstacles in AI image generation. After all, hands are complex body parts that interact with other objects in millions of different ways.

With the help of human-annotated data and feedback, it’s only a matter of time before AI conquers them as well.
