It is New Year’s Day 2023. Happy New Year!!! I am currently on a coast-to-coast road trip through the United States with my family, but for New Year’s Eve and New Year’s Day we stayed in one place. Taking advantage of the driving-free days, my 4-year-old son and I had some great fun with the open-source Stable Diffusion models, in particular with text-guided image inpainting.

Basically, inpainting lets you replace or transform image areas of your choice with AI-generated content based on a text prompt. You can see some of my results in the collage above. The top left panel shows the original (real) image: a photo I took of my son during breakfast at a restaurant this morning. He found it absolutely hilarious that we could modify it so drastically with the computer, and the text prompts we used were largely based on his suggestions.

## A few code snippets

I had already played around a few times with image generation using Stable Diffusion in Python, and with textual inversion to represent a specific artistic style. I was (and still am) pleasantly surprised by how easy the developers have made it to use Stable Diffusion via the Hugging Face diffusers library in Python. However, I hadn't looked at inpainting techniques until today. I learned a lot from great tutorials about Stable Diffusion, such as the fast.ai notebook “Stable Diffusion Deep Dive”, but I haven't come across inpainting examples specifically (though I haven't really looked). So I'm providing some relevant code snippets here.

There are two obvious ways to apply inpainting to the image I started with (top left in the collage above): either replace/transform the boy, or replace/transform the drawing he is holding.

First, however, one has to define an image mask:

• Because I didn’t want to stress about it, I simply eyeballed rectangular image areas to be masked, for instance as follows (note that I used somewhat different masks for different text prompts):

```python
import numpy as np
mask = np.zeros(init_image.size).T  # PIL size is (width, height), so .T yields a (height, width) array
```
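To make the mask step more concrete, here is a minimal sketch of turning an eyeballed rectangle into a mask image. It assumes the diffusers inpainting convention that white (255) pixels are repainted and black (0) pixels are kept; the helper name `rectangular_mask` and any box coordinates you pass in are hypothetical. Note that PIL's `Image.size` is `(width, height)` while NumPy arrays are indexed `(row, column)`, which is why the zero array is transposed.

```python
import numpy as np
from PIL import Image


def rectangular_mask(size, box):
    """Build an inpainting mask: white (255) inside `box`, black (0) elsewhere.

    size: (width, height), as returned by PIL's Image.size
    box:  (left, top, right, bottom) in pixels; hypothetical, eyeballed values
    """
    width, height = size
    mask = np.zeros((height, width), dtype=np.uint8)  # rows = height, columns = width
    left, top, right, bottom = box
    mask[top:bottom, left:right] = 255  # white marks the area to be repainted
    return Image.fromarray(mask)  # a 2-D uint8 array becomes a grayscale ("L") image
```

For example, `rectangular_mask(init_image.size, (left, top, right, bottom))` would give a grayscale mask with the same dimensions as `init_image`.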


### Generating the selected image areas based on a text prompt “from scratch”

• The chosen image areas can be generated from scratch, in which case I used the Stable Diffusion v2 inpainting model. Here is a corresponding code snippet to download and initialize the pre-trained models and the other components of the diffusion pipeline:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    revision="fp16",  # half-precision weights branch of the model repository
    torch_dtype=torch.float16,
).to("cuda")
pipe.enable_attention_slicing()  # saves some GPU memory in exchange for a small speed decrease
```

• Before applying the models, I square-padded all images and resized them to 512x512 pixels. (I saw square-padding recommended in someone else’s Stable Diffusion inpainting code, though I don’t remember where exactly, and I didn’t run any experiments without it.)
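The `square_padding` function used below isn't shown in the post. As a hypothetical stand-in, the following sketch centers an image on a square canvas and then resizes it, applying the identical padding and resizing to the mask so the two stay aligned. The names `square_pad` and `prepare_for_inpainting`, and the choice of black padding, are assumptions for illustration rather than the original code.

```python
from PIL import Image


def square_pad(img, fill):
    """Center `img` on a square canvas whose side is max(width, height)."""
    w, h = img.size
    side = max(w, h)
    canvas = Image.new(img.mode, (side, side), fill)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))
    return canvas


def prepare_for_inpainting(image, mask, size=512):
    """Pad image and mask identically, then resize both to size x size."""
    img = square_pad(image.convert("RGB"), fill=(0, 0, 0)).resize((size, size))
    # pad the mask with 0 (black = keep) so the padding is never repainted;
    # nearest-neighbor resampling keeps the mask strictly black and white
    msk = square_pad(mask.convert("L"), fill=0).resize((size, size), Image.NEAREST)
    return img, msk
```

Because both inputs go through the same `square_pad` call with the same canvas size and offsets, a pixel that is masked in the original photo stays masked at the corresponding position in the 512x512 inputs.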

• Using the above model, I was able to generate images with code like:

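```python
import torch

torch.manual_seed(2023)

inp_img = square_padding(init_image)  # my own function; init_image is loaded with PIL.Image
inp_img = inp_img.resize((512, 512))

prompt = "something..."
negative_prompt = "something..."

# mask_img (name assumed): the mask, square-padded and resized exactly like inp_img
result = pipe(prompt=prompt, negative_prompt=negative_prompt,
              image=inp_img, mask_image=mask_img,
              num_inference_steps=50, guidance_scale=11).images
result[0]  # this is the generated image
```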
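The generated image is a 512x512 square that still contains the padding. A minimal sketch for mapping it back to the original frame, assuming the input was centered on a `max(width, height)` square before resizing (as with a centered square-padding step); `undo_square_padding` is a hypothetical helper, not part of diffusers:

```python
from PIL import Image


def undo_square_padding(result, orig_size):
    """Crop a padded-and-resized square result back to the original aspect ratio.

    Assumes the original (width, height) image was centered on a square of
    side max(width, height) before being resized to the model resolution.
    """
    w, h = orig_size
    side = max(w, h)
    full = result.resize((side, side))  # undo the resize to 512x512
    left, top = (side - w) // 2, (side - h) // 2  # same offsets as the centered padding
    return full.crop((left, top, left + w, top + h))
```

Usage would be along the lines of `undo_square_padding(result[0], init_image.size)`. Scaling back up from 512 pixels cannot recover detail lost when a large photo was shrunk, so the output will look softer than the original.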


### Generating selected image areas in an image-to-image fashion

Alternatively, the generated image can be created in an image-to-image fashion. For this, I adapted an example from the huggingface/diffusers repository, along the lines of:

from diffusers import DiffusionPipeline
import torch

torch.manual_seed(2023)

inp_img = my_input_image  # loaded with PIL.Image
inner_image = inp_img.convert("RGBA")

pipe = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-inpainting",
custom_pipeline="img2img_inpainting",
torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # to save some gpu memory in exchange for a small speed decrease

prompt = "something..."
negative_prompt = "something..."

result = pipe(prompt=prompt, image=inp_img, inner_image=inner_image,