A Long-Form Deep Dive into ControlNet, the Core Plugin of Stable Diffusion

Contents
I. Introduction
II. How to Use
III. ControlNet Architecture
1. Overall Structure
2. ControlLDM
3. Timestep Embedding
4. HintBlock
5. ResBlock
6. SpatialTransformer
7. SD Encoder Block
8. SD Decoder Block
9. ControlNet Encoder Block
10. Stable Diffusion
IV. Training
1. Prepare the Dataset
2. Generate the ControlNet Model
3. Run Training
V. Miscellaneous
1. Loss Function
2. Random Prompt Replacement
3. Support for Low-Resource Devices


I. Introduction

        Paper: https://arxiv.org/abs/2302.05543
        Code: GitHub - lllyasviel/ControlNet: Let us control diffusion models!
        The core idea of diffusion models is to generate images by denoising. During training, at each timestep, noise of a different "strength" is mixed into the original image; the timestep and the noised image are fed to the model, which predicts the noise, and subtracting the predicted noise from the input image recovers the original. As Michelangelo put it: the statue is already inside the stone, I only remove what is not needed. This is also why, when using Stable Diffusion, a larger Sampling Steps value is not always better; the value has to match the timestep of the current noisy image.
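
A minimal sketch of this forward noising step (my own illustration with an assumed linear beta schedule; it is not code from the project):

import torch

def add_noise(x0, t, alphas_cumprod):
    """Forward diffusion step: mix noise, whose strength is set by timestep t, into x0.
    x0: original image tensor [N, C, H, W]; t: integer timesteps [N]."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)           # cumulative alpha for each timestep
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # noisier as t grows
    return x_t, noise                                     # the model is trained to predict `noise`

# Assumed schedule: linear betas over 1000 steps, as in typical DDPM setups.
betas = torch.linspace(1e-4, 2e-2, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x_t, target_noise = add_noise(torch.randn(1, 3, 64, 64), torch.tensor([500]), alphas_cumprod)
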
        ControlNet adds more input conditions on top of a large pre-trained diffusion model (Stable Diffusion): edge maps, segmentation maps, keypoints and other images, together with a text Prompt, are used to generate new images. It is also an important plugin for stable-diffusion-webui. Because ControlNet keeps Stable Diffusion frozen and uses zero convolutions, quality does not degrade even when fine-tuning on a small dataset on a personal computer, which makes it possible to learn task-specific conditions end to end.
ControlNet has two main innovations:
1. It uses Stable Diffusion with frozen parameters and makes a trainable copy of the SD encoder. This has two benefits:
        a. Training a copy instead of the original weights avoids overfitting when the dataset is small, while preserving the quality of the large model learned from billions of images.
        b. Because the original weights are locked, no gradients have to be computed for the original encoder. This speeds up training and saves GPU memory, since gradients of the original model's parameters are never needed.
2. Zero convolutions: convolutions whose initial weights and biases are all zero. In the trainable copy, each block is connected back to the corresponding layer of the original network through a zero convolution. At the first training step, all inputs and outputs of the trainable and locked copies are therefore identical, as if ControlNet did not exist. In other words, before any optimization ControlNet has no effect on the deep neural features; any further optimization can only improve the model, and training converges quickly.
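
A zero convolution is trivial to write down. The sketch below mirrors the idea of the repository's zero_module helper (a standalone illustration; the channel count of 320 is just an example):

import torch.nn as nn

def zero_module(module):
    """Zero out the parameters of a module so that, initially, it outputs all zeros."""
    for p in module.parameters():
        nn.init.zeros_(p)
    return module

# A 1x1 "zero convolution" connecting a ControlNet block back to the frozen U-Net:
zero_conv = zero_module(nn.Conv2d(320, 320, kernel_size=1, padding=0))
# At step 0 its output is zero, so adding it to the frozen feature map changes nothing;
# gradients still flow, so the weights move away from zero as training proceeds.
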
II. How to Use

        The project provides many functions: line-art to image, segmentation to image, pose to image, and so on. They are all used in much the same way, so we take canny-to-image as the example.
        Download the pre-trained model from lllyasviel/ControlNet at main, get control_sd15_canny.pth, and put it in the models directory.
        Run the following command to start the project:
python gradio_canny2image.py
        Once it has started, open http://127.0.0.1:7860; the page looks like this:

        Upload an image in the first red box and enter a prompt (English only) in the second. After a moment two images appear on the right: the Canny map extracted from the original, and the result generated from the Canny map plus the prompt. You can see the model understood me: the girl's hair is purple. Whether it is pretty is a matter of taste.
        Clicking Advanced options reveals additional settings; here is a brief explanation of each one:

Images: how many images to generate. If you set it very high, watch out for running out of VRAM.
Image Resolution: resolution of the generated images.
Control Strength: as explained later, the model splits into a Stable Diffusion part and a ControlNet part, and this parameter is the weight given to the ControlNet part. When Guess Mode (below) is unchecked, every one of the 13 ControlNet layers uses this value as its weight; when Guess Mode is checked, the per-layer weights increase progressively from 0 to 1. The code for the progression is below; the comment in it is amusing:
# Location: gradio_canny2image.py
# Magic number. IDK why. Perhaps because 0.825**12<0.01 but 0.826**12>0.01
model.control_scales = [strength * (0.825 ** float(12 - i)) for i in range(13)] if guess_mode else ([strength] * 13)
Guess Mode: when unchecked, both the Stable Diffusion and the ControlNet branches are active while processing the Negative Prompt; when checked, the Negative Prompt only goes through the Stable Diffusion branch and the ControlNet branch is disabled. The code is in two parts:
# Location: gradio_canny2image.py
...
un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
...
# Location: cldm/cldm.py
if cond['c_concat'] is None:
    eps = diffusion_model(x=x_noisy, timesteps=t, context=cond_txt, control=None, only_mid_control=self.only_mid_control)
else:
    # ControlNet()
    control = self.control_model(x=x_noisy, hint=torch.cat(cond['c_concat'], 1), timesteps=t, context=cond_txt)
    control = [c * scale for c, scale in zip(control, self.control_scales)]
    # ControlledUnetModel()
    eps = diffusion_model(x=x_noisy, timesteps=t, context=cond_txt, control=control, only_mid_control=self.only_mid_control)
Canny low threshold: a Canny parameter; edge pixels with values below the low threshold are suppressed.
Canny high threshold: a Canny parameter; edge pixels with values above the high threshold are marked as strong edges.
Steps: how many denoising steps to run.
Guidance Scale: the weight of the positive prompt. In the code below, unconditional_guidance_scale is this parameter, model_t is the feature predicted from the positive Prompt + Added Prompt, and model_uncond is the feature predicted from the Negative Prompt:
# Location: cldm/ddim_hacked.py
model_output = model_uncond + unconditional_guidance_scale * (model_t - model_uncond)
Seed: the random seed used to generate the noise image; with the same seed and all other settings unchanged, the output is reproducible.
eta (DDIM): the eta value used by the DDIM sampler.
Added Prompt: an extra positive prompt, e.g. best quality, extremely detailed.
Negative Prompt: an extra negative prompt; whatever you dislike about the result can go here, e.g. longbody, lowres, bad anatomy.
III. ControlNet Architecture

        The official ControlNet architecture diagram is as follows:


        This diagram summarizes the overall structure but hides many details; after reading the code I give a more detailed description of the model below. The project trains with 512x512 inputs; to keep width and height distinguishable I use a 1024x512 input, take canny2image as the example, and set batch_size=1.
1. Overall Structure

       The overall structure of the model is shown below:

        The model inputs are the canny map (Map Input), the Prompt, the Added Prompt, the Negative Prompt, and a random noise image (Random Input).
        The Prompt and Added Prompt strings are concatenated and passed through a CLIP embedder to obtain the text representation (the two FrozenCLIPEmbedder instances share parameters). Together with the Map Input and the Random Input they are fed into ControlLDM (Latent Diffusion), the core module of ControlNet, which is then run in a loop 20 times (the Steps parameter on the page). The timestep differs on each iteration; with Steps=20, the timesteps are [1, 51, 101, 151, 201, 251, 301, 351, 401, 451, 501, 551, 601, 651, 701, 751, 801, 851, 901, 951].
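
This schedule is simply a uniform stride over the 1000 training timesteps, shifted by one. A minimal sketch of how such a uniform DDIM schedule can be computed (illustrative; it reproduces the list above but is not quoted from the repository):

import numpy as np

def make_uniform_ddim_timesteps(num_ddim_steps=20, num_ddpm_steps=1000):
    """Stride uniformly over the training timesteps, shifted by +1 so sampling starts at t=1."""
    c = num_ddpm_steps // num_ddim_steps
    return np.asarray(list(range(0, num_ddpm_steps, c))) + 1

print(make_uniform_ddim_timesteps())  # 1, 51, 101, ..., 901, 951
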
        The Negative Prompt goes through the same pipeline, and its output is combined with the output of the positive Prompt as a weighted sum, where GuidanceScale is the page parameter (default 9):

    output = model_uncond + GuidanceScale * (model_t - model_uncond)
        Finally, Decode First Stage restores the output to the original image size.
2.ControlLDM

       ControlLDM is the core module of ControlNet; its structure is shown below:


        The overall structure of ControlLDM is fairly clear; the main data flow is:
a. the timesteps are converted to a feature vector by an embedding and fed into both Stable Diffusion and ControlNet;
b. the random noise is fed into Stable Diffusion;
c. the image Map goes through HintBlock, is added to the random noise, and is fed into ControlNet;
d. the Prompt embedding is fed into both Stable Diffusion and ControlNet;
e. all Stable Diffusion parameters are frozen and not trained; Stable Diffusion consists of three SDEncoderBlocks, two SDEncoders, one SDMiddleBlock, two SDDecoders, and three SDDecoderBlocks;
f. ControlNet has the same structure as Stable Diffusion, except that a zero convolution is added after each block;
g. the ResBlocks in Stable Diffusion and ControlNet take the previous layer's output and the timestep embedding as input;
h. the SpatialTransformers in Stable Diffusion and ControlNet take the previous layer's output and the Prompt embedding as input.
        A few modules in the figure deserve a closer look.
3.Timestep Embedding

        The timestep is an important model input and directly affects denoising. It enters as a single number and, after Timestep Embedding, becomes an embedding of length 1280.
The code is as follows:
# Location: ldm/modules/diffusionmodules/util.py
def timestep_embedding(timesteps, dim, max_period=10000, repeat_only=False):
    """
    Create sinusoidal timestep embeddings.
    :param timesteps: a 1-D Tensor of N indices, one per batch element.
                      These may be fractional.
    :param dim: the dimension of the output.
    :param max_period: controls the minimum frequency of the embeddings.
    :return: an [N x dim] Tensor of positional embeddings.
    """
    if not repeat_only:
        half = dim // 2
        freqs = torch.exp(
            -math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half
        ).to(device=timesteps.device)
        args = timesteps[:, None].float() * freqs[None]
        embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
        if dim % 2:
            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
    else:
        embedding = repeat(timesteps, 'b -> b d', d=dim)
    return embedding
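
A quick usage sketch of the function above. Note that timestep_embedding itself produces model_channels = 320 sinusoidal features; the length-1280 vector mentioned above comes from the model's time_embed MLP, which is approximated here by a hand-built nn.Sequential (an assumption for illustration, not the repository's exact object):

import torch
import torch.nn as nn
# timestep_embedding as defined above

t = torch.tensor([951])                      # one of the 20 DDIM timesteps
sin_emb = timestep_embedding(t, dim=320)     # -> [1, 320] sinusoidal features
time_embed = nn.Sequential(nn.Linear(320, 1280), nn.SiLU(), nn.Linear(1280, 1280))
emb = time_embed(sin_emb)                    # -> [1, 1280], fed to every ResBlock
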
4.HintBlock

        HintBlock's job is to extract features from the input image Map before it is fused with the other features, a common pattern. It stacks several convolutions and ends with a zero convolution, increasing the Map's channel count while shrinking its spatial size.

Implementation:
# Location: cldm/cldm.py
self.input_hint_block = TimestepEmbedSequential(
    conv_nd(dims, hint_channels, 16, 3, padding=1),
    nn.SiLU(),
    conv_nd(dims, 16, 16, 3, padding=1),
    nn.SiLU(),
    conv_nd(dims, 16, 32, 3, padding=1, stride=2),
    nn.SiLU(),
    conv_nd(dims, 32, 32, 3, padding=1),
    nn.SiLU(),
    conv_nd(dims, 32, 96, 3, padding=1, stride=2),
    nn.SiLU(),
    conv_nd(dims, 96, 96, 3, padding=1),
    nn.SiLU(),
    conv_nd(dims, 96, 256, 3, padding=1, stride=2),
    nn.SiLU(),
    zero_module(conv_nd(dims, 256, model_channels, 3, padding=1))
)
5.ResBlock

        ResBlock fuses the timestep embedding with the previous layer's output. The embedding branch uses a fully-connected layer, which adds a large number of parameters; GroupNorm is used as normalization, which saves some compute. The residual (skip) connection is what gives ResBlock its name. The structure is:

The code:
# Location: ldm/modules/diffusionmodules/openaimodel.py
class ResBlock(TimestepBlock):
    """
    A residual block that can optionally change the number of channels.
    :param channels: the number of input channels.
    :param emb_channels: the number of timestep embedding channels.
    :param dropout: the rate of dropout.
    :param out_channels: if specified, the number of out channels.
    :param use_conv: if True and out_channels is specified, use a spatial
        convolution instead of a smaller 1x1 convolution to change the
        channels in the skip connection.
    :param dims: determines if the signal is 1D, 2D, or 3D.
    :param use_checkpoint: if True, use gradient checkpointing on this module.
    :param up: if True, use this block for upsampling.
    :param down: if True, use this block for downsampling.
    """

    def __init__(
        self,
        channels,
        emb_channels,
        dropout,
        out_channels=None,
        use_conv=False,
        use_scale_shift_norm=False,
        dims=2,
        use_checkpoint=False,
        up=False,
        down=False,
    ):
        super().__init__()
        self.channels = channels
        self.emb_channels = emb_channels
        self.dropout = dropout
        self.out_channels = out_channels or channels
        self.use_conv = use_conv
        self.use_checkpoint = use_checkpoint
        self.use_scale_shift_norm = use_scale_shift_norm
        self.in_layers = nn.Sequential(
            normalization(channels),
            nn.SiLU(),
            conv_nd(dims, channels, self.out_channels, 3, padding=1),
        )
        self.updown = up or down
        if up:
            self.h_upd = Upsample(channels, False, dims)
            self.x_upd = Upsample(channels, False, dims)
        elif down:
            self.h_upd = Downsample(channels, False, dims)
            self.x_upd = Downsample(channels, False, dims)
        else:
            self.h_upd = self.x_upd = nn.Identity()
        self.emb_layers = nn.Sequential(
            nn.SiLU(),
            linear(
                emb_channels,
                2 * self.out_channels if use_scale_shift_norm else self.out_channels,
            ),
        )
        self.out_layers = nn.Sequential(
            normalization(self.out_channels),
            nn.SiLU(),
            nn.Dropout(p=dropout),
            zero_module(
                conv_nd(dims, self.out_channels, self.out_channels, 3, padding=1)
            ),
        )
        if self.out_channels == channels:
            self.skip_connection = nn.Identity()
        elif use_conv:
            self.skip_connection = conv_nd(
                dims, channels, self.out_channels, 3, padding=1
            )
        else:
            self.skip_connection = conv_nd(dims, channels, self.out_channels, 1)

    def forward(self, x, emb):
        """
        Apply the block to a Tensor, conditioned on a timestep embedding.
        :param x: an [N x C x ...] Tensor of features.
        :param emb: an [N x emb_channels] Tensor of timestep embeddings.
        :return: an [N x C x ...] Tensor of outputs.
        """
        return checkpoint(
            self._forward, (x, emb), self.parameters(), self.use_checkpoint
        )

    def _forward(self, x, emb):
        if self.updown:
            in_rest, in_conv = self.in_layers[:-1], self.in_layers[-1]
            h = in_rest(x)
            h = self.h_upd(h)
            x = self.x_upd(x)
            h = in_conv(h)
        else:
            h = self.in_layers(x)
        emb_out = self.emb_layers(emb).type(h.dtype)
        while len(emb_out.shape) < len(h.shape):
            emb_out = emb_out[..., None]
        if self.use_scale_shift_norm:
            out_norm, out_rest = self.out_layers[0], self.out_layers[1:]
            scale, shift = th.chunk(emb_out, 2, dim=1)
            h = out_norm(h) * (1 + scale) + shift
            h = out_rest(h)
        else:
            h = h + emb_out
            h = self.out_layers(h)
        return self.skip_connection(x) + h
6.SpatialTransformer

        SpatialTransformer fuses the Prompt embedding with the previous layer's output; its structure is:

        As the figure shows, SpatialTransformer consists mainly of two CrossAttention modules and one FeedForward module.
        CrossAttention1 takes the previous layer's output as input; the input is split into three copies, two of which go through fully-connected layers to produce K and V. Q times K, followed by a Softmax, gives an attention map, which is then multiplied by V. This is a fairly standard attention structure and is in fact closer to self-attention.
        CrossAttention2 has roughly the same structure as CrossAttention1, except that K and V are generated from the Prompt embedding. After the two CrossAttention modules, the image features and the Prompt embedding have been fused together.
        The FeedForward module uses GEGLU, with a fully-connected layer at each end, to further process the fused features.
Implementation:
# Location: ldm/modules/attention.py
class BasicTransformerBlock(nn.Module):
    ATTENTION_MODES = {
        "softmax": CrossAttention,  # vanilla attention
        "softmax-xformers": MemoryEfficientCrossAttention
    }

    def __init__(self, dim, n_heads, d_head, dropout=0., context_dim=None, gated_ff=True, checkpoint=True,
                 disable_self_attn=False):
        super().__init__()
        attn_mode = "softmax-xformers" if XFORMERS_IS_AVAILBLE else "softmax"
        assert attn_mode in self.ATTENTION_MODES
        attn_cls = self.ATTENTION_MODES[attn_mode]
        self.disable_self_attn = disable_self_attn
        self.attn1 = attn_cls(query_dim=dim, heads=n_heads, dim_head=d_head, dropout=dropout,
                              context_dim=context_dim if self.disable_self_attn else None)  # is a self-attention if not self.disable_self_attn
        self.ff = FeedForward(dim, dropout=dropout, glu=gated_ff)
        self.attn2 = attn_cls(query_dim=dim, context_dim=context_dim,
                              heads=n_heads, dim_head=d_head, dropout=dropout)  # is self-attn if context is none
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.checkpoint = checkpoint

    def forward(self, x, context=None):
        return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)

    def _forward(self, x, context=None):
        x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
        x = self.attn2(self.norm2(x), context=context) + x
        x = self.ff(self.norm3(x)) + x
        return x


class SpatialTransformer(nn.Module):
    """
    Transformer block for image-like data.
    First, project the input (aka embedding)
    and reshape to b, t, d.
    Then apply standard transformer action.
    Finally, reshape to image
    NEW: use_linear for more efficiency instead of the 1x1 convs
    """

    def __init__(self, in_channels, n_heads, d_head,
                 depth=1, dropout=0., context_dim=None,
                 disable_self_attn=False, use_linear=False,
                 use_checkpoint=True):
        super().__init__()
        if exists(context_dim) and not isinstance(context_dim, list):
            context_dim = [context_dim]
        self.in_channels = in_channels
        inner_dim = n_heads * d_head
        self.norm = Normalize(in_channels)
        if not use_linear:
            self.proj_in = nn.Conv2d(in_channels,
                                     inner_dim,
                                     kernel_size=1,
                                     stride=1,
                                     padding=0)
        else:
            self.proj_in = nn.Linear(in_channels, inner_dim)
        self.transformer_blocks = nn.ModuleList(
            [BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim[d],
                                   disable_self_attn=disable_self_attn, checkpoint=use_checkpoint)
                for d in range(depth)]
        )
        if not use_linear:
            self.proj_out = zero_module(nn.Conv2d(inner_dim,
                                                  in_channels,
                                                  kernel_size=1,
                                                  stride=1,
                                                  padding=0))
        else:
            self.proj_out = zero_module(nn.Linear(in_channels, inner_dim))
        self.use_linear = use_linear

    def forward(self, x, context=None):
        # note: if no context is given, cross-attention defaults to self-attention
        if not isinstance(context, list):
            context = [context]
        b, c, h, w = x.shape
        x_in = x
        x = self.norm(x)
        if not self.use_linear:
            x = self.proj_in(x)
        x = rearrange(x, 'b c h w -> b (h w) c').contiguous()
        if self.use_linear:
            x = self.proj_in(x)
        for i, block in enumerate(self.transformer_blocks):
            x = block(x, context=context[i])
        if self.use_linear:
            x = self.proj_out(x)
        x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w).contiguous()
        if not self.use_linear:
            x = self.proj_out(x)
        return x + x_in
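
The CrossAttention class itself is not excerpted above. The sketch below is a deliberately simplified, single-head version that shows what the two attention modules do (the real implementation in ldm/modules/attention.py is multi-head and more general):

import torch
import torch.nn as nn

class MiniCrossAttention(nn.Module):
    """Simplified cross-attention: Q comes from image tokens, K/V from `context`.
    If no context is given, K/V also come from the image tokens (self-attention)."""
    def __init__(self, query_dim, context_dim=None, inner_dim=64):
        super().__init__()
        context_dim = context_dim or query_dim
        self.scale = inner_dim ** -0.5
        self.to_q = nn.Linear(query_dim, inner_dim, bias=False)
        self.to_k = nn.Linear(context_dim, inner_dim, bias=False)
        self.to_v = nn.Linear(context_dim, inner_dim, bias=False)
        self.to_out = nn.Linear(inner_dim, query_dim)

    def forward(self, x, context=None):
        context = x if context is None else context
        q, k, v = self.to_q(x), self.to_k(context), self.to_v(context)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)  # [B, N_img, N_ctx]
        return self.to_out(attn @ v)

# CrossAttention1: context=None, i.e. self-attention over image tokens.
# CrossAttention2: context = prompt embedding [B, 77, 768], image attends to text.
x = torch.randn(1, 8192, 320)        # flattened 64x128 feature map, 320 channels
ctx = torch.randn(1, 77, 768)        # CLIP text embedding
attn2 = MiniCrossAttention(query_dim=320, context_dim=768)
print(attn2(x, ctx).shape)           # torch.Size([1, 8192, 320])
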
7.SD Encoder Block

        SD Encoder Block is the building block of the Stable Diffusion encoding stage. It is mainly a stack of ResBlock and SpatialTransformer, fusing the timestep, the hint Map, and the Prompt embedding while downsampling the feature map and increasing its channel count. Note that this part of the code is frozen. Its structure is shown below:

8.SD Decoder Block

        SD Decoder Block is the building block of the Stable Diffusion decoding stage. It is likewise a stack of ResBlock and SpatialTransformer, fusing the timestep, the hint Map, and the Prompt embedding while upsampling the feature map and reducing its channel count. This code is frozen as well. Its structure is shown below:

SD Encoder Block + SD Decoder Block implementation:
# Location: cldm/cldm.py
class ControlledUnetModel(UNetModel):
    def forward(self, x, timesteps=None, context=None, control=None, only_mid_control=False, **kwargs):
        hs = []
        with torch.no_grad():
            t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False)
            emb = self.time_embed(t_emb)
            h = x.type(self.dtype)
            for module in self.input_blocks:
                h = module(h, emb, context)
                hs.append(h)
            h = self.middle_block(h, emb, context)

        if control is not None:
            h += control.pop()

        for i, module in enumerate(self.output_blocks):
            if only_mid_control or control is None:
                h = torch.cat([h, hs.pop()], dim=1)
            else:
                h = torch.cat([h, hs.pop() + control.pop()], dim=1)
            h = module(h, emb, context)

        h = h.type(x.dtype)
        return self.out(h)
9.ControlNet Encoder Block

        ControlNet Encoder Block is a clone of SD Encoder Block with zero convolutions added, and its parameters are trainable. Its structure is shown below:
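
The forward pass that wires these zero convolutions together is not excerpted in this article. The toy module below is my own simplification (the real blocks are ResBlock/SpatialTransformer stacks and there are 13 outputs, not 3); it only illustrates the pattern of one zero-convolution output per block, with the hint injected once at the first block:

import torch
import torch.nn as nn

class ToyControlNetEncoder(nn.Module):
    """Toy stand-in for the ControlNet encoder: each block's output goes through
    its own zero convolution, producing one control tensor per block."""
    def __init__(self, channels=320, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_blocks)])
        self.zero_convs = nn.ModuleList([nn.Conv2d(channels, channels, 1) for _ in range(n_blocks)])
        for zc in self.zero_convs:                      # zero-initialise: no effect at step 0
            nn.init.zeros_(zc.weight)
            nn.init.zeros_(zc.bias)

    def forward(self, x, guided_hint):
        outs, h = [], x
        for i, (block, zero_conv) in enumerate(zip(self.blocks, self.zero_convs)):
            h = block(h)
            if i == 0:
                h = h + guided_hint                     # hint is injected once, at the first block
            outs.append(zero_conv(h))                   # one control tensor per block
        return outs                                     # later scaled by control_scales and added
                                                        # to the frozen U-Net's skip connections

ctrl = ToyControlNetEncoder()
controls = ctrl(torch.randn(1, 320, 64, 128), torch.randn(1, 320, 64, 128))
print([c.shape for c in controls])
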

10.Stable Diffusion

        All of Stable Diffusion's parameters are frozen and not trainable. The code that keeps them frozen (only the ControlNet parameters are handed to the optimizer) is:
# Location: cldm/cldm.py
def configure_optimizers(self):
    lr = self.learning_rate
    params = list(self.control_model.parameters())
    if not self.sd_locked:
        params += list(self.model.diffusion_model.output_blocks.parameters())
        params += list(self.model.diffusion_model.out.parameters())
    opt = torch.optim.AdamW(params, lr=lr)
    return opt

IV. Training

        Training ControlNet is not complicated either; the main work is preparing the dataset. We again use canny2image as the example.
1. Prepare the Dataset

        The training data consists of three kinds of files: the original images, the canny Map images, and the corresponding Prompts. If you only want to verify that the training pipeline works, use the fill50k dataset; if you want to use your own dataset, you need images in the style you are after. Below I describe how to obtain the canny Maps and the Prompts.
a. Generating the canny Maps
        The project has a ready-made page for generating canny Maps; run the following command:
python gradio_annotator.py
        Open the address printed on the console, usually http://127.0.0.1:7860/.

         Upload an image in the red box of the figure above and click Run to generate the canny Map. This is fine for a small dataset; for a lot of data you should write a small script. It is easy, just call the method below (a batch-processing sketch follows the excerpt):
# Location: gradio_annotator.py
def canny(img, res, l, h):
    img = resize_image(HWC3(img), res)
    global model_canny
    if model_canny is None:
        from annotator.canny import CannyDetector
        model_canny = CannyDetector()
    result = model_canny(img, l, h)
    return [result]
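
A minimal batch-processing sketch built on the same CannyDetector; the folder names and thresholds are placeholders of my own, not paths from the project:

import os
import cv2
from annotator.canny import CannyDetector
from annotator.util import resize_image, HWC3

apply_canny = CannyDetector()
src_dir, dst_dir = './training/my_data/target', './training/my_data/source'   # assumed layout
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    img = cv2.imread(os.path.join(src_dir, name))
    img = resize_image(HWC3(img), 512)                 # same preprocessing as gradio_annotator.py
    edge = apply_canny(img, 100, 200)                  # low/high thresholds: tune for your data
    cv2.imwrite(os.path.join(dst_dir, name), edge)
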
b. Generating the Prompts
        The easiest way to generate Prompts is to use stable-diffusion-webui (see the installation tutorial here) with the deepbooru interrogator; just follow the red boxes below.
         The results are written to the directory shown in the fourth red box of the figure above; the directory structure looks like this:

         The content of each txt file looks like this:
   1girl, asian, bangs, black_eyes, blunt_bangs, closed_mouth, lips, long_hair, looking_at_viewer, realistic, shirt, smile, solo, white_shirt
 c. Preparing the prompt.json file
        The structure of prompt.json is shown below; the meaning of each key is self-evident:
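
For reference, in the fill50k tutorial dataset each line of prompt.json is a small JSON object pointing at a source (condition) image, a target image, and the prompt. A loader can read it roughly as below (the path follows the fill50k layout; adjust it for your own data):

import json
# One line of prompt.json (fill50k style; file paths are relative to the dataset root):
# {"source": "source/0.png", "target": "target/0.png", "prompt": "pale golden rod circle with old lace background"}

data = []
with open('./training/fill50k/prompt.json', 'rt') as f:
    for line in f:
        data.append(json.loads(line))    # each item: dict with 'source', 'target', 'prompt'
print(data[0]['prompt'])
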

        The final dataset directory structure looks like this:

 d. Changing the prompt.json path
        Change the path to the prompt.json file in tutorial_train.py:

2. Generate the ControlNet Model

        Download the Stable Diffusion pre-trained model from here and put it in the models directory, then generate the ControlNet model with the command below. This step mainly copies the structure and parameters of the Stable Diffusion encoder:
python tool_add_control.py ./models/v1-5-pruned.ckpt ./models/control_sd15_ini.ckpt
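
The rough idea inside tool_add_control.py: build the ControlNet model from its config, copy every parameter whose name matches a Stable Diffusion weight from the SD checkpoint, and leave the rest (zero convolutions, HintBlock) at their fresh initialization. The sketch below is my own simplification, not the script's verbatim code:

import torch
from cldm.model import create_model

model = create_model('./models/cldm_v15.yaml')                  # SD U-Net + trainable ControlNet copy
sd_weights = torch.load('./models/v1-5-pruned.ckpt', map_location='cpu')['state_dict']

target = model.state_dict()
for name in target:
    # 'control_model.xxx' parameters reuse the matching 'model.diffusion_model.xxx' SD weights
    src = name.replace('control_model.', 'model.diffusion_model.') if name.startswith('control_model.') else name
    if src in sd_weights and sd_weights[src].shape == target[name].shape:
        target[name] = sd_weights[src].clone()                  # copy the SD weight
    # otherwise keep the freshly initialised value (zero convs, hint block, ...)

model.load_state_dict(target)
torch.save(model.state_dict(), './models/control_sd15_ini.ckpt')
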
3. Run Training

        We have finally reached the most exciting part: training!
python tutorial_train.py
V. Miscellaneous

1. Loss Function

        The ControlNet paper uses an L2 loss on the predicted noise:

    L = E_{z_0, t, c_t, c_f, ε~N(0,1)} [ || ε − ε_θ(z_t, t, c_t, c_f) ||_2^2 ]
         In the code you can actually also choose an L1 loss:
# Location: ldm/models/diffusion/ddpm.py
def get_loss(self, pred, target, mean=True):
    if self.loss_type == 'l1':
        loss = (target - pred).abs()
        if mean:
            loss = loss.mean()
    elif self.loss_type == 'l2':
        if mean:
            loss = torch.nn.functional.mse_loss(target, pred)
        else:
            loss = torch.nn.functional.mse_loss(target, pred, reduction='none')
    else:
        raise NotImplementedError("unknown loss type '{loss_type}'")
    return loss
2. Random Prompt Replacement

       During training, 50% of the text prompts are randomly replaced with empty strings. This strengthens ControlNet's ability to recognize semantic content from the input condition map: when the prompt is invisible to SD, the encoder has to learn more semantics from the Map input to stand in for the prompt.
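
A minimal sketch of such prompt dropout as it could appear in a dataset or collate step (my own illustration of the idea, not the repository's exact code; the 0.5 rate follows the description above):

import random

def maybe_drop_prompt(prompt, p_drop=0.5):
    """With probability p_drop, hide the text prompt from the model during training."""
    return "" if random.random() < p_drop else prompt

# Example: roughly half of these come back empty.
prompts = ["a purple-haired girl, best quality"] * 6
print([maybe_drop_prompt(p) for p in prompts])
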
3. Support for Low-Resource Devices

        If your machine is very low-spec, you can train only the middle part of ControlNet by adjusting the code like this:
# Location: tutorial_train.py
sd_locked = True
only_mid_control = True
        With average hardware, use the standard training setup: freeze Stable Diffusion and train ControlNet. This is also the default configuration:
# Location: tutorial_train.py
sd_locked = True
only_mid_control = False
        If your hardware is really beefy, you can train everything:
# Location: tutorial_train.py
sd_locked = False
only_mid_control = False
        That covers the main points of ControlNet. I will keep posting more Stable Diffusion content; follow along so you don't miss it.

Source: https://blog.csdn.net/xian0710830114/article/details/129194419