UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation

Hengjia Li1, Yang Liu1, Yuqi Lin1, Zhanwei Zhang1, Yibo Zhao1, Weihang Pan1, Tu Zheng2, Zheng Yang2, Chunjiang Yu3, Boxi Wu1, Deng Cai1.
1Zhejiang University, 2Fabu Inc, 3Ningbo Port.


Teaser: samples from the adapted generators for attribute compositions such as Smile, Blue eyes + Big eyes, and Angry + Anime + Green Big eyes.

Abstract



Generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods simply adapt the generator to a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they are prone to overfitting domain-specific attributes, which inevitably compromises cross-domain consistency. In this paper, we propose UniHDA, a unified and versatile framework for generative hybrid domain adaptation with multi-modal references from multiple domains. We use the CLIP encoder to project multi-modal references into a unified embedding space and then linearly interpolate the direction vectors from multiple target domains to achieve hybrid domain adaptation. To ensure cross-domain consistency, we propose a novel cross-domain spatial structure (CSS) loss that maintains detailed spatial structure information between the source and target generators. Experiments show that the adapted generator can synthesise realistic images with various attribute compositions. Additionally, our framework is versatile and applicable to multiple generators, e.g., StyleGAN2 and Diffusion Models.
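
Below is a minimal sketch (not the authors' released code) of the direction-mixing idea described in the abstract, assuming OpenAI's clip package and PyTorch. The function names (text_embedding, image_embedding, mix_directions), the example prompts, and the equal weights are illustrative assumptions.

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def text_embedding(prompt: str) -> torch.Tensor:
    """Project a text reference into the shared CLIP embedding space."""
    tokens = clip.tokenize([prompt]).to(device)
    emb = model.encode_text(tokens).float()
    return emb / emb.norm(dim=-1, keepdim=True)

@torch.no_grad()
def image_embedding(pil_image) -> torch.Tensor:
    """Project an image reference into the same CLIP embedding space."""
    image = preprocess(pil_image).unsqueeze(0).to(device)
    emb = model.encode_image(image).float()
    return emb / emb.norm(dim=-1, keepdim=True)

def mix_directions(source_emb, target_embs, weights):
    """Linearly interpolate per-domain direction vectors (target - source)."""
    directions = [t - source_emb for t in target_embs]
    hybrid = sum(w * d for w, d in zip(weights, directions))
    return hybrid / hybrid.norm(dim=-1, keepdim=True)

# Illustrative usage: blend a text-driven domain with an image-driven one.
# src = text_embedding("a photo of a face")
# d = mix_directions(src,
#                    [text_embedding("an anime face"), image_embedding(ref_img)],
#                    weights=[0.5, 0.5])

Because both text and image references live in the same CLIP space, the interpolated direction can steer the generator toward a hybrid of several target domains at once.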



Method
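
The following is a hedged sketch of a spatial-structure consistency term in the spirit of the CSS loss mentioned in the abstract; the paper's exact formulation may differ. Here feat_src and feat_tgt are assumed to be intermediate generator feature maps of shape (B, C, H, W) for the same latent code, taken from the frozen source generator and the adapted target generator; spatial_structure_loss and lambda_css are illustrative names.

import torch
import torch.nn.functional as F

def spatial_structure_loss(feat_src: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
    """Match pairwise spatial self-similarity between source and target features."""
    def self_similarity(feat):
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        tokens = F.normalize(tokens, dim=-1)
        return tokens @ tokens.transpose(1, 2)        # (B, H*W, H*W) cosine map

    return F.l1_loss(self_similarity(feat_src), self_similarity(feat_tgt))

# Illustrative usage: add lambda_css * spatial_structure_loss(fs, ft) to the
# adaptation objective while fine-tuning the target generator, so the adapted
# samples keep the spatial layout of their source counterparts.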





Results on multi-modal hybrid domains







Results on DiffusionCLIP



Results on EG3D



BibTeX

@article{li2024unihda,
  title   = {UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators},
  author  = {Li, Hengjia and Liu, Yang and Lin, Yuqi and Zhang, Zhanwei and Zhao, Yibo and Zheng, Tu and Yang, Zheng and Jiang, Yuchun and Wu, Boxi and Cai, Deng and others},
  journal = {arXiv preprint arXiv:2401.12596},
  year    = {2024}
}