UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation

Hengjia Li1, Yang Liu1, Yuqi Lin1, Zhanwei Zhang1, Yibo Zhao1, Weihang Pan1, Tu Zheng2, Zheng Yang2, Chunjiang Yu3, Boxi Wu1, Deng Cai1.
1Zhejiang University, 2Fabu Inc, 3Ningbo Port.


Teaser: samples from the adapted generators for attribute compositions such as Smile, Blue eyes + Big eyes, and Angry + Anime + Green Big eyes.

Abstract



Generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods simply adapt the generator to a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they are prone to overfitting domain-specific attributes, which inevitably compromises cross-domain consistency. In this paper, we propose UniHDA, a unified and versatile framework for generative hybrid domain adaptation with multi-modal references from multiple domains. We use the CLIP encoder to project multi-modal references into a unified embedding space and then linearly interpolate the direction vectors from multiple target domains to achieve hybrid domain adaptation. To ensure cross-domain consistency, we propose a novel cross-domain spatial structure (CSS) loss that maintains detailed spatial structure information between the source and target generators. Experiments show that the adapted generator can synthesise realistic images with various attribute compositions. Additionally, our framework is versatile and applicable to multiple generators, e.g., StyleGAN2 and Diffusion Models.
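
Below is a minimal sketch (not the authors' released code) of the direction-mixing idea described in the abstract, assuming OpenAI's clip package and PyTorch. The function names (text_embedding, image_embedding, mix_directions), the example prompts, and the equal weights are illustrative assumptions.

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def text_embedding(prompt: str) -> torch.Tensor:
    """Project a text reference into the shared CLIP embedding space."""
    tokens = clip.tokenize([prompt]).to(device)
    emb = model.encode_text(tokens).float()
    return emb / emb.norm(dim=-1, keepdim=True)

@torch.no_grad()
def image_embedding(pil_image) -> torch.Tensor:
    """Project an image reference into the same CLIP embedding space."""
    image = preprocess(pil_image).unsqueeze(0).to(device)
    emb = model.encode_image(image).float()
    return emb / emb.norm(dim=-1, keepdim=True)

def mix_directions(source_emb, target_embs, weights):
    """Linearly interpolate per-domain direction vectors (target - source)."""
    directions = [t - source_emb for t in target_embs]
    hybrid = sum(w * d for w, d in zip(weights, directions))
    return hybrid / hybrid.norm(dim=-1, keepdim=True)

# Illustrative usage: blend a text-driven domain with an image-driven one.
# src = text_embedding("a photo of a face")
# d = mix_directions(src,
#                    [text_embedding("an anime face"), image_embedding(ref_img)],
#                    weights=[0.5, 0.5])

Because both text and image references live in the same CLIP space, the interpolated direction can steer the generator toward a hybrid of several target domains at once.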



Method
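
The following is a hedged sketch of a spatial-structure consistency term in the spirit of the CSS loss mentioned in the abstract; the paper's exact formulation may differ. Here feat_src and feat_tgt are assumed to be intermediate generator feature maps of shape (B, C, H, W) for the same latent code, taken from the frozen source generator and the adapted target generator; spatial_structure_loss and lambda_css are illustrative names.

import torch
import torch.nn.functional as F

def spatial_structure_loss(feat_src: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
    """Match pairwise spatial self-similarity between source and target features."""
    def self_similarity(feat):
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        tokens = F.normalize(tokens, dim=-1)
        return tokens @ tokens.transpose(1, 2)        # (B, H*W, H*W) cosine map

    return F.l1_loss(self_similarity(feat_src), self_similarity(feat_tgt))

# Illustrative usage: add lambda_css * spatial_structure_loss(fs, ft) to the
# adaptation objective while fine-tuning the target generator, so the adapted
# samples keep the spatial layout of their source counterparts.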





Results on multi-modal hybrid domains







Results on DiffusionCLIP



Results on EG3D



BibTeX

@article{li2024unihda,
  title   = {UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators},
  author  = {Li, Hengjia and Liu, Yang and Lin, Yuqi and Zhang, Zhanwei and Zhao, Yibo and Zheng, Tu and Yang, Zheng and Jiang, Yuchun and Wu, Boxi and Cai, Deng and others},
  journal = {arXiv preprint arXiv:2401.12596},
  year    = {2024}
}