Experimental Details on GeFL

Table II


FL parameters
Num of users |C| 10
Num of heterogeneous models M 10
Data fraction 0.5

Generative model (GM) parameters
Optimizer Adam Adam Adam Adam
Learning rate β 2e-4 1e-3 1e-4 1e-4
Weight decay None 1e-3 None None
b1, b2 (Adam) 0.5, 0.999 None None None
Batch size B 128 128 128 128
Communication rounds for training generative model TKA 200 (100/100) 200 200 200
Local epochs for training generative model Tg 5 5 5 5
Image size 3 x 32 x 32 3 x 32 x 32 3 x 32 x 32 3 x 32 x 32
Latent dim (dg, dd) 256/64 (for G/D)
Latent size 50
n_feat (of Unet in DDPM) 128 128
Time step 400 400
  • DCGAN was trained for 100 rounds and additionally updated for 100 rounds during the training of target networks.

Target network (CNN) parameters
Target networks
Optimizer SGD
Learning rate α 0.1
Communication rounds for training target network TTN 100
Local epochs for training target network by real samples Tr 5
Local epochs for training target network by synthetic samples Ts 1
Batch size B 128


  • FedProx: the proximal term is multiplied by 1e-2.
  • AvgKD: pseudo labels are aggregated from the outputs of the 10 heterogeneous models.
  • FedDF: CIFAR100 is used as the public dataset.
  • LG-FedAvg: the first conv layer is averaged over all heterogeneous models, while the other layers are averaged across submodels.
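
The FedProx setting above can be illustrated with a minimal sketch. It assumes the standard FedProx form, in which the local objective is the task loss plus (mu/2) times the squared distance to the global weights, with mu = 1e-2; whether the coefficient in the experiments includes the factor 1/2 is not stated here. The function names and the flat-list weight representation are hypothetical.

```python
MU = 1e-2  # coefficient on the proximal term, taken from the FedProx bullet


def proximal_penalty(local_weights, global_weights, mu=MU):
    """(mu / 2) * ||w - w_global||^2 for flat lists of weights (assumed form)."""
    sq_dist = sum((w - g) ** 2 for w, g in zip(local_weights, global_weights))
    return 0.5 * mu * sq_dist


def fedprox_loss(task_loss, local_weights, global_weights, mu=MU):
    """Local objective: task loss plus the proximal penalty."""
    return task_loss + proximal_penalty(local_weights, global_weights, mu)
```

The penalty vanishes when the local weights equal the global weights and grows quadratically as a client drifts away from them.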


FL parameters
Num of users |C| 10
Num of heterogeneous models M 10
Data fraction 0.1

Generative model (GM) parameters
Optimizer Adam Adam Adam Adam
Learning rate β 2e-4 1e-3 1e-4 1e-4
Weight decay None 1e-3 None None
b1, b2 (Adam) 0.5, 0.999 None None None
Batch size B 64 64 64 64
Communication rounds for training generative model TKA 100 (50/50) 100 100 100
Local epochs for training generative model Tg 5 5 5 5
Image size 1 x 32 x 32 1 x 32 x 32 1 x 32 x 32 1 x 32 x 32
Latent dim (dg, dd) 128/128 (for G/D)
Latent size 16
n_feat (of Unet in DDPM) 128 128
Time step 100 100
  • DCGAN was trained for 50 rounds and additionally updated for 50 rounds during the training of target networks.

Target network (CNN) parameters
Target networks
Optimizer SGD
Learning rate α 0.1
Communication rounds for training target network TTN 100
Local epochs for training target network by real samples Tr 5
Local epochs for training target network by synthetic samples Ts 1
Batch size B 64


  • FedProx: the proximal term is multiplied by 1e-2.
  • AvgKD: pseudo labels are aggregated from the outputs of the 10 heterogeneous models.
  • FedDF: SVHN is used as the public dataset for MNIST, and CIFAR10 as the public dataset for FMNIST.
  • LG-FedAvg: the first conv layer is averaged over all heterogeneous models, while the other layers are averaged across submodels.
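
As an illustration of the AvgKD setting, the sketch below aggregates pseudo labels from several models. The list above says only that the outputs are aggregated; plain averaging of per-model class probabilities is an assumption here (suggested by the method's name), and the function name is hypothetical.

```python
def aggregate_pseudo_labels(model_outputs):
    """Average class-probability vectors, one vector per model (assumed rule)."""
    num_models = len(model_outputs)
    num_classes = len(model_outputs[0])
    return [
        sum(output[c] for output in model_outputs) / num_models
        for c in range(num_classes)
    ]
```

In the setting described here, `model_outputs` would hold 10 vectors, one from each heterogeneous model.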

Table III

FID/IS parameter
# of samples used 1000
MND parameter
# of validation samples 600
# of synthetic samples 600
# of training samples (for averaging) 1000
Distance measure LPIPS

Table IV

  • Generative model: DCGAN
  • All parameter settings are identical to those in Table 1

Note: DA techniques are used only during target network training (not in training generative models)

Figure 4


FL parameters
Num of users |C| 10
Num of heterogeneous models M 10
Data fraction 0.5

Feature-generative model (GM) parameters
Optimizer Adam Adam Adam Adam
Learning rate β 2e-4 1e-3 1e-4 1e-4
Weight decay None 1e-3 None None
b1, b2 (Adam) 0.5, 0.999 None None None
Batch size B 128 128 128 128
Communication rounds for training generative model TKA 200 (100/100) 200 200 200
Local epochs for training generative model Tg 5 5 5 5
Image size 10 x 16 x 16 10 x 16 x 16 10 x 16 x 16 10 x 16 x 16
Latent dim (dg, dd) 256/64 (for G/D)
Latent size 50
n_feat (of Unet in DDPM) 128 128
Time step 500 500
  • DCGAN was trained for 100 rounds and additionally updated for 100 rounds during the training of target networks.

Target network (CNN) parameters
Target networks
Optimizer SGD
Learning rate α 0.1
Communication rounds for warming up feature extractor TFE 50
Local epochs for training feature extractor Tw 5
Communication rounds for training target network TTN 100
Local epochs for training target network by real samples Tr 5
Local epochs for training target network by synthetic samples Ts 1
Batch size B 128


FL parameters
Num of users |C| 10
Num of heterogeneous models M 10
Data fraction 0.1

Feature-generative model (GM) parameters
Optimizer Adam Adam Adam Adam
Learning rate β 2e-4 1e-3 1e-4 1e-4
Weight decay None 1e-3 None None
b1, b2 (Adam) 0.5, 0.999 None None None
Batch size B 64 64 64 64
Communication rounds for training generative model TKA 100 (50/50) 100 100 100
Local epochs for training generative model Tg 5 5 5 5
Image size 3 x 16 x 16 3 x 16 x 16 3 x 16 x 16 3 x 16 x 16
Latent dim (dg, dd) 128/128 (for G/D)
Latent size 16
n_feat (of Unet in DDPM) 128 128
Time step 100 100
  • DCGAN was trained for 50 rounds and additionally updated for 50 rounds during the training of target networks.

Target network (CNN) parameters (MNIST/FMNIST)
Target networks
Optimizer SGD
Learning rate α 0.1
Communication rounds for warming up feature extractor TFE 20
Local epochs for training feature extractor Tw 5
Communication rounds for training target network TTN 50
Local epochs for training target network by real samples Tr 5
Local epochs for training target network by synthetic samples Ts 1
Batch size B 64


Target network (CNN) parameters (SVHN)
Target networks
Optimizer SGD
Learning rate α 0.1
Communication rounds for warming up feature extractor TFE 50
Local epochs for training feature extractor Tw 5
Communication rounds for training target network TTN 100
Local epochs for training target network by real samples Tr 5
Local epochs for training target network by synthetic samples Ts 1
Batch size B 64

Table VII

  • Generative model: DCGAN
  • All parameter settings are identical to those in Table 1

Figure 7

Total dimension | Feature size | HL | Comm. rounds for training common FE | Comm. rounds for training header
1024 | 1*32*32 | 0 | 0 | 70
768 | 3*16*16 | 1 | 20 | 50
640 | 10*8*8 | 2 | 20 | 50
320 | 20*4*4 | 3 | 20 | 50
160 | 40*2*2 | 4 | 50 | 20
80 | 80*1*1 | 5 | 60 | 10
0 | 0 | 6 | 70 | 0
  • Homogeneity level (HL) 0 denotes heterogeneous models without a common feature extractor (FE), so only the headers of the models need to be trained.
  • HL 6 denotes homogeneous models in which every model consists only of the common FE (i.e., the common FE is the whole model).
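
The table for Figure 7 can be sanity-checked with a short script. The sketch below transcribes the rows and checks two regularities that can be read off the table: the total dimension equals the product of the feature size, and the communication rounds for the common FE and the header sum to 70 in every row. The tuple layout and function name are illustrative choices, not part of the original setup.

```python
from math import prod

# (total dimension, feature size C*H*W, HL, rounds for common FE, rounds for header)
# For HL 6 the table lists a feature size of 0; it is stored here as None.
ROWS = [
    (1024, (1, 32, 32), 0, 0, 70),
    (768, (3, 16, 16), 1, 20, 50),
    (640, (10, 8, 8), 2, 20, 50),
    (320, (20, 4, 4), 3, 20, 50),
    (160, (40, 2, 2), 4, 50, 20),
    (80, (80, 1, 1), 5, 60, 10),
    (0, None, 6, 70, 0),
]


def row_is_consistent(row, total_rounds=70):
    """Check the dimension product and the round budget for one table row."""
    total_dim, feat_size, _hl, fe_rounds, header_rounds = row
    expected_dim = prod(feat_size) if feat_size is not None else 0
    return total_dim == expected_dim and fe_rounds + header_rounds == total_rounds


all_consistent = all(row_is_consistent(row) for row in ROWS)
```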

Figure 8

  • Models are as described in the details for Figure 7.
  • MND was evaluated as in Table 2.

Figure 10

  • Real data scale: 0.01, 0.05, 0.1, 0.5, 1
  • GeFL w/ Syn 5: local epochs for training the target network by synthetic samples Ts=5
  • GeFL w/ Syn 1: local epochs for training the target network by synthetic samples Ts=1
  • Other parameters are identical to Table 1

Table 8

  • Same as Table 7

Model architecture

In the following tables, conv(c,k,p) denotes a 2D convolutional layer, where c is the number of output channels, k is the kernel size, and p is the padding size. bn(c) denotes a batch normalization layer with c channels. relu denotes the rectified linear unit (ReLU) layer, and maxpool(k,s,p) denotes a max-pooling layer, where k is the kernel size, s is the stride, and p is the padding size. Finally, fc(in/out) denotes a fully connected layer, where in is the number of input nodes and out is the number of output nodes.
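
As a reading aid, the spatial size after each layer follows the usual convolution arithmetic. The notation above does not list a stride for conv, so stride 1 is assumed here; the symbols H_in and H_out are introduced only for this illustration.

```latex
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1
% conv(c, 3x3, 1) with assumed stride s = 1:  (32 + 2 - 3)/1 + 1 = 32  (size preserved)
% maxpool(2x2, 2, 0):                          (32 + 0 - 2)/2 + 1 = 16  (size halved)
```

Under this assumption each conv layer keeps the spatial size and each max-pooling layer halves it.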


CNN-1 CNN-2 CNN-3 CNN-4 CNN-5 CNN-6 CNN-7 CNN-8 CNN-9 CNN-10
conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1)
bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3)
relu relu relu relu relu relu relu relu relu relu
maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
conv(16,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1)
relu relu relu relu relu relu relu relu relu relu
maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1) conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1) conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1)
- relu relu relu relu relu relu relu relu relu
- maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- - - - conv(64,3×3,1) conv(80,3×3,1) conv(40,3×3,1) conv(64,3×3,1) conv(80,3×3,1) conv(40,3×3,1)
- - - - relu relu relu relu relu relu
- - - - maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- - - - - - - conv(64,3×3,1) conv(80,3×3,1) conv(40,3×3,1)
- - - - - - - relu relu relu
- - - - - - - maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)


CNN-1 CNN-2 CNN-3 CNN-4 CNN-5 CNN-6 CNN-7 CNN-8 CNN-9 CNN-10
conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1)
bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3)
relu relu relu relu relu relu relu relu relu relu
maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
conv(16,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1)
bn(16) bn(16) bn(20) bn(10) bn(16) bn(20) bn(10) bn(16) bn(20) bn(10)
relu relu relu relu relu relu relu relu relu relu
maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1) conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1) conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1)
- bn(32) bn(40) bn(20) bn(32) bn(40) bn(20) bn(32) bn(40) bn(20)
- relu relu relu relu relu relu relu relu relu
- maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- - - - conv(64,3×3,1) conv(80,3×3,1) conv(40,3×3,1) conv(64,3×3,1) conv(80,3×3,1) conv(40,3×3,1)
- - - - bn(64) bn(80) bn(40) bn(64) bn(80) bn(40)
- - - - relu relu relu relu relu relu
- - - - maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- - - - - - - conv(128,3×3,1) conv(100,3×3,1) conv(80,3×3,1)
- - - - - - - bn(128) bn(100) bn(80)
- - - - - - - relu relu relu
- - - - - - - maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
fc(1024/10) fc(512/10) fc(640/10) fc(320/10) fc(256/10) fc(320/10) fc(160/10) fc(128/10) fc(100/10) fc(80/10)
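
The fc input sizes in the last row can be reproduced from the layers above it. The sketch below assumes a 32×32 input, stride 1 for every conv layer, and that the shorter rows of the table belong to the right-most models (for example, the 9-entry rows to CNN-2 through CNN-10). The dictionary and function names are illustrative.

```python
# Output channels of the conv blocks that follow the shared conv(3)/bn/relu/maxpool
# stem, read from the table with shorter rows right-aligned (assumed layout).
BLOCK_CHANNELS = {
    "CNN-1": [16],
    "CNN-2": [16, 32],
    "CNN-3": [20, 40],
    "CNN-4": [10, 20],
    "CNN-5": [16, 32, 64],
    "CNN-6": [20, 40, 80],
    "CNN-7": [10, 20, 40],
    "CNN-8": [16, 32, 64, 128],
    "CNN-9": [20, 40, 80, 100],
    "CNN-10": [10, 20, 40, 80],
}

# Input sizes of the final fc layer, copied from the table.
FC_INPUTS = {
    "CNN-1": 1024, "CNN-2": 512, "CNN-3": 640, "CNN-4": 320, "CNN-5": 256,
    "CNN-6": 320, "CNN-7": 160, "CNN-8": 128, "CNN-9": 100, "CNN-10": 80,
}


def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(channels, image_size=32):
    """Number of features entering the fc layer for one model."""
    size = out_size(image_size, 3, 1, 1)  # stem conv(3, 3x3, 1), assumed stride 1
    size = out_size(size, 2, 2, 0)        # stem maxpool(2x2, 2, 0)
    for _ in channels:
        size = out_size(size, 3, 1, 1)    # conv(c, 3x3, 1) keeps the size
        size = out_size(size, 2, 2, 0)    # maxpool(2x2, 2, 0) halves it
    return channels[-1] * size * size


matches = {name: flattened_features(ch) == FC_INPUTS[name]
           for name, ch in BLOCK_CHANNELS.items()}
```

Under these assumptions all ten computed sizes agree with the fc row, which supports reading the shorter rows as right-aligned.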


CNN-1 CNN-2 CNN-3 CNN-4 CNN-5 CNN-6 CNN-7 CNN-8 CNN-9 CNN-10
conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1) conv(3,3×3,1)
bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3) bn(3)
relu relu relu relu relu relu relu relu relu relu
conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(10,3×3,1)
bn(10) bn(10) bn(10) bn(10) bn(10) bn(10) bn(10) bn(10) bn(10) bn(10)
relu relu relu relu relu relu relu relu relu relu
maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
conv(16,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1) conv(10,3×3,1) conv(20,3×3,1) conv(10,3×3,1) conv(16,3×3,1) conv(20,3×3,1) conv(10,3×3,1)
relu relu relu relu relu relu relu relu relu relu
maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1) conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1) conv(32,3×3,1) conv(40,3×3,1) conv(20,3×3,1)
- relu relu relu relu relu relu relu relu relu
- maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- - - - conv(64,3×3,1) conv(80,3×3,1) conv(40,3×3,1) conv(64,3×3,1) conv(80,3×3,1) conv(40,3×3,1)
- - - - relu relu relu relu relu relu
- - - - maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
- - - - - - - conv(128,3×3,1) conv(100,3×3,1) conv(80,3×3,1)
- - - - - - - relu relu relu
- - - - - - - maxpool(2×2,2,0) maxpool(2×2,2,0) maxpool(2×2,2,0)
fc(1024/10) fc(512/10) fc(640/10) fc(320/10) fc(256/10) fc(320/10) fc(160/10) fc(128/10) fc(100/10) fc(80/10)