| Communication rounds for training generative model TKA | 200 (100/100) | 200 | 200 | 200 |
| Local epochs for training generative model Tg | 5 | 5 | 5 | 5 |
| Image size | 3 x 32 x 32 | 3 x 32 x 32 | 3 x 32 x 32 | 3 x 32 x 32 |
| Latent dim (dg, dd) | 256/64 (for G/D) | - | - | - |
| Latent size | - | 50 | - | - |
| n_feat (of Unet in DDPM) | - | - | 128 | 128 |
| Time step | - | - | 400 | 400 |
DCGAN was trained for 100 rounds and additionally updated for 100 rounds during the training of target networks.
Target network (CNN) parameters
| Parameter | Target networks |
| --- | --- |
| Optimizer | SGD |
| Learning rate α | 0.1 |
| Communication rounds for training target network TTN | 100 |
| Local epochs for training target network on real samples Tr | 5 |
| Local epochs for training target network on synthetic samples Ts | 1 |
| Batch size B | 128 |
Baselines
FedProx: the proximal term is multiplied by 1e-2.
AvgKD: pseudo labels are aggregated from the outputs of 10 heterogeneous models.
FedDF: CIFAR100 as public dataset
LG-FedAvg: the first conv layer is averaged over all the heterogeneous models, while the other layers are averaged across submodels.
MNIST/FMNIST
FL parameters
| Parameter | Value |
| --- | --- |
| Num of users \|C\| | 10 |
| Num of heterogeneous models M | 10 |
| Data fraction | 0.1 |
Generative model (GM) parameters
| Parameter | DCGAN | CVAE | DDPM w=0 | DDPM w=2 |
| --- | --- | --- | --- | --- |
| Optimizer | Adam | Adam | Adam | Adam |
| Learning rate β | 2e-4 | 1e-3 | 1e-4 | 1e-4 |
| Weight decay | None | 1e-3 | None | None |
| b1, b2 (Adam) | 0.5, 0.999 | None | None | None |
| Batch size B | 64 | 64 | 64 | 64 |
| Communication rounds for training generative model TKA | 100 (50/50) | 100 | 100 | 100 |
| Local epochs for training generative model Tg | 5 | 5 | 5 | 5 |
| Image size | 1 x 32 x 32 | 1 x 32 x 32 | 1 x 32 x 32 | 1 x 32 x 32 |
| Latent dim (dg, dd) | 128/128 (for G/D) | - | - | - |
| Latent size | - | 16 | - | - |
| n_feat (of Unet in DDPM) | - | - | 128 | 128 |
| Time step | - | - | 100 | 100 |
DCGAN was trained for 50 rounds and additionally updated for 50 rounds during the training of target networks.
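The generative-model settings above can be kept machine-readable with a small config structure. The following is a minimal sketch that transcribes the MNIST/FMNIST values; the `GMConfig` class, its field names, and the `rounds_consistent` helper are hypothetical names introduced for illustration and are not taken from the original implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GMConfig:
    """Hypothetical container for one column of the GM parameter table."""
    lr: float                                   # learning rate beta
    weight_decay: Optional[float]               # None means no weight decay
    adam_betas: Optional[Tuple[float, float]]   # None means not specified in the table
    batch_size: int
    rounds_total: int                           # T_KA
    rounds_split: Optional[Tuple[int, int]]     # (before, during) target-network training
    local_epochs: int                           # T_g
    image_size: Tuple[int, int, int]            # channels x height x width


# Values transcribed from the MNIST/FMNIST table.
MNIST_GM_CONFIGS = {
    "DCGAN": GMConfig(2e-4, None, (0.5, 0.999), 64, 100, (50, 50), 5, (1, 32, 32)),
    "CVAE": GMConfig(1e-3, 1e-3, None, 64, 100, None, 5, (1, 32, 32)),
    "DDPM w=0": GMConfig(1e-4, None, None, 64, 100, None, 5, (1, 32, 32)),
    "DDPM w=2": GMConfig(1e-4, None, None, 64, 100, None, 5, (1, 32, 32)),
}


def rounds_consistent(cfg: GMConfig) -> bool:
    """Check that a split such as 50/50 adds up to the total round count."""
    if cfg.rounds_split is None:
        return True
    return sum(cfg.rounds_split) == cfg.rounds_total
```

Only DCGAN carries a round split, matching the note that it is trained for 50 rounds and then updated for another 50 during target-network training.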
Target network (CNN) parameters
| Parameter | Target networks |
| --- | --- |
| Optimizer | SGD |
| Learning rate α | 0.1 |
| Communication rounds for training target network TTN | 100 |
| Local epochs for training target network on real samples Tr | 5 |
| Local epochs for training target network on synthetic samples Ts | 1 |
| Batch size B | 64 |
Baselines
FedProx: the proximal term is multiplied by 1e-2.
AvgKD: Pseudo labels are aggregated from the outputs of 10 heterogeneous models.
FedDF: SVHN as public dataset for MNIST & CIFAR10 as public dataset for FMNIST
LG-FedAvg: the first conv layer is averaged over all the heterogeneous models, while the other layers are averaged across submodels.
Table III
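FID/IS parameter
| Parameter | Value |
| --- | --- |
| # of samples used | 1000 |
MND parameter
| Parameter | Value |
| --- | --- |
| # of validation samples | 600 |
| # of synthetic samples | 600 |
| # of training samples (for averaging) | 1000 |
| Distance measure | LPIPS |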
Table IV
Generative model: DCGAN
All parameter settings are identical to those of Table 1.
Note: DA techniques are used only during target network training, not when training the generative models.
Figure 4
CIFAR10
FL parameters
| Parameter | Value |
| --- | --- |
| Num of users \|C\| | 10 |
| Num of heterogeneous models M | 10 |
| Data fraction | 0.5 |
Feature-generative model (GM) parameters
| Parameter | DCGAN | CVAE | DDPM w=0 | DDPM w=2 |
| --- | --- | --- | --- | --- |
| Optimizer | Adam | Adam | Adam | Adam |
| Learning rate β | 2e-4 | 1e-3 | 1e-4 | 1e-4 |
| Weight decay | None | 1e-3 | None | None |
| b1, b2 (Adam) | 0.5, 0.999 | None | None | None |
| Batch size B | 128 | 128 | 128 | 128 |
| Communication rounds for training generative model TKA | 200 (100/100) | 200 | 200 | 200 |
| Local epochs for training generative model Tg | 5 | 5 | 5 | 5 |
| Image size | 10 x 16 x 16 | 10 x 16 x 16 | 10 x 16 x 16 | 10 x 16 x 16 |
| Latent dim (dg, dd) | 256/64 (for G/D) | - | - | - |
| Latent size | - | 50 | - | - |
| n_feat (of Unet in DDPM) | - | - | 128 | 128 |
| Time step | - | - | 500 | 500 |
DCGAN was trained for 100 rounds and additionally updated for 100 rounds during the training of target networks.
Target network (CNN) parameters
| Parameter | Target networks |
| --- | --- |
| Optimizer | SGD |
| Learning rate α | 0.1 |
| Communication rounds for warming up feature extractor TFE | 50 |
| Local epochs for training feature extractor Tw | 5 |
| Communication rounds for training target network TTN | 100 |
| Local epochs for training target network by real samples Tr | 5 |
| Local epochs for training target network by synthetic samples Ts | 1 |
| Batch size B | 128 |
MNIST/FMNIST/SVHN
FL parameters
| Parameter | Value |
| --- | --- |
| Num of users \|C\| | 10 |
| Num of heterogeneous models M | 10 |
| Data fraction | 0.1 |
Feature-generative model (GM) parameters
| Parameter | DCGAN | CVAE | DDPM w=0 | DDPM w=2 |
| --- | --- | --- | --- | --- |
| Optimizer | Adam | Adam | Adam | Adam |
| Learning rate β | 2e-4 | 1e-3 | 1e-4 | 1e-4 |
| Weight decay | None | 1e-3 | None | None |
| b1, b2 (Adam) | 0.5, 0.999 | None | None | None |
| Batch size B | 64 | 64 | 64 | 64 |
| Communication rounds for training generative model TKA | 100 (50/50) | 100 | 100 | 100 |
| Local epochs for training generative model Tg | 5 | 5 | 5 | 5 |
| Image size | 3 x 16 x 16 | 3 x 16 x 16 | 3 x 16 x 16 | 3 x 16 x 16 |
| Latent dim (dg, dd) | 128/128 (for G/D) | - | - | - |
| Latent size | - | 16 | - | - |
| n_feat (of Unet in DDPM) | - | - | 128 | 128 |
| Time step | - | - | 100 | 100 |
DCGAN was trained for 50 rounds and additionally updated for 50 rounds during the training of target networks.
Target network (CNN) parameters (MNIST/FMNIST)
| Parameter | Target networks |
| --- | --- |
| Optimizer | SGD |
| Learning rate α | 0.1 |
| Communication rounds for warming up feature extractor TFE | 20 |
| Local epochs for training feature extractor Tw | 5 |
| Communication rounds for training target network TTN | 50 |
| Local epochs for training target network by real samples Tr | 5 |
| Local epochs for training target network by synthetic samples Ts | 1 |
| Batch size B | 64 |
SVHN
Target network (CNN) parameters (SVHN)
| Parameter | Target networks |
| --- | --- |
| Optimizer | SGD |
| Learning rate α | 0.1 |
| Communication rounds for warming up feature extractor TFE | 50 |
| Local epochs for training feature extractor Tw | 5 |
| Communication rounds for training target network TTN | 100 |
| Local epochs for training target network by real samples Tr | 5 |
| Local epochs for training target network by synthetic samples Ts | 1 |
| Batch size B | 64 |
Table VII
Generative model: DCGAN
All parameter settings are identical to those of Table 1.
Figure 7
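| Total dimension | Feat size | HL | Comm round of training common FE | Comm round of training header |
| --- | --- | --- | --- | --- |
| 1024 | 1*32*32 | 0 | 0 | 70 |
| 768 | 3*16*16 | 1 | 20 | 50 |
| 640 | 10*8*8 | 2 | 20 | 50 |
| 320 | 20*4*4 | 3 | 20 | 50 |
| 160 | 40*2*2 | 4 | 50 | 20 |
| 80 | 80*1*1 | 5 | 60 | 10 |
| 0 | 0 | 6 | 70 | 0 |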
Homogeneity level (HL) 0 denotes heterogeneous models without a common feature extractor (FE), so only the headers of the models need to be trained.
HL 6 denotes homogeneous models, where every model consists only of the common FE (i.e., the common FE is the whole model).
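The Figure 7 settings follow two simple arithmetic relations that can be checked directly: the total dimension is the product of the feature size, and the communication rounds for the common FE and the header add up to the same budget in every row. The sketch below transcribes the rows; the names `FIG7_ROWS` and `total_dimension` are hypothetical, and representing the HL 6 feature size (listed as 0) by `None` is an assumption made for this example.

```python
from math import prod

# (total dimension, feature size, HL, common-FE rounds, header rounds)
FIG7_ROWS = [
    (1024, (1, 32, 32), 0, 0, 70),
    (768, (3, 16, 16), 1, 20, 50),
    (640, (10, 8, 8), 2, 20, 50),
    (320, (20, 4, 4), 3, 20, 50),
    (160, (40, 2, 2), 4, 50, 20),
    (80, (80, 1, 1), 5, 60, 10),
    (0, None, 6, 70, 0),  # HL 6: no feature is passed to a header
]


def total_dimension(feat_size):
    """Product of channels x height x width, or 0 when there is no feature."""
    return 0 if feat_size is None else prod(feat_size)


# Each row's total dimension matches its feature size.
dims_match = all(total == total_dimension(feat) for total, feat, *_ in FIG7_ROWS)

# Common-FE rounds plus header rounds per row.
round_budgets = {fe_rounds + header_rounds for *_, fe_rounds, header_rounds in FIG7_ROWS}
```

With the values above, every row uses a combined budget of 70 communication rounds; only the split between the common FE and the header changes with HL.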
Figure 8
Models are as described in detail for Figure 7.
MND was evaluated as in Table 2.
Figure 10
Real data scale : 0.01, 0.05, 0.1, 0.5, 1
GeFL w/ Syn 5: Local epochs for training target network by synthetic samples Ts=5
GeFL w/ Syn 1: Local epochs for training target network by synthetic samples Ts=1
Other parameters are identical to Table 1
Table 8
Same as Table 7
Model architecture
In the following tables, conv(c,k,p) denotes a 2D convolutional layer, where c is the number of output channels, k is the kernel size, and p is the padding size. bn(c) denotes a batch normalization layer with c channels. The term relu denotes the rectified linear unit activation, and maxpool(k,s,p) denotes a max-pooling layer, where k is the kernel size, s is the stride, and p is the padding size. Finally, fc(in/out) denotes a fully connected layer, where in is the number of input nodes and out is the number of output nodes.
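To make the notation concrete, the sketch below tracks how a tensor shape changes through conv(c,k,p) and maxpool(k,s,p) and derives the input size of a following fc layer. It uses the standard output-size formula floor((n + 2p - k) / s) + 1. The convolution stride of 1 is an assumption, since the notation does not list a stride, and the function names and the example layer sequence are hypothetical, not an architecture from the tables.

```python
def conv_out(shape, c, k, p, stride=1):
    """Shape after conv(c,k,p); stride defaults to 1 (an assumption)."""
    _, h, w = shape
    return (c, (h + 2 * p - k) // stride + 1, (w + 2 * p - k) // stride + 1)


def maxpool_out(shape, k, s, p):
    """Shape after maxpool(k,s,p); the channel count is unchanged."""
    ch, h, w = shape
    return (ch, (h + 2 * p - k) // s + 1, (w + 2 * p - k) // s + 1)


def fc_in(shape):
    """Number of input nodes of an fc layer applied to the flattened shape."""
    ch, h, w = shape
    return ch * h * w


# Illustrative sequence on a 3 x 32 x 32 input:
# conv(16,3,1) -> bn(16) -> relu -> maxpool(2,2,0) -> fc(in/10)
x = (3, 32, 32)
x = conv_out(x, c=16, k=3, p=1)     # bn and relu leave the shape unchanged
x = maxpool_out(x, k=2, s=2, p=0)
n_in = fc_in(x)
```

Here a 3 x 3 kernel with padding 1 preserves the spatial size, and the 2 x 2 pooling with stride 2 halves it, so the fc layer in this example would take 16 * 16 * 16 input nodes.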