Training details

  • Trained for 500 communication rounds (100 for SVHN)
  • 100 clients with a participation fraction of 0.1 per round (i.e., 10 clients per round)
  • Local batch size of 32 and 5 local epochs
  • Optimizer: SGD with a learning rate of 0.1, without momentum or weight decay
  • Learning rate decayed by a factor of 0.1 at 1/2 and 3/4 of the total communication rounds (see the sketch below)
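For reference, a minimal sketch of a client's local update under the hyperparameters above; the function and variable names are illustrative, not the repository's actual API:

```python
import torch

# Hedged sketch: SGD with lr 0.1, no momentum or weight decay, batch size 32,
# 5 local epochs; the lr is stepped down by 0.1x at 1/2 and 3/4 of the rounds.
def local_update(model, loader, round_idx, total_rounds=500, base_lr=0.1):
    decay = 0.1 ** ((round_idx >= total_rounds // 2) + (round_idx >= 3 * total_rounds // 4))
    opt = torch.optim.SGD(model.parameters(), lr=base_lr * decay)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(5):              # local epochs
        for x, y in loader:         # loader built with batch_size=32
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()
```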

  • System heterogeneity: each client trains one of the submodels in each round.
    • We use five submodels (Ns = 5, with γ = [γ1, γ2, γ3, γ4, γ5] = [0.2, 0.4, 0.6, 0.8, 1]).
      • Clients are evenly distributed across tiers, one tier per submodel.
      • To model dynamically varying system availability, a client in tier x selects a submodel uniformly at random from the range [max(γ1, γx-2), min(γx+2, γ5)] in each round (see the sketch after this list).
  • Statistical heterogeneity: label distribution skew following a Dirichlet distribution with a concentration parameter of 0.5.
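A minimal sketch of the tier-based submodel selection and the Dirichlet label-skew partition described above; the helper names are hypothetical, not the repository's code:

```python
import numpy as np

gamma = [0.2, 0.4, 0.6, 0.8, 1.0]            # Ns = 5 submodel size ratios

def sample_submodel(tier):
    # Tier x (1-indexed) draws uniformly from [max(γ1, γx-2), min(γx+2, γ5)].
    lo = max(0, tier - 3)                    # 0-based index of γ(x-2), clipped at γ1
    hi = min(len(gamma) - 1, tier + 1)       # 0-based index of γ(x+2), clipped at γ5
    return float(np.random.choice(gamma[lo:hi + 1]))

def dirichlet_partition(labels, n_clients=100, alpha=0.5):
    # labels: 1-D integer array of class labels. For each class, split its
    # sample indices across clients with proportions drawn from Dir(alpha);
    # smaller alpha gives a stronger label-distribution skew.
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.random.permutation(np.where(labels == c)[0])
        p = np.random.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(p)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```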

Details on architectures of submodels

Note that the widthwise scaling (𝛾W) is applied uniformly across all blocks; a sketch of such uniform channel slimming follows.
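As an illustration only (this helper is hypothetical, not the repository's code), uniform widthwise scaling can be realized by keeping the same fraction of channels in every convolution:

```python
import torch.nn as nn

def slim_conv(conv: nn.Conv2d, gamma_w: float) -> nn.Conv2d:
    # Keep the first round(gamma_w * channels) filters; the same ratio is
    # applied to every block. In practice the stem conv's input (3 RGB
    # channels) and grouped convolutions would need special handling.
    out_c = max(1, int(round(conv.out_channels * gamma_w)))
    in_c = max(1, int(round(conv.in_channels * gamma_w)))
    slim = nn.Conv2d(in_c, out_c, conv.kernel_size, conv.stride,
                     conv.padding, bias=conv.bias is not None)
    slim.weight.data.copy_(conv.weight.data[:out_c, :in_c])
    return slim
```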

Consider model indices 1 and 2 of NeFL-D on ResNet18. Their architectures are illustrated as follows:

[Figure: submodel architectures of models 1 and 2 of NeFL-D on ResNet18]

ResNet18

Details of 𝛾 of NeFL-D on ResNet18

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (64) | Layer 2 (128) | Layer 3 (256) | Layer 4 (512) |
|---|---|---|---|---|---|---|---|
| 1 | 0.20 | 1 | 0.20 | 1, 1 | 0, 0 | 1, 1 | 0, 0 |
| 2 | 0.38 | 1 | 0.38 | 1, 0 | 0, 0 | 1, 0 | 1, 0 |
| 3 | 0.57 | 1 | 0.57 | 1, 1 | 1, 1 | 1, 1 | 1, 0 |
| 4 | 0.81 | 1 | 0.81 | 1, 0 | 1, 1 | 0, 0 | 1, 1 |
| 5 | 1 | 1 | 1 | 1, 1 | 1, 1 | 1, 1 | 1, 1 |
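To read a row: submodel 3 (𝛾D = 0.57) keeps both blocks of layers 1-3 and only the first block of layer 4. A minimal sketch of applying such a depthwise mask to torchvision's ResNet18, assuming skipped blocks are simply bypassed (this works directly only for shape-preserving blocks; blocks that change resolution or channel count would need a projection shortcut):

```python
import torch.nn as nn
import torchvision

# Depth mask for submodel 3 of NeFL-D on ResNet18 (row 3 of the table above).
mask = {"layer1": [1, 1], "layer2": [1, 1], "layer3": [1, 1], "layer4": [1, 0]}

resnet18 = torchvision.models.resnet18()
for name, flags in mask.items():
    layer = getattr(resnet18, name)          # an nn.Sequential of BasicBlocks
    for i, keep in enumerate(flags):
        if not keep:
            layer[i] = nn.Identity()         # bypass the block (shape-preserving here)
```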


Details of 𝛾 of NeFL-WD on ResNet18

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (64) | Layer 2 (128) | Layer 3 (256) | Layer 4 (512) |
|---|---|---|---|---|---|---|---|
| 1 | 0.20 | 0.34 | 0.58 | 1, 1 | 1, 1 | 1, 1 | 1, 0 |
| 2 | 0.4 | 0.41 | 1 | 1, 1 | 1, 1 | 1, 1 | 1, 1 |
| 3 | 0.6 | 0.61 | 1 | 1, 1 | 1, 1 | 1, 1 | 1, 1 |
| 4 | 0.8 | 0.81 | 1 | 1, 1 | 1, 1 | 1, 1 | 1, 1 |
| 5 | 1 | 1 | 1 | 1, 1 | 1, 1 | 1, 1 | 1, 1 |

ResNet34

Details of 𝛾 of NeFL-D on ResNet34

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (64) | Layer 2 (128) | Layer 3 (256) | Layer 4 (512) |
|---|---|---|---|---|---|---|---|
| 1 | 0.23 | 1 | 0.23 | 1, 0, 0 | 1, 0, 0, 0 | 1, 0, 0, 0, 0, 0 | 1, 0, 0 |
| 2 | 0.39 | 1 | 0.39 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 0, 0, 0, 1 | 1, 0, 0 |
| 3 | 0.61 | 1 | 0.61 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 0, 0, 0, 1 | 1, 0, 1 |
| 4 | 0.81 | 1 | 0.81 | 1, 1, 1 | 1, 0, 0, 1 | 1, 1, 0, 0, 0, 1 | 1, 1, 1 |
| 5 | 1 | 1 | 1 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1 | 1, 1, 1 |


Details of 𝛾 of NeFL-WD on ResNet34

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (64) | Layer 2 (128) | Layer 3 (256) | Layer 4 (512) |
|---|---|---|---|---|---|---|---|
| 1 | 0.20 | 0.38 | 0.53 | 1, 1, 1 | 1, 0, 0, 1 | 1, 0, 0, 0, 0, 1 | 1, 0, 1 |
| 2 | 0.40 | 0.63 | 0.64 | 1, 1, 1 | 1, 0, 0, 1 | 1, 1, 1, 0, 0, 1 | 1, 0, 1 |
| 3 | 0.60 | 0.77 | 0.78 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 1, 1, 0, 1 | 1, 0, 1 |
| 4 | 0.80 | 0.90 | 0.89 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 1, 0, 0, 1 | 1, 1, 1 |
| 5 | 1 | 1 | 1 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1 | 1, 1, 1 |

ResNet56

Details of 𝛾 of NeFL-D on ResNet56

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (16) | Layer 2 (32) | Layer 3 (64) |
|---|---|---|---|---|---|---|
| 1 | 0.2 | 1 | 0.2 | 1, 1, 0, 0, 0, 0, 0, 0, 0 | 1, 1, 0, 0, 0, 0, 0, 0, 0 | 1, 1, 0, 0, 0, 0, 0, 0, 0 |
| 2 | 0.4 | 1 | 0.4 | 1, 1, 1, 0, 0, 0, 0, 0, 0 | 1, 1, 1, 0, 0, 0, 0, 0, 0 | 1, 1, 1, 1, 0, 0, 0, 0, 0 |
| 3 | 0.6 | 1 | 0.6 | 1, 1, 1, 1, 0, 0, 0, 0, 0 | 1, 1, 1, 1, 0, 0, 0, 0, 0 | 1, 1, 1, 1, 1, 1, 0, 0, 0 |
| 4 | 0.8 | 1 | 0.8 | 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 0 | 1, 1, 1, 1, 1, 1, 1, 0, 0 |
| 5 | 1 | 1 | 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1 |

Details of 𝛾 of NeFL-WD on ResNet56

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (16) | Layer 2 (32) | Layer 3 (64) |
|---|---|---|---|---|---|---|
| 1 | 0.2 | 0.46 | 0.43 | 1, 1, 1, 1, 0, 0, 0, 0, 0 | 1, 1, 1, 1, 0, 0, 0, 0, 0 | 1, 1, 1, 1, 0, 0, 0, 0, 0 |
| 2 | 0.4 | 0.61 | 0.66 | 1, 1, 1, 1, 1, 1, 0, 0, 0 | 1, 1, 1, 1, 1, 1, 0, 0, 0 | 1, 1, 1, 1, 1, 1, 0, 0, 0 |
| 3 | 0.6 | 0.77 | 0.77 | 1, 1, 1, 1, 1, 1, 1, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 0, 0 |
| 4 | 0.8 | 0.90 | 0.89 | 1, 1, 1, 1, 1, 1, 1, 1, 0 | 1, 1, 1, 1, 1, 1, 1, 1, 0 | 1, 1, 1, 1, 1, 1, 1, 1, 0 |
| 5 | 1 | 1 | 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1 |

ResNet110

Details of 𝛾 of NeFL-D on ResNet110

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (16) | Layer 2 (32) | Layer 3 (64) |
|---|---|---|---|---|---|---|
| 1 | 0.2 | 1 | 0.20 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 | 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 | 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 |
| 2 | 0.4 | 1 | 0.40 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 |
| 3 | 0.6 | 1 | 0.60 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 |
| 4 | 0.8 | 1 | 0.80 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0 |
| 5 | 1 | 1 | 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 |

Details of 𝛾 of NeFL-WD on ResNet110

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (16) | Layer 2 (32) | Layer 3 (64) |
|---|---|---|---|---|---|---|
| 1 | 0.2 | 0.46 | 0.44 | 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 |
| 2 | 0.4 | 0.60 | 0.66 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1 |
| 3 | 0.6 | 0.77 | 0.77 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1 |
| 4 | 0.8 | 0.90 | 0.89 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1 |
| 5 | 1 | 1 | 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 |

Wide ResNet

Details of 𝛾 of NeFL on Wide ResNet101_2

| Model index | Model size | 𝛾W | 𝛾D | Layer 1 (128) | Layer 2 (256) | Layer 3 (512) | Layer 4 (1024) |
|---|---|---|---|---|---|---|---|
| 1 | 0.51 | 1 | 0.51 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 | 1, 1, 0 |
| 2 | 0.75 | 1 | 0.75 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 | 1, 1, 1 |
| 3 | 1 | 1 | 1 | 1, 1, 1 | 1, 1, 1, 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 | 1, 1, 1 |

ViT-B/16

Details of 𝛾 of NeFL-D and NeFL-W on ViT-B/16

| Model index | Model size | 𝛾W | 𝛾D | Blocks |
|---|---|---|---|---|
| 1 | 0.51 | 1 | 0.50 | 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0 |
| 2 | 0.75 | 1 | 0.75 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0 |
| 3 | 1 | 1 | 1 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 |
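Since ViT encoder blocks are shape-preserving, a depthwise submodel simply keeps a prefix of the 12 blocks. A minimal sketch on torchvision's vit_b_16 (the attribute paths follow torchvision's implementation and are the only assumption here):

```python
import torch
import torchvision

vit = torchvision.models.vit_b_16()
keep = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # submodel 1: first 6 of 12 blocks
# Each encoder block maps (batch, tokens, 768) -> (batch, tokens, 768),
# so dropped blocks can simply be removed.
vit.encoder.layers = torch.nn.Sequential(
    *[blk for blk, k in zip(vit.encoder.layers, keep) if k]
)
```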

Pre-trained models (trained on ImageNet-1k)

ResNet18/34

  • Trained for 90 epochs with a batch size of 32
  • Optimizer: SGD with a learning rate of 0.1, momentum of 0.9, and weight decay of 0.0001
  • Learning rate decayed by a factor of 0.1 every 30 epochs (see the sketch below)
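In PyTorch terms, this recipe corresponds to roughly the following (placeholder parameters stand in for the model's):

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder for model.parameters()
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=1e-4)
# Multiply the lr by 0.1 every 30 epochs (call scheduler.step() once per epoch).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
```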

Wide ResNet101_2

  • Trained for 90 epochs with a batch size of 32
  • Optimizer: SGD with a learning rate of 0.1, momentum of 0.9, and weight decay of 0.0001
  • Learning rate schedule: cosine annealing with warm restarts over 256 epochs (see the sketch below)
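One way to express this schedule in PyTorch; the restart period T_0 is an assumption, since the text only gives the 256-epoch horizon:

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder for model.parameters()
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=1e-4)
# Cosine annealing with warm restarts; T_0 = 256 gives one full cosine cycle
# over 256 epochs (a shorter T_0 would give periodic restarts within it).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=256)
```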

ViT-B/16

  • Trained for 300 epochs with a batch size of 512
  • Optimizer: AdamW with a learning rate of 0.003 and weight decay of 0.3
  • Learning rate schedule: cosine annealing after a 30-epoch linear warmup with a warmup decay of 0.033 (see the sketch after this list)
  • Augmentation:
    • Random augmentation
    • Random mixup with alpha=0.2
    • Cutmix with alpha=1
    • Repeated augmentation
    • Label smoothing of 0.11
    • Gradient norm clipping to 1
    • Model exponential moving average (EMA)
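A sketch of the optimizer, schedule, and two of the regularizers above in PyTorch; the split into 30 warmup epochs plus 270 cosine epochs follows from the 300-epoch budget, and the remaining names are illustrative:

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder for model.parameters()
optimizer = torch.optim.AdamW(params, lr=0.003, weight_decay=0.3)

# Linear warmup starting from 0.033 * lr over 30 epochs, then cosine
# annealing for the remaining 270 of the 300 epochs.
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.033, total_iters=30)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=270)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, [warmup, cosine], milestones=[30])

criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.11)   # label smoothing of 0.11
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)          # gradient norm clipping to 1
```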

Comparing Wide ResNet101_2 & ViT-B/16

Training details

  • Trained for 100 communication rounds
  • 10 clients with a participation fraction of 1 per round (i.e., 10 clients per round)
  • Local batch size of 32 and 1 local epoch
  • Optimizer: SGD with a learning rate of 0.1, without momentum or weight decay
  • Cosine annealing learning rate schedule with 500 warmup steps and an initial learning rate of 0.03
  • Input images are resized to 256x256 and randomly cropped to 224x224 with a padding of 28 (see the sketch below)
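The preprocessing in the last item, written with torchvision transforms:

```python
import torchvision.transforms as T

# Resize to 256x256, then random-crop to 224x224 with 28-pixel padding.
transform = T.Compose([
    T.Resize((256, 256)),
    T.RandomCrop(224, padding=28),
    T.ToTensor(),
])
```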