Results of NeFL with five submodels and SOTA baselines on five datasets under IID settings. We report Top-1 classification accuracy (%) for the worst-case submodel and the average over the five submodels.

| Model | Method | CIFAR-10 (Worst / Avg) | CIFAR-100 (Worst / Avg) | CINIC-10 (Worst / Avg) | SVHN (Worst / Avg) |
|---|---|---|---|---|---|
| ResNet18 | HeteroFL | 80.62 / 84.26 | 41.33 / 47.09 | 67.55 / 70.40 | 91.82 / 93.46 |
| ResNet18 | FjORD | 85.12 / 87.32 | 49.29 / 52.67 | 71.95 / 74.98 | 94.31 / 93.97 |
| ResNet18 | DepthFL | 64.80 / 82.44 | 31.68 / 49.56 | 54.51 / 71.42 | 91.54 / 93.97 |
| ResNet18 | ScaleFL | 79.47 / 85.18 | 41.00 / 49.76 | 70.55 / 73.85 | 93.15 / 94.53 |
| ResNet18 | NeFL (ours) | 87.71 / 89.02 | 55.22 / 56.26 | 75.02 / 76.68 | 94.72 / 95.22 |
| ResNet34 | HeteroFL | 79.51 / 83.16 | 34.96 / 39.75 | 67.39 / 69.62 | 89.86 / 92.39 |
| ResNet34 | FjORD | 85.12 / 87.36 | 47.59 / 50.70 | 71.58 / 74.19 | 93.83 / 94.63 |
| ResNet34 | DepthFL | 25.73 / 75.30 | 14.51 / 46.79 | 32.05 / 67.04 | 74.33 / 89.96 |
| ResNet34 | ScaleFL | 54.72 / 81.05 | 22.62 / 46.41 | 49.69 / 69.43 | 86.46 / 93.21 |
| ResNet34 | NeFL (ours) | 87.71 / 89.02 | 55.22 / 56.26 | 75.02 / 76.68 | 94.72 / 95.22 |
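The worst-case and average metrics reported above aggregate per-submodel Top-1 accuracies. A minimal sketch of that aggregation (the accuracy values below are illustrative, not taken from the table):

```python
def summarize_submodels(accuracies):
    """Given Top-1 accuracies (%) for each submodel, return the
    worst-case (minimum) accuracy and the average over all submodels."""
    worst = min(accuracies)
    avg = sum(accuracies) / len(accuracies)
    return worst, avg

# Hypothetical accuracies for five submodels of increasing capacity.
accs = [80.1, 83.4, 85.0, 86.2, 87.3]
worst, avg = summarize_submodels(accs)
print(f"Worst: {worst:.2f}, Avg: {avg:.2f}")  # Worst: 80.10, Avg: 84.40
```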

Results of NeFL using pre-trained models as initial weights on the CIFAR-10 dataset under IID (left) and non-IID (right) settings. Numbers in parentheses denote the performance difference relative to training from scratch, with ↑ indicating improvement from pre-training and ↓ indicating degradation.

| Model | Method | IID Worst | IID Avg | non-IID Worst | non-IID Avg |
|---|---|---|---|---|---|
| Pre-trained ResNet18 | HeteroFL | 78.26 (↓2.36) | 84.48 (↑0.22) | 71.95 (↓4.30) | 76.17 (↓3.94) |
| Pre-trained ResNet18 | FjORD | 86.37 (↑1.25) | 88.91 (↑1.59) | 81.81 (↑6.00) | 81.96 (↑3.97) |
| Pre-trained ResNet18 | DepthFL | 47.76 (↓17.04) | 82.86 (↑0.42) | 39.78 (↓19.83) | 67.71 (↓9.18) |
| Pre-trained ResNet18 | ScaleFL | 79.34 (↓0.13) | 86.16 (↑0.98) | 69.47 (↑6.00) | 78.01 (↓0.48) |
| Pre-trained ResNet18 | NeFL (ours) | 88.61 (↑1.75) | 89.60 (↑1.72) | 82.91 (↑1.65) | 85.85 (↑4.14) |
| Pre-trained ResNet34 | HeteroFL | 79.97 (↑0.46) | 84.34 (↑1.18) | 72.33 (↓3.70) | 78.20 (↓1.43) |
| Pre-trained ResNet34 | FjORD | 87.08 (↑1.96) | 89.37 (↑2.01) | 78.20 (↑3.50) | 78.90 (↑2.89) |
| Pre-trained ResNet34 | DepthFL | 52.08 (↑26.35) | 83.63 (↑8.33) | 42.09 (↑11.67) | 79.86 (↑9.10) |
| Pre-trained ResNet34 | ScaleFL | 67.66 (↑12.94) | 85.77 (↑4.72) | 52.59 (↑20.25) | 78.29 (↑5.89) |
| Pre-trained ResNet34 | NeFL (ours) | 88.36 (↑0.65) | 91.14 (↑2.12) | 83.62 (↑2.86) | 86.48 (↑3.18) |
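Each parenthesized value above is a simple signed difference between the pre-trained result and the corresponding from-scratch result. A sketch of how such entries could be produced (the function name and the scratch baselines passed in are illustrative assumptions, not from the paper):

```python
def format_delta(pretrained: float, scratch: float) -> str:
    """Format a pre-trained accuracy with its signed difference from the
    from-scratch baseline, using arrows as in the table above."""
    delta = pretrained - scratch
    arrow = "\u2191" if delta >= 0 else "\u2193"  # ↑ improvement, ↓ degradation
    return f"{pretrained:.2f}({arrow}{abs(delta):.2f})"

# Example: a pre-trained score of 78.26 against a scratch score of 80.62.
print(format_delta(78.26, 80.62))  # 78.26(↓2.36)
```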