Lessons learned, discussion, and final words

Over the course of this work, we have learned a lot about Noir and how to optimize it, about logistic regression, and about the co-noir tool. In this final section we'll share our main lessons learned and some thoughts about the delivered library.

This work wouldn't have been the same without the profiler tool, which we discussed in the Optimizations section. We've mentioned this to the Noir team as well, and hope it will soon be documented and recommended to all Noir developers. Although it doesn't do the optimization work for you, it points you in the right and most impactful direction. We found this extremely helpful, and it taught us a lot about how to write more optimized Noir code.

The second lesson that stands out was the interaction and collaboration with Taceo, the team building co-noir. Apart from figuring out how to build this library, we had to be strategic and stay in close contact with the co-noir team to make sure all the functionality we needed would be supported. If you are planning to build something that will run with co-noir, we highly recommend asking questions in the Taceo Discord, because the team is very approachable and proactive. This dynamic helped us create a co-noir supported version and prevented us from going down rabbit holes that would not work for our goal (for example the use of recursion, which we will discuss in a bit).

Finally, during this project we experienced how the quest for optimizations can lead to underconstrained circuits. It was extremely helpful to have the Noir team take a look at our code and point out where we went wrong. This lesson is very valuable for future endeavors in Noir, as we will be much more cautious when optimizing our code.

Moving on from the lessons learned, we want to highlight a few limitations of the current system. The benchmarks above all depend on the number of samples, features, epochs, and target classes. The collection of datasets that the Iris and Wine datasets stem from contains more datasets we could use, but the current functionality has difficulty supporting them. For example, the Digits dataset has 10 target classes and Diabetes has 27. This means the training algorithm has to be executed 10 or 27 times, which causes a huge increase in gate count and execution time and makes it impractical at this moment. Furthermore, increasing the number of epochs or training samples also has a negative impact on performance. Depending on how much they are increased, either the gate count becomes very high (or the execution time with co-noir very long), or the process is simply killed because the circuit is too big or the network communication is too heavy.
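To make the dependence on the number of classes concrete, here is a minimal cleartext Python sketch of the one-vs-rest pattern. `train_binary` is a hypothetical stand-in for the binary logistic regression training that happens inside the circuit; every additional target class repeats it in full.

```python
import numpy as np

def train_binary(X, y, epochs=10, lr=0.1):
    # Plain gradient descent on a binary logistic model (cleartext sketch).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid; approximated in-circuit
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def train_one_vs_rest(X, y, num_classes, **kwargs):
    # One independent binary training per target class: total work scales
    # linearly with num_classes (10 full runs for Digits, and so on).
    models = []
    for c in range(num_classes):
        y_c = (y == c).astype(float)  # binarize labels: class c vs. the rest
        models.append(train_binary(X, y_c, **kwargs))
    return models
```

Every extra class therefore multiplies the cost of a full training run, which is why datasets with many classes are currently impractical.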

We also briefly dipped our toes into the possibilities of recursion and folding, which seemed like opportunities for optimization. Recursion works by verifying a proof inside another proof, while folding combines multiple instances into a single one. The immediate idea was to leverage the fact that the train function is run once per class, completely independently, in order to obtain a multiclass output, and to optimize that with one of these techniques. From what we understand, folding will be supported in the Aztec system, but not necessarily in vanilla Noir, and as of now it is still a work in progress. We considered trying recursion, but it seems folding will be preferred over recursion (according to a Discord thread), and furthermore co-noir isn't planning to support either of them anytime soon.

There are also some alternatives that were not covered in this project. One of them is the Softmax method for training multiclass logistic models, which uses the Softmax function to estimate the probability that a sample belongs to each class. To implement this, we would first need to research which approximation method would work in this case, similar to how the sigmoid is computed using an approximation. It is worth noting that if Softmax can be implemented, it could be a noticeable optimization for training multiclass models, because a single Softmax-based training run would replace the repeated per-class training we did for this project. We also wanted to test other, bigger datasets such as Digits, but this was limited by the performance of Noir and co-noir.
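For reference, a Softmax-based training step could look roughly like the cleartext Python sketch below. This is only an illustration of the idea, not something we implemented; the `np.exp` calls are exactly the part that would need an in-circuit approximation, analogous to the sigmoid approximation we used.

```python
import numpy as np

def softmax(z):
    # Probability that each sample belongs to each class; the exp() calls are
    # what would need a fixed-point approximation inside a circuit.
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_train_step(W, X, Y_onehot, lr=0.1):
    # One gradient-descent step that updates the weights of all classes at
    # once, instead of one binary training run per class.
    P = softmax(X @ W)                    # shape: (samples, classes)
    grad = X.T @ (P - Y_onehot) / len(X)  # shape: (features, classes)
    return W - lr * grad
```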

As an additional exercise, let us imagine what would happen if co-noir's performance improved by a factor of 100-1000x over the current one. A speedup of that magnitude may allow training logistic regression models with more samples, more features, and more target classes. An application that could benefit from this is training ML models on image-based datasets. Training a high-quality model requires datasets with a large number of features and samples, which is a practical limitation of our solution since the circuit size gets very large. This would extend our techniques to applications in medicine (as explained in the Introduction) that require models trained on X-ray, echography, and MRI images, which have many features and require training on a large number of samples. Another example of datasets with many features are applications related to credit behavior (as in the South German Credit dataset), where privacy and public auditability are also desired. Finally, the circuit also grows as the number of target classes increases, which is another practical limitation. A speedup could open the path to training models with a large number of target classes (as in the Student Performance dataset), which in turn opens the possibility of training models on private demographic data.

We'd like to part ways by giving some ideas for future directions and, of course, thanking you for sticking around until the end! An interesting improvement could be made if a parallelizable for-loop were possible, similar to @for_range_opt in the MP-SPDZ library. This would make it possible to execute the training for each target class in parallel, which would be a big performance improvement. Of course, this can already be implemented in a frontend that calls the training in separate threads (see the sketch below), but it would be a nice-to-have in Noir itself. We opened an enhancement feature request for this in Noir here. Another thing we will keep our eyes out for is the other MPC protocols that will be supported by co-noir. As discussed in the introduction, currently they (and thus we) have been working with Rep3 and Shamir secret sharing, with 3 parties. If the developments during this work are any indication, we're sure there will be more features added to the co-noir tool soon, and those would be very interesting to try out.
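Until such a parallel loop exists in Noir itself, the per-class runs can already be parallelized in the frontend. Below is a rough Python sketch, where `run_training_for_class` is a hypothetical wrapper around whatever invokes the Noir or co-noir training for a single class; the point is only that the per-class runs are fully independent.

```python
from concurrent.futures import ProcessPoolExecutor

def run_training_for_class(class_index):
    # Hypothetical wrapper: prepare the per-class inputs, invoke the
    # Noir/co-noir training for this class, and return the resulting weights.
    ...

def train_all_classes(num_classes):
    # The per-class trainings are independent, so they can run concurrently
    # in separate processes instead of sequentially.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(run_training_for_class, range(num_classes)))
```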

Finally, this work implements a building block of machine learning, which we achieved by approximating the sigmoid function. Without this approximation, we are unsure whether implementing the functionality would have been possible. This opens up the possibility of implementing other, potentially complex, ML building blocks with similar approximation techniques, targeting either vanilla Noir or, again, the co-noir setting.
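As a point of reference, one well-known sigmoid approximation from the MPC literature is the piecewise-linear variant proposed in SecureML, shown below in cleartext Python. We include it purely as an example of the kind of approximation we mean; it is not necessarily the one used in our library.

```python
def sigmoid_approx(x):
    # Piecewise-linear sigmoid approximation (SecureML-style): it avoids
    # exponentials entirely, so it maps well onto fixed-point circuit arithmetic.
    if x < -0.5:
        return 0.0
    if x > 0.5:
        return 1.0
    return x + 0.5
```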

Once again, we'd like to thank Aztec Labs for sponsoring this work through the grant for NRG #2 and for their valuable feedback. We also want to thank the Taceo team for their development of the co-noir tool and for their help with every question we had. Keep in touch with our team at HashCloak via our website and follow our updates on X/Twitter.