The most significant component illustrating my algorithms and data structures abilities is the overhaul of the experience_replay() function, which was upgraded from taking random samples to taking prioritized samples based on each experience's value to the learning algorithm. Additionally, I took various measures to control memory consumption, including the addition of code that terminates poorly performing models early and interrupts models that perform so well that they constitute pseudo-infinite loops. The resulting improvements in performance and memory consumption satisfied the software requirements to such a degree that I was also able to replace the Adam optimizer with the RMSprop optimizer, which further reduces memory consumption at a minor cost to accuracy.
While I had not initially planned to upgrade the core method of the learning algorithm, upgrades to the learning system were, more generally, part of my overall plan. The major upgrade, along with smaller modifications, reduced memory consumption by a factor of five while the learning algorithm simultaneously showed an increase in performance. The process of upgrading the MLDash artifact did not proceed without considerable effort, but artifact development remains in alignment with planned timelines (it may even be considered ahead of schedule). Changing an essential part of the learning algorithm required a large amount of research, testing, and error correction. Before the upgrade could have the desired effect, I had to truly understand how the function worked so that I could competently modify, correct, and properly integrate it into the rest of the software. I feel that I learned a lot, and I was pleasantly surprised by my competence in implementing a complex upgrade to a complicated algorithm, successfully reintegrating the code into the MLDash artifact, and refactoring the artifact into a more modular form.
After integrating the Cartpole artifact with the Dashboard artifact (hereafter referred to as the MLDash artifact), I found that the reinforcement learning component of the artifact required an inordinate amount of RAM to run to completion. To troubleshoot this issue, I initially worked with the pre-integration Cartpole artifact to isolate problems more effectively, then reintegrated the code after resolving them. Part of the reason the code required so much memory (up to 100 GB of swap memory) was that it stores its experiences along with their q_values (the model's estimated action values); if the model is learning too gradually or not making continuous progress, memory consumption can balloon out of control. The most significant way I addressed this issue was by upgrading the way the algorithm samples its experiences and learns from them. The original algorithm performed several attempts at a solution, then learned from a random sample of all of its experience so far. I overhauled the experience_replay() function to perform “prioritized sampling” over a more limited amount of past experience. Prioritized sampling compares past experiences to the model's prediction of what would happen; experiences that deviate most from prediction are emphasized more in teaching the model. I also hard-coded limits on how many attempts at a solution the model could make before the program terminates. Making the model smarter caused a different problem: the Cartpole agent would succeed for hundreds of thousands of steps, a condition that also caused memory overrun. This was addressed by introducing more noise into the model's action space (an increase in the minimum exploration factor) and a hard-coded limit on the number of steps a run could consist of before terminating.
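The prioritized-sampling idea described above can be sketched briefly. This is a minimal, illustrative version, not the artifact's actual code: the names `memory`, `predict_q`, `gamma`, and the batch size are assumptions, and the priority used here is the absolute TD error, i.e. how far an observed outcome deviated from the model's prediction.

```python
from collections import deque

import numpy as np

# Bounded deque caps memory growth from stored experiences (illustrative size).
MEMORY_SIZE = 10_000
memory = deque(maxlen=MEMORY_SIZE)

def td_error(experience, predict_q, gamma=0.99):
    """Priority = |target - predicted Q| for the action that was taken."""
    state, action, reward, next_state, done = experience
    target = reward if done else reward + gamma * np.max(predict_q(next_state))
    return abs(target - predict_q(state)[action])

def prioritized_sample(predict_q, batch_size=32):
    """Sample experiences with probability proportional to their TD error,
    so the most 'surprising' experiences are emphasized in training."""
    priorities = np.array([td_error(e, predict_q) for e in memory]) + 1e-6
    probs = priorities / priorities.sum()
    idx = np.random.choice(len(memory), size=min(batch_size, len(memory)),
                           replace=False, p=probs)
    return [memory[i] for i in idx]
```

The small constant added to each priority keeps zero-error experiences sampleable, and the bounded deque is what limits sampling to "a more limited amount of past experience."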
The resulting gains in accuracy and reduction in memory consumption also allowed me to justify changing the model optimizer from Adam to RMSprop, which consumes about a third of the memory at some expense to accuracy (due to the different ways the two optimizers store moving averages). Ultimately, I solved the memory consumption issue by making the model smarter and more intolerant of substandard configurations. By changing the algorithm to learn from important experiences instead of random ones, placing emphasis on more recent experiences, and applying additional win/loss conditions, I improved both performance and resource consumption (the model can now run on a computer with 16 GB of RAM using little or no virtual memory). Certain hyperparameters can still drive up memory usage significantly, but hard limits on the number of runs and on the number of steps per run effectively prevent memory runaway conditions.
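The hard win/loss limits and the exploration floor mentioned above can be sketched as a guarded training loop. The constants and the `env`/`agent` interface here are assumptions for illustration, not the artifact's actual values or API:

```python
import random

# Illustrative hard limits (assumed values, not the artifact's actual ones).
MAX_EPISODES = 500     # cut off poorly performing models instead of letting them run on
MAX_STEPS = 10_000     # break the pseudo-infinite loop of a model that keeps succeeding
EPSILON_MIN = 0.05     # floor on exploration keeps some noise in the action space

def train(env, agent):
    """Run episodes under hard caps so neither failure nor runaway
    success can balloon memory consumption."""
    for episode in range(MAX_EPISODES):
        state = env.reset()
        for step in range(MAX_STEPS):
            # Never let the exploration rate decay below the floor.
            epsilon = max(agent.epsilon, EPSILON_MIN)
            if random.random() < epsilon:
                action = env.random_action()
            else:
                action = agent.best_action(state)
            state, done = env.step(action)
            if done:
                break
```

Both loops are bounded, so even a pathological hyperparameter configuration terminates; the epsilon floor is what "introduces more noise into the action space."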
After implementing the Cartpole upgrade and adding a significant amount of documentation in the form of comments, I reintegrated the Cartpole artifact into the Dashboard artifact, resulting in the upgraded MLDash artifact. I then refactored the MLDash artifact to be more modular by converting major parts of the code into classes and calling instances of those classes and their member functions where needed. I also worked on preparing the MLDash artifact for full database interactivity by ensuring that model metrics were stored in a format compatible with the database write functions, which were also reviewed and polished. While full functionality is not expected until the database upgrade is complete, development of the MLDash artifact is progressing smoothly. This has required considerable effort, but it seems to have been worth the trouble.
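One way to picture the metrics-storage preparation described above is a small class whose instances serialize cleanly for a database write function. The class name and field names here are hypothetical, not the artifact's actual schema:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical metrics container (field names are assumptions, not the
# artifact's schema): one instance per training run, serializable for storage.
@dataclass
class RunMetrics:
    run_id: int
    episodes: int
    best_score: float
    peak_memory_mb: float

    def to_record(self) -> str:
        """Flatten the metrics to JSON so a database write function
        can store them without further conversion."""
        return json.dumps(asdict(self))
```

Keeping metrics in a dedicated class like this matches the modular refactor: the dashboard, the learner, and the database code all touch one well-defined structure instead of loose variables.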