Databases

June 14, 2023

The upgraded “MLDash” program accepts user input for learning algorithm hyper-parameter values, runs the learning algorithm, populates a MongoDB database with the training hyper-parameters and associated performance metrics, then reads the data into a Pandas data frame and displays it in a browser-accessible dashboard application as a data table and data visualization charts. Some of the charts include trend lines that are fit to the data, so additional code removes rows from the data frame that would break the trend-line calculation, such as rows with missing data or values that would cause a type error or divide-by-zero condition. The data table supports user-initiated filtering, which dynamically updates both the data table and the charts. The charts are also interactive, with zoom and hover functionality for closer examination of the data. A dynamically updating dashboard label shows the latest session number in the database, so users can more easily look up data for the last session in the data table.
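As a rough illustration of the trend-line cleanup described above, a minimal sketch might look like the following. It is not the MLDash code itself: the function name `clean_for_trendline`, the column names `episodes` and `avg_reward`, and the sample values are all hypothetical, chosen only to show the idea of coercing values to numbers and dropping rows that cannot be fit.

```python
import numpy as np
import pandas as pd


def clean_for_trendline(df: pd.DataFrame, x_col: str, y_col: str) -> pd.DataFrame:
    """Return a copy of df holding only rows that are safe to fit a trend line to."""
    cleaned = df.copy()
    # Coerce both columns to numbers; non-numeric entries (e.g. "n/a") become NaN,
    # which avoids type errors during the fit.
    for col in (x_col, y_col):
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    # Treat infinities (e.g. left over from an earlier divide-by-zero) as missing.
    cleaned = cleaned.replace([np.inf, -np.inf], np.nan)
    # Drop every row where either the x or the y value is missing.
    return cleaned.dropna(subset=[x_col, y_col])


# Hypothetical sample data with a bad string, a missing value, and an infinity.
sample = pd.DataFrame({
    "episodes": [100, 200, "n/a", 400, None],
    "avg_reward": [1.5, float("inf"), 2.0, 3.5, 4.0],
})
trend_ready = clean_for_trendline(sample, "episodes", "avg_reward")
```

Only the first and fourth rows of the sample survive the cleanup. Returning a cleaned copy, rather than modifying the original data frame, leaves the full data available for the data table while the charts use the filtered version.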

I have completed most, but not all, of the originally planned changes for my Capstone artifact, including the most essential upgrades and transformations, as well as other unforeseen (but necessary) changes. Since the last update, I have enabled functionality that writes learning algorithm hyper-parameters and performance metrics to the MongoDB database. This data now appears in the dashboard data table and is plotted in several data visualization graphs, which were also added. The database schema and code logic were upgraded to differentiate between distinct training sessions, the data for which persists even after a machine reboot (data from all past sessions where the program successfully solved the problem is available on “first run”). Data frame filtering functionality and labeling were upgraded to reflect meaningful relationships between the data within a single training session and also between data from different training sessions. The dashboard interface and back-end functionality were updated to allow the user to clear both the metrics and summary database collections. Default values for hyper-parameter settings were changed to promote a higher likelihood of successful training for new users. The database connection string was reconfigured to use password authentication with SCRAM-SHA-256 for a more secure database connection. Additional documentation for MongoDB was added to the README.txt file, to make it easier for users to set up the necessary database structure, create authorized users, set user roles, and specify authentication mechanisms. Additional inline comments were added to the code to make it easier for other people to understand, and a general overview of the purpose and function of the program was added to the top of Dashboard.ipynb.
Overall, these changes allow the artifact to better interface with a self-populating database system, add secure authentication, and present the database information in a more user-friendly way.
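To show what a connection string with SCRAM-SHA-256 authentication can look like, here is a minimal sketch using only the Python standard library. The helper name `build_mongo_uri`, the user name, the password, and the host, port, and authentication database defaults are all assumptions for illustration, not the values used in the artifact. The `authSource` and `authMechanism` options are standard MongoDB connection string options.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017, auth_db="admin"):
    """Build a MongoDB URI that requests SCRAM-SHA-256 authentication."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # do not break the structure of the URI.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
        f"?authSource={auth_db}&authMechanism=SCRAM-SHA-256"
    )


# Hypothetical credentials, for illustration only.
uri = build_mongo_uri("mldash_user", "p@ss:word")
print(uri)
```

A string built this way can be passed to a MongoDB driver, for example as the argument to `MongoClient` in PyMongo. The database user must already exist on the server with SCRAM-SHA-256 enabled as one of its mechanisms.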

While I have met all of the most important expectations I initially set for the final artifact, there are still a limited number of enhancements that I would like to add before the end of the course, if possible. While not essential, I would like the authentication parameters for database connectivity to become an interactive component of the user interface, instead of existing as stored strings. If this upgrade were to occur, I would also make use of XOR functionality or hashing to obscure login credentials. Also, while I successfully enabled writing to the database, reading from the database, data filtering, and data visualization, the results could still use polishing before being considered representative of truly superior work. I would also like to add a button that saves the whole database to a CSV file, so that sample data lets users immediately make broader comparisons between successful training sessions without having to wait through the time-consuming process of generating the data themselves (and also to enable even greater data persistence). For now, the data can be exported to CSV, and a CSV file can be imported to the database using the Mongo shell (CLI) or MongoDB Compass (GUI). An example CSV file of this kind, named “aidb.csv”, will be included in the artifact submission for your convenience. It would also be helpful to add functionality that allows users to delete all data connected to a single training session, rather than the entire database. Some additional refactoring could also make the code more presentable, especially the HTML that represents the user interface. Finally, I can package the entire program as a Docker container to improve portability and ease of installation, but I am not committed to doing so (in the short term) if time does not allow for it.
None of these potential improvements is vital, but each would increase the value of the artifact, as a demonstration of skill, to potential employers.
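Two of the enhancements described above, deleting a single training session and saving a collection to CSV, could be sketched roughly as follows. This is only an outline under stated assumptions: the field name `session` and both function names are hypothetical, and the functions expect collection objects that provide PyMongo-style `delete_many` and `find` methods.

```python
import csv


def delete_session(metrics, summary, session_id):
    """Delete every document for one training session from both collections."""
    query = {"session": session_id}  # "session" is an assumed field name
    removed = metrics.delete_many(query).deleted_count
    removed += summary.delete_many(query).deleted_count
    return removed


def export_to_csv(collection, path):
    """Write every document in a collection to a CSV file; return the row count."""
    # Exclude MongoDB's internal _id field from the export.
    docs = list(collection.find({}, {"_id": 0}))
    if not docs:
        return 0
    # Use the union of keys across all documents, in first-seen order, as columns.
    fields = list(dict.fromkeys(key for doc in docs for key in doc))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(docs)
    return len(docs)
```

Each function could be wired to a dashboard button callback. Filtering on one session field in both collections keeps the metrics and summary data consistent with each other after a deletion.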
