MLDash

June 11, 2023

Since the last update for the MLDash artifact, upgrades have been made to improve security. A security class was created in the security.py file, with member functions capable of encoding and decoding the arguments used to initialize a security object. The original code stored the MongoDB username and password by assigning cleartext string constants to variables, which were then used to create a MongoDB CRUD (create, read, update, delete) capable object that connected to the database using fixed credentials. This meant that database setup (using Mongo Shell) required the creation of an authorized user that was limited to a single possible username and password. Now, there is no need to expose the credentials in the source code, and database setup can use any username and password combination.

The MLDash interface now presents the user with a login screen, and the main content is revealed only after authentication succeeds. A user enters their MongoDB username and password and presses the “Login” button. The credentials are passed to a security object instance, which XORs each credential with a randomly generated binary key, truncated to the length of that credential's binary representation, and returns the encoded credentials in hexadecimal form. The encoded credentials and the security object are then used as arguments in the creation of a new MongoDB CRUD-capable object. The MLMongo class, defined in the crud.py file, uses the security object to decode the credentials and insert them into the MongoDB connection string. The dashboard content is displayed only if authentication is successful. This solution prevents unauthorized use of the dashboard, allows any username and password to be chosen during database setup, keeps cleartext credentials out of the code itself, and encodes and decodes credentials with a randomly generated, variable-length XOR key that does not persist beyond a single login session. The encoding algorithm may be upgraded in the future to a more secure one such as AES-256; in the meantime, the use of a one-time key strengthens the XOR encoding and makes brute-force attacks on the authentication system far less likely to succeed.
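The encode and decode round trip described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of security.py: the class name XorSecurity, its attributes, and the use of the secrets module are all assumptions made for the example.

```python
import secrets


class XorSecurity:
    """Hypothetical stand-in for the security class; names are illustrative only."""

    def __init__(self, username: str, password: str) -> None:
        user_bytes = username.encode("utf-8")
        pass_bytes = password.encode("utf-8")
        # One random key per login, as long as the longer credential.
        # It lives only in memory and is never written to disk.
        self._key = secrets.token_bytes(max(len(user_bytes), len(pass_bytes)))
        # Only the hex-encoded forms are kept on the object.
        self.encoded_username = self._xor(user_bytes).hex()
        self.encoded_password = self._xor(pass_bytes).hex()

    def _xor(self, data: bytes) -> bytes:
        # zip() stops at the shorter input, which truncates the key
        # to the length of the credential being processed.
        return bytes(b ^ k for b, k in zip(data, self._key))

    def decode(self, encoded_hex: str) -> str:
        # XOR is its own inverse, so decoding reuses the same key.
        return self._xor(bytes.fromhex(encoded_hex)).decode("utf-8")


sec = XorSecurity("mluser", "s3cret!")
decoded_user = sec.decode(sec.encoded_username)
decoded_pass = sec.decode(sec.encoded_password)
```

In this sketch the database class would receive sec, sec.encoded_username, and sec.encoded_password, and call sec.decode() only at the moment it builds the connection string.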

Throughout the code, several try-except blocks were added, and the risk of division by zero during the creation of chart trendlines was mitigated with a Python function that drops rows containing a value of 0 in selected columns. Additionally, care was taken to cast integer and float values to the string type before concatenating them with strings for display to users. Type errors were also avoided during chart creation by adding code that strips the first row (the header, which holds the column names as string values) from the data frame before the trendline mechanisms attempt numerical operations on the data.
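As a rough illustration of the zero-row filter and the string casting mentioned above, the following sketch assumes the data is held in a pandas data frame. The function name drop_zero_rows and the column names are hypothetical and are not taken from the MLDash source.

```python
from typing import List

import pandas as pd


def drop_zero_rows(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """Return a copy of df without rows that hold 0 in any of the given columns."""
    keep = (df[columns] != 0).all(axis=1)
    return df.loc[keep].reset_index(drop=True)


# Hypothetical session data; the second and third rows each contain a zero.
sessions = pd.DataFrame(
    {"episodes": [120, 0, 95], "duration": [30.5, 12.0, 0.0]}
)
cleaned = drop_zero_rows(sessions, ["episodes", "duration"])

# Cast the number to str explicitly before concatenating it for display.
label = "Sessions shown: " + str(len(cleaned))
```

Filtering before the trendline calculation means any later division by these columns cannot hit a zero denominator.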

New interface features were also added, including an indicator showing the number of GPUs available for use (TensorFlow uses them automatically), and a set of radio buttons that allows a user to disable or enable the pygame animation that runs concurrently with the reinforcement learning training session. The code for the radio buttons that filter the data shown in the data table and data visualization charts was also amended to reflect changed variable names and to correct implementation errors.

Additional documentation for the database setup was added to the README.txt file. User setup involves creating an admin user, who then limits the regular user’s role to Read/Write only, preventing unauthorized access to other databases or to rights management. The admin user also sets the database authentication mechanism so that SCRAM-SHA-256 is applied to the regular user’s authentication process (this is also reflected in the database connection string of the MLMongo class). To make the dashboard content depend on authentication, the app.layout assignment was changed to the return value of a function containing the login screen interface code. The main content was encapsulated in another function, which is executed within a callback function connected to the “Login” button if and only if authentication is successful.
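A connection string that requests SCRAM-SHA-256, as described above, might be assembled along these lines. This is a sketch under assumptions: the function name build_mongo_uri, the host, port, and database name, and the choice of authSource are placeholders rather than values taken from the MLMongo class.

```python
from urllib.parse import quote_plus


def build_mongo_uri(
    username: str,
    password: str,
    host: str = "localhost",   # assumed host
    port: int = 27017,         # MongoDB default port
    database: str = "mldash",  # hypothetical database name
) -> str:
    """Build a MongoDB URI that asks for SCRAM-SHA-256 authentication."""
    # Percent-escape the credentials so characters such as '@' or ':'
    # cannot break the structure of the URI.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb://{user}:{pwd}@{host}:{port}/{database}"
        f"?authSource={database}&authMechanism=SCRAM-SHA-256"
    )


uri = build_mongo_uri("mluser", "p@ss:word")
```

The resulting string can be passed to a MongoDB client; because the credentials are supplied at login rather than hard-coded, any username and password created during database setup will work.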

Attempts were made to upgrade the project to use Python 3.11.x, but the particularities of the changed dependency tree forced me to table the upgrade for a later date. When testing MLDash with maximally upgraded dependencies, an unknown issue caused the Jupyter Notebook kernel to die at unpredictable times. Considering that upgrading from Python 3.7 to 3.11 caused MLDash to run almost twice as fast, this upgrade will definitely be revisited in the future.

The final artifact combines several technologies, each of which possesses functionality that is only partially known to me. To get so many moving pieces to work together as intended, I had to learn a significant amount about the particulars of various code elements in a short period of time. Making the code more modular definitely helped me to isolate issues as they arose. One major obstacle arose while testing and populating the database with multiple sessions. To keep testing to a reasonable duration, I was forced to relax the quantitative criteria used to judge whether the learning model had solved the Cartpole problem; otherwise, sufficient testing could literally take weeks, because each training session takes a long time to complete and only some sessions succeed (and are subsequently written to the database). I definitely attempted certain changes that I was not yet competent to perform and had to reverse them to undo the harm I had caused to the overall operation of the program; however, these experiences taught me a lot and make it more likely that I will succeed at similar tasks in my future endeavors.