
Databases

The upgraded “MLDash” program accepts user input for learning-algorithm hyperparameter values, runs the learning algorithm, populates a MongoDB database with the training hyperparameters and associated performance metrics, then reads the data into a Pandas data frame and displays it in a browser-accessible dashboard application as a data table and data-visualization charts. Some of the charts include trend lines fit to the data, and additional code was added to remove rows from the data frame that would break these calculations, such as rows with missing data or values that would cause a type error or divide-by-zero condition. The data table supports user-initiated filtering, which dynamically updates both the table and the charts. The charts are also interactive, with zoom and hover functionality for closer data examination. A dynamically updating dashboard label shows the latest session number in the database, so users can more easily look up data for the most recent session in the data table.
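The cleanup code itself is not shown in the post; a minimal sketch of the idea, assuming a Pandas data frame with hypothetical `session` and `score` columns and a NumPy linear fit (not necessarily the artifact's actual columns or fitting method), might look like:

```python
import numpy as np
import pandas as pd

def clean_for_trend(df, x_col, y_col):
    """Drop rows that would break a trend-line fit: missing values,
    non-numeric entries, and non-finite numbers."""
    out = df[[x_col, y_col]].copy()
    # Coerce to numeric; anything unparseable becomes NaN.
    out[x_col] = pd.to_numeric(out[x_col], errors="coerce")
    out[y_col] = pd.to_numeric(out[y_col], errors="coerce")
    # Replace infinities with NaN, then drop all NaN rows, so the
    # fitting routine only ever sees finite floats.
    return out.replace([np.inf, -np.inf], np.nan).dropna()

def trend_line(df, x_col, y_col):
    """Fit a first-degree (linear) trend line to the cleaned data."""
    clean = clean_for_trend(df, x_col, y_col)
    slope, intercept = np.polyfit(clean[x_col], clean[y_col], 1)
    return slope, intercept
```

Filtering bad rows out of a copy, rather than mutating the displayed frame, keeps the data table showing everything while the trend-line math only sees usable values.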


MLDash

Since the last update for the MLDash artifact, upgrades have been made to improve security. A security class was created in the security.py file, with member functions capable of encoding and decoding the arguments used to initialize a security object. The original code stored the MongoDB username and password by assigning cleartext string constants to variables, which were then used to create a MongoDB CRUD (create, read, update, delete) capable object that connected to the database using fixed credentials. This meant that database setup (using Mongo Shell) required the creation of an authorized user that was limited to a single possible username and password. Now, there is no need to expose the credentials in the source code, and database setup can use any username and password combination.
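The post does not reproduce security.py, so the following is only a sketch of the described pattern, assuming base64 as the encoding scheme and hypothetical member names (note that encoding of this kind obfuscates credentials in source; it is not encryption):

```python
import base64

class Security:
    """Hypothetical sketch of a security class: the constructor's
    arguments are stored encoded rather than as cleartext string
    constants, and decoded only when credentials are needed."""

    def __init__(self, username, password):
        self.username = self.encode(username)
        self.password = self.encode(password)

    @staticmethod
    def encode(value):
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    @staticmethod
    def decode(value):
        return base64.b64decode(value.encode("ascii")).decode("utf-8")
```

With a scheme like this, the CRUD module can accept any username and password at setup time and call `decode()` only at the moment the MongoDB connection string is built.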


Algorithms and Data Structures

The most significant component illustrating my algorithms and data structures abilities is the overhaul of the experience_replay() function, which was upgraded from taking uniformly random samples to taking samples prioritized by their probable value to the learning algorithm. Additionally, various measures were taken to control memory consumption, including the addition of code that constrains the algorithm to cut off poorly performing models and to interrupt models that perform so well they constitute pseudo-infinite loops. The resulting improvements in performance and memory consumption satisfied the software requirements to such a degree that I was also able to replace the Adam optimizer with the RMSprop optimizer, which further reduces memory consumption at a minor cost to accuracy.
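The upgraded experience_replay() is not listed in the post; as a rough sketch of prioritized sampling with bounded memory (class and priority scheme here are illustrative, not the artifact's exact implementation):

```python
import random
from collections import deque

class PrioritizedReplay:
    """Sketch of prioritized experience replay: transitions are drawn
    with probability proportional to a stored priority (e.g. how much
    the model mispredicted them), instead of uniformly at random.
    A bounded deque evicts the oldest entries to cap memory use."""

    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)  # (transition, priority)

    def remember(self, transition, priority):
        # Floor the priority so no transition's weight is exactly zero.
        self.memory.append((transition, max(priority, 1e-6)))

    def sample(self, batch_size):
        transitions = [m[0] for m in self.memory]
        priorities = [m[1] for m in self.memory]
        # random.choices samples with replacement, weighted by priority.
        return random.choices(transitions, weights=priorities, k=batch_size)
```

The fixed-capacity deque is one simple way to keep replay memory constant regardless of how long training runs, which complements the early-termination constraints described above.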


Software Design and Engineering

I went through the code for the Dashboard artifact (crud.py and Dashboard.ipynb) and updated the names of variables and functions to reflect the context of the new use case. I modified the MongoDB connection string to use a new database and require a username and password, but rolled this change back (temporarily) to reduce the complexity of the rest of the development process; I have documented the secure connection string for later use. I modified the code in Dashboard.ipynb to display an appropriate logo, and I added labeled text boxes that accept user input, which is passed to the backend code when the user clicks a submit button. A callback function then stores the user input in a collection of variables. After I integrated the Cartpole artifact code into the Dashboard artifact (broken into an imported class file and a modified cartpole() function), the collection of user-input variables could be passed as arguments to the cartpole() function. These arguments represent hyperparameters of the underlying deep-Q reinforcement learning algorithm, which can be tuned to improve model performance. The code in score_logger.py was modified to write performance-relevant data into a file named metrics.csv, along with all of the hyperparameters submitted by the user. After each training session, this data can be written to the MongoDB database using the write() or writes() functions, which I added to the crud.py module. Currently, running the Dashboard.ipynb file from Jupyter Notebook displays an unpopulated data table, and the relevant chart functionality still needs to be added. When a user enters hyperparameter values and clicks the “Submit” button, it initiates training of the cartpole learning algorithm.
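The score_logger.py changes are described but not shown; a minimal sketch of appending a session's hyperparameters and metrics to metrics.csv, with illustrative column names rather than the artifact's actual schema, could look like:

```python
import csv
import os

def log_metrics(path, hyperparams, metrics):
    """Append one row combining the user-submitted hyperparameters and
    the session's performance metrics to a CSV file, writing a header
    row the first time the file is created."""
    row = {**hyperparams, **metrics}
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Keeping hyperparameters and metrics in the same row means each record read back from metrics.csv (or inserted into MongoDB via write()/writes()) is self-describing: the performance numbers always travel with the settings that produced them.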


What Makes for a Good Code Review?

In software development, code review is the systematic examination of a project's source code to detect bugs, security vulnerabilities, and coding errors, and to ensure that the code adheres to established coding standards, security controls, and best practices. CodeProject states that “Code review is systematic examination (often known as peer review) of computer source code. It is intended to find and fix mistakes overlooked in the initial development phase, improving both the overall quality of software and the developers’ skills” (Ludovicianul, 2013). During the code review process, the code is typically examined by other developers, who provide feedback and suggestions for improvement. Code review allows for improvements in code organization, correctness, error handling, security, maintainability, efficiency, scalability, and performance. When writing software, it is pointless to exert a lot of effort producing something that does not work as intended, or that works but carries a lot of unanticipated liabilities. Reviewing the code I write, and integrating the feedback of others, has proven to be an essential part of producing quality work.


SNHU Computer Science Capstone
