Any engineer worth their salt knows how crucial it is to make sure the media servers that support your infrastructure are sufficiently tested. And if you’re an engineer evaluating a platform to add real-time audio, video, and chat functionality to your website or app, you’ll want to make sure that platform is well tested, too.
At Aircore, the media server behind our infrastructure is Bixby. Here’s how we conduct and analyze Bixby load tests to keep our platform reliable and maintain some of the highest uptime in the real-time social (RTS) business.
Learn more about Aircore's products here, or get started on our Docs site.
Bixby is the core media server backbone for Airtime’s distributed real-time communications framework. It routes audio/video data to clients as needed and generates the raw metrics used to create usage and billing reports for third-party applications.
A Bixby Load Test is started manually on Jenkins against a given Bixby version and does the following:
The graphs generated by Jenkins, as shown above, are neither user-friendly nor interactive. Although the data is displayed, it is difficult to interpret how each Bixby version changed the probed attributes, or to compare behavior between different Bixby branches. Additionally, Jenkins does not offer an intuitive way to switch between displaying different test types. Lastly, there is currently no way to disregard a build result or mark it as invalid, so we wanted to add the ability to archive a build from the front end.
Part One: store load test results in a persistent fashion.
Part Two: generate a data visualization that allows users to compare the performance of each Bixby version, as well as against historical data.
Part Three: mathematically categorize whether recent builds conform to the trends of previous builds.
Part One: Using AWS RDS to store test results
In order to store test results in a persistent manner, I wrote a Python script that is called at the end of a successful Jenkins build; it connects to Amazon Relational Database Service (AWS RDS) and appends the most recent build result to the Bixby Load Test table. MariaDB is the underlying database engine run by AWS RDS. Automated daily snapshots of the database are captured and retained for 30 days in case of an upload failure or data corruption.
The table’s columns currently store the type of test performed, the timestamp of when the data was uploaded, the Bixby version built against, four separate CPU usage values (idle, nice, system, user), two network behavior values, one available-memory value, the Jenkins URL associated with the build, and a notes attribute that can hold any additional information about the build.
We decided to integrate data persistence at the end of a load test, using the output CSV file to update the database.
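As a rough sketch of that integration, the upload step can be as small as the snippet below. The table name bixby_load_test, the database name, and the environment variables are placeholders for illustration, not the exact values our script uses.

```python
"""Sketch of the post-build upload step. The table name, database name,
and environment variables are placeholders, not our actual values."""
import os

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical MariaDB connection via SQLAlchemy; AWS RDS exposes a
# normal MySQL-compatible endpoint.
engine = create_engine(
    "mysql+pymysql://{user}:{pw}@{host}/loadtests".format(
        user=os.environ["RDS_USER"],
        pw=os.environ["RDS_PASSWORD"],
        host=os.environ["RDS_HOST"],
    )
)

# load_test.csv is produced automatically at the end of a Jenkins build.
results = pd.read_csv("load_test.csv")

# Append the most recent build result to the existing table.
results.to_sql("bixby_load_test", engine, if_exists="append", index=False)
```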
Certain Bixby Load Test builds were identified as less meaningful during data analysis (Part Three); these are annotated with additional information explaining why they are disregarded (for now).
Automatic Backups and Database Recovery
Our AWS RDS instance is configured to create an automatic backup each day, and a backup can easily be restored through the AWS RDS console. Furthermore, we decided to keep the script that outputs a CSV file as a fail-safe: in the event that both the database and its backups cannot be recovered, the data can still be found in the individual Jenkins jobs.
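For illustration, the same restore can also be scripted rather than clicked through in the console; the sketch below uses boto3 with placeholder instance and snapshot identifiers.

```python
"""Sketch of a scripted snapshot restore, with placeholder identifiers;
in practice we perform this through the AWS RDS console."""
import boto3

rds = boto3.client("rds")

# Restore a new instance from one of the automated daily snapshots.
# Both identifiers below are hypothetical.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="bixby-loadtest-restored",
    DBSnapshotIdentifier="rds:bixby-loadtest-2022-08-01-00-00",
)
```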
Part Two: Data Visualization
The main purpose of a data visualizer for the Bixby Load Test is to show the behavior of the Bixby server’s performance over time and to allow one specific branch to be compared with another.
To be intuitive, the Bixby load test visualizer should be interactive without overloading the user with controls.
The load test visualization must provide the following capabilities:
Each test must have its own set of three graphs (CPU, network, memory), which totals twelve graphs to account for
Bokeh, a free and open-source Python data visualization library, was our weapon of choice for this portion of the project because its dedicated developers are responsive on the Bokeh forums. It also abstracts away the process of serving Bokeh content and requires no HTML/CSS to display graphs; instead, graphs are constructed from objects that build upon one another.
Pandas and NumPy were two Python packages used heavily alongside Bokeh to read files and database tables, as well as to group attributes of the database for analysis.
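To give a flavor of how these pieces fit together, the sketch below draws a single panel with one line per Bixby version. The column names (timestamp, bixby_version, cpu_idle) are hypothetical stand-ins for our schema, and the real dashboard wires up twelve such figures with additional controls.

```python
"""Sketch of a single dashboard panel. The column names (timestamp,
bixby_version, cpu_idle) are hypothetical stand-ins for our schema."""
import pandas as pd
from bokeh.plotting import figure, show

# In the dashboard this frame comes from the bixby_load_test table;
# here it is read from the fail-safe CSV for illustration.
df = pd.read_csv("load_test_history.csv", parse_dates=["timestamp"])

fig = figure(
    title="CPU idle over time",
    x_axis_type="datetime",
    x_axis_label="Build timestamp",
    y_axis_label="CPU idle (%)",
)

# One line per Bixby version, so branches can be compared at a glance.
for version, group in df.groupby("bixby_version"):
    fig.line(group["timestamp"], group["cpu_idle"], legend_label=str(version))

fig.legend.click_policy = "hide"  # click a legend entry to toggle that branch
show(fig)
```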
The resulting data visualization is hosted on an internal webpage, called the Bixby Load Test Dashboard, as shown above.
Detecting Failure and Restarting Host
Since the Bixby visualization dashboard is hosted on its own VM, the Bokeh server runs under systemd (managed with systemctl), which monitors it and will kill and restart the process if an anomaly is detected. Additionally, a crontab entry checks whether new changes have been made in the data visualization repository and, if so, automatically pulls them to the host and restarts the dashboard, as sketched below.
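A minimal version of that cron-driven update check might look like the following, assuming a hypothetical service name (bokeh-dashboard), checkout path, and branch; the names in our actual script differ.

```python
"""Sketch of the cron-invoked update check. The service name
(bokeh-dashboard), checkout path, and branch are placeholders."""
import subprocess

REPO = "/opt/bixby-dashboard"  # hypothetical checkout path


def git(*args: str) -> str:
    """Run a git command inside the dashboard checkout."""
    return subprocess.check_output(["git", *args], cwd=REPO, text=True).strip()


# Fetch the remote and compare commit hashes to detect new changes.
git("fetch", "origin")
if git("rev-parse", "HEAD") != git("rev-parse", "origin/main"):
    # Pull the new revision and restart the Bokeh server under systemd.
    git("pull", "origin", "main")
    subprocess.check_call(["systemctl", "restart", "bokeh-dashboard"])
```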
Part Three: Bixby Load Test Pass/Fail
There are currently no metrics to categorize whether a Bixby build is in line with the recent trend line; Bixby Load Test results have historically been assessed visually for validity and manually disregarded or archived when results were erroneous.
That process is laborious and riddled with human subjectivity. To automate the comparison of a load test result against historical data, a Python script runs at the end of a successful Jenkins build. The script parses load_test.csv, which is generated automatically at the end of a Jenkins build and contains the load test data, and then uses the historical median to determine whether the most recent run conforms to the trend of previous data.
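The sketch below illustrates one way such a median check could work, using an assumed 15% tolerance band and hypothetical metric columns; the exact criteria in our script differ.

```python
"""Sketch of the trend check: a tolerance band around the historical
median. The 15% band and the metric column names are assumptions for
illustration; the exact criteria in our script differ."""
import pandas as pd

TOLERANCE = 0.15  # assumed: flag values more than 15% from the median

# Hypothetical metric columns; the real table stores four CPU values,
# two network values, and one memory value per test type.
METRICS = ["cpu_idle", "cpu_user", "network_rx", "memory_available"]


def conforms(history: pd.DataFrame, latest: pd.Series) -> dict:
    """Per-metric pass/fail for the most recent run vs. the history."""
    verdict = {}
    for metric in METRICS:
        median = history[metric].median()
        deviation = abs(latest[metric] - median) / abs(median)
        verdict[metric] = deviation <= TOLERANCE
    return verdict


history = pd.read_csv("load_test_history.csv")  # prior builds, from the DB
latest = pd.read_csv("load_test.csv").iloc[-1]  # the build just finished

failing = [metric for metric, ok in conforms(history, latest).items() if not ok]
if failing:
    print("Build deviates from the historical median on: " + ", ".join(failing))
```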
Here’s a list of some of the things I did and learned this summer: