At BizStream, we’ve come a long way in our effort to reduce bugs and repair them quickly when they happen. We have many processes in place to prevent bugs, such as our software development process, automated testing, QA personnel, and error logging software. In this blog post, I’ll break down the many ways we work to deliver quality software.
The most cost-efficient way to fix a bug is by preventing it in the first place. Typically, the least expensive bug to fix is one found during requirements gathering. Conversely, the most expensive bug to fix is one found on a live production site. In this section, I’ll walk through the many ways we prevent bugs from occurring.
The best way to prevent bugs is by following a proven software development process like Agile Scrum. At BizStream, we follow the Agile Scrum steps on every project. These steps include:
- Sprint Planning: Plan out the work that will be done in the next 1 to 4 weeks.
- Daily Scrum: Discuss what you worked on yesterday, what you’ll work on today, and anything blocking your progress. This is a good time to discuss any potential requirements issues or problems with the team. The product owner should be present to answer any requirements-based questions.
- Backlog Grooming: Discuss and estimate tasks as a team before starting work. The development team and product owner resolve any remaining questions before a task is assigned to a developer. This meeting is the first line of defense against bugs.
- Retrospective: Review what went well and what can be improved. If a lot of bugs were released after a sprint finishes, this is the time to figure out what went wrong and make some changes.
Visual Studio IDE and Code Editor
Visual Studio allows developers to step through code, line by line, to see how it runs. Values can be changed in real time to simulate possible issues and test out different scenarios.
Visual Studio Code Editor
Developers can add many extensions to help them out, like spell checkers and code completion tools.
ReSharper - a Visual Studio extension
ReSharper suggests refactorings that make code more performant and less error-prone.
After a developer is assigned a task, they begin the work on their local computer with an exact copy of the live site that you see in your own browser. This gives them the chance to try new things and test them out without affecting the production site or anyone else on the team. Once we’re done making our changes, we make sure the site builds without any errors and then test out the new feature on our own PC. You might think this is standard practice in the industry, but some developers at other companies change code on live sites without testing it first. Yikes!
Unit tests are automated tests that developers write to test out the smallest chunks of code. Each small piece of code has many small automated tests that are run against it to ensure that each piece of the whole is working correctly. It’s good practice to write tests that cover every possible outcome of each code block and to add new tests after each bug is found and fixed. These tests flush out bugs quickly because developers can run them in large batches after every code change they make. This gives us the confidence we need to change the code without introducing new bugs.
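To make that concrete, here’s a minimal sketch of what a small batch of unit tests can look like. It’s written in Python with the standard `unittest` module purely for illustration. The `apply_discount` function is hypothetical and not taken from any BizStream project, and the last test imagines a bug that was found and fixed.

```python
import unittest


def apply_discount(price, percent):
    """Return the price after taking a percentage off (hypothetical example)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTests(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_full_discount_is_free(self):
        self.assertEqual(apply_discount(80.0, 100), 0.0)

    def test_out_of_range_percent_is_rejected(self):
        # Regression test added after an imagined bug report:
        # a 150% discount used to produce a negative price.
        with self.assertRaises(ValueError):
            apply_discount(80.0, 150)


def run_tests():
    """Run the whole batch and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` after every change runs all four checks in one go, which is the “large batch” idea described above.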
Self Code Reviews
Once we’re done writing our code and testing it locally, we make sure the code looks good by proofreading it again before we commit it to the repository. I always review my own code before I commit it, and then I review it again after I submit it for peer review. I’ve found many issues by reviewing my code twice.
Peer Code Reviews
Once a developer finishes a task, they create and assign a code review to all other developers on the project team. Team members read the code changes and make comments on any line they choose, just like we can in a Microsoft Word document. Often, we’ll suggest re-using an already existing and tested piece of code instead of writing something new. Or, we’ll simply point out spelling errors or places where the code is likely to fail if an error isn’t handled properly.
Automated Front-end Testing and Site Crawlers
We also set up a “Crawler” on each site that runs every 24 hours. The crawler provides a report of all the URLs tested along with any errors and warnings that occur. This allows us to track down any bugs we might have created in the previous deployment.
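If you’re curious what a crawler report boils down to, here’s a rough Python sketch. It is not the crawler we run. The `classify` policy (redirects count as warnings, 4xx and 5xx as errors) and the `fake_fetch` stand-in are assumptions made so the example runs without a network connection.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PageResult:
    url: str
    status: int
    level: str  # "ok", "warning", or "error"


def classify(status: int) -> str:
    """Map an HTTP status code to a report level (an assumed policy)."""
    if status >= 400:
        return "error"
    if status >= 300:
        return "warning"  # redirects still work but are worth a look
    return "ok"


def crawl(urls: Iterable[str], fetch_status: Callable[[str], int]) -> List[PageResult]:
    """Request every URL and record how it responded."""
    results = []
    for url in urls:
        try:
            status = fetch_status(url)
            level = classify(status)
        except OSError:
            # Connection failures and timeouts are recorded as errors with status 0.
            status, level = 0, "error"
        results.append(PageResult(url, status, level))
    return results


def summarize(results: List[PageResult]) -> Counter:
    """Count how many pages landed in each report level."""
    return Counter(result.level for result in results)


# Stand-in for a real HTTP client so the sketch runs offline.
FAKE_SITE = {"/": 200, "/about": 200, "/old-page": 301, "/broken": 500}


def fake_fetch(url: str) -> int:
    if url not in FAKE_SITE:
        raise OSError("connection refused")
    return FAKE_SITE[url]


report = crawl(["/", "/about", "/old-page", "/broken", "/offline"], fake_fetch)
for page in report:
    if page.level != "ok":
        print(f"{page.level.upper():8} {page.status:3} {page.url}")
```

In a real setup, `fake_fetch` would be replaced by an actual HTTP request, and the job would be scheduled to run once a day.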
BizStream has a dedicated team of QA professionals. After peer code reviews are finished, the same task is assigned to our QA team. A QA team member will beat it up and find those edge cases we didn’t think about during our developer testing. It’s always good to have a second set of eyes re-read a task’s acceptance criteria and make sure the work is done correctly.
We use Azure DevOps to automate our deployments. Manual deployments are risky, stressful, and time-consuming. When I performed manual deployments in the past, they sometimes took 2 to 4 hours and I had to follow a complicated process step by step or risk taking down a live site. An automated deployment can take 5 minutes or less to complete and can be set up to run at any time without a developer being present. Many live site problems are caused by failed deployments performed by humans. A developer may forget to copy a file over to a production server or forget to run a query to update a database. Sure, there may still be issues every now and then with automated deployments. When they do arise, though, we fix the automation problem instead of the deployment problem. It’s also often easier to automate the rollback process, which reduces the overall downtime.
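The rollback idea is easier to see in code. The sketch below is plain Python, not an Azure DevOps pipeline definition, and the step names are made up. It only illustrates the principle: run the steps in order, and if one fails, undo the completed steps in reverse.

```python
from typing import Callable, List, Tuple

# Each step is (name, apply, undo). The names and actions are illustrative only.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def deploy(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as error:
            log.append(f"FAILED {name}: {error}")
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append(f"rolled back {done_name}")
            return False
        completed.append((name, undo))
        log.append(f"ok {name}")
    return True


def no_op() -> None:
    """Placeholder for a real action such as copying files."""


def failing_migration() -> None:
    raise RuntimeError("column already exists")


deploy_log: List[str] = []
succeeded = deploy(
    [
        ("copy files", no_op, no_op),
        ("update database", failing_migration, no_op),
        ("warm up site", no_op, no_op),
    ],
    deploy_log,
)
```

Because every step carries its own undo action, nobody has to remember the rollback procedure under pressure. It runs the same way every time.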
Even though our deployments are automated at BizStream, we still watch every live deployment to make sure it runs smoothly. If we need to intervene or restart the process, we’re standing by. Once the code is deployed, the entire team makes sure every new feature is working as expected. We want to be the first to find any new bugs if they are out there.
We use several sites for testing before we move our changes to a live production site. These are the usual sites:
- Local: A local environment used for development on a developer’s PC.
- Development: A site that is accessible by all developers and the QA resource on the team. This is the first site where we see our changes live after a deployment. This site is usually private and can’t be accessed by anyone outside the development team.
- QA: This environment is typically used by the QA team members. It’s their site to test on and add new “fake” content to test those edge cases.
- Beta or Staging: This environment is typically used by our clients for adding new content and testing out new development features. This is the first time a client will see and test out features for themselves. This is the last stop a feature makes before going to a live production site. It’s very important that clients assist us in flushing out bugs before their users find them.
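The environments above form a promotion path, and a change should only move forward once it has been checked at every earlier stop. Here’s a small Python sketch of that rule. The names in `PROMOTION_ORDER` mirror the list above (with “Staging” standing in for “Beta or Staging”), and both functions are hypothetical helpers, not part of any real deployment tool.

```python
from typing import Optional, Set

# Order in which a change travels toward the live site (mirrors the list above).
PROMOTION_ORDER = ["Local", "Development", "QA", "Staging", "Production"]


def next_environment(current: str) -> Optional[str]:
    """Return the next stop after `current`, or None if it is already live."""
    index = PROMOTION_ORDER.index(current)
    if index + 1 < len(PROMOTION_ORDER):
        return PROMOTION_ORDER[index + 1]
    return None


def can_promote_to(target: str, verified_in: Set[str]) -> bool:
    """A change may reach `target` only if it was verified in every earlier environment."""
    required = PROMOTION_ORDER[: PROMOTION_ORDER.index(target)]
    return all(environment in verified_in for environment in required)
```

For example, `can_promote_to("Production", {"Local", "Development", "QA"})` is false because the change hasn’t been checked on Staging yet.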
Now that we’ve covered some good bug prevention steps, we’ll talk about how we find live bugs and record them, so our clients and users don’t have to.
The Kentico Xperience Content Management System (CMS) has its own built-in event log. It’s good to skim through the log on a regular basis to make sure a site is running well. Errors are conveniently highlighted in red with dates, custom error messages, and sometimes even the exact line number where the bug occurred in the code. Our YouthCenter and CaseStream products utilize their own custom event logs that are crucial in helping us figure out what’s causing a bug.
Audit logs are used to determine who created, viewed, edited, or deleted items on a site. The Xperience CMS stores this information in the event log. BizStream’s products implement their own version of an audit log. They help us figure out the “who, what, when, and where” of a bug before we start troubleshooting.
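As a rough picture of what an audit log records, here’s a Python sketch. The `AuditEntry` shape and the sample entries are invented for illustration. They don’t reflect how the Xperience event log or our products actually store this data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEntry:
    who: str        # the user who acted
    action: str     # "created", "viewed", "edited", or "deleted"
    item: str       # where: the page or record that was touched
    when: datetime  # when it happened


def history_for(entries: Iterable[AuditEntry], item: str) -> List[AuditEntry]:
    """Return everything that happened to one item, oldest first."""
    matching = [entry for entry in entries if entry.item == item]
    return sorted(matching, key=lambda entry: entry.when)


# Made-up sample data.
entries = [
    AuditEntry("dana", "edited", "/pricing", datetime(2024, 3, 4, 14, 5)),
    AuditEntry("sam", "created", "/pricing", datetime(2024, 3, 1, 9, 30)),
    AuditEntry("sam", "viewed", "/contact", datetime(2024, 3, 4, 14, 1)),
    AuditEntry("lee", "deleted", "/pricing", datetime(2024, 3, 4, 16, 45)),
]

pricing_history = history_for(entries, "/pricing")
for entry in pricing_history:
    print(f"{entry.when:%Y-%m-%d %H:%M} {entry.who} {entry.action} {entry.item}")
```

Reading the history of a single item in time order is usually the quickest way to answer the “who, what, when, and where” before troubleshooting starts.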
Windows servers store IIS logs, which developers can use to investigate issues when nothing is logged in the Event Log or Audit Log.
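IIS logs in the W3C extended format are plain text: lines starting with `#` are directives, and the `#Fields:` directive names the space-separated columns that follow. Here’s a minimal Python sketch that pulls out server errors (5xx responses). The sample log lines are made up, and a real log will typically contain more fields than shown here.

```python
from typing import Dict, Iterable, List


def parse_w3c_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn W3C extended log lines into one dict per request."""
    fields: List[str] = []
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Column names for the data lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software or #Date
        values = line.split()
        if len(values) == len(fields):
            records.append(dict(zip(fields, values)))
    return records


def server_errors(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only requests that ended with a 5xx status code."""
    return [r for r in records if r.get("sc-status", "").startswith("5")]


# Made-up sample in the W3C extended layout.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 10.0
#Version: 1.0
#Date: 2024-01-15 00:00:00
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2024-01-15 08:12:01 GET /home 200 35
2024-01-15 08:12:07 POST /checkout 500 1204
2024-01-15 08:13:44 GET /missing 404 12
"""

records = parse_w3c_log(SAMPLE_LOG.splitlines())
for record in server_errors(records):
    print(record["date"], record["time"], record["cs-uri-stem"], record["sc-status"])
```

Reading the column names from `#Fields:` rather than hard-coding positions keeps the sketch working when a server is configured to log a different set of fields.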
Browser Console Errors
If you press F12 in your browser and view the Console tab, there may be errors listed in red. Sometimes the errors are specific enough that we’ll know exactly what the problem is without doing any further digging.
Error Reporting Software
It’s often helpful to run an automated service that sends some sort of notification message to the development team when errors occur. If needed, we can respond right away to an issue and solve it before more users find it.
Xperience by Kentico
If you’re running the Xperience by Kentico CMS, you can add an e-mail address to the “Error notification e-mail address” setting in the administration part of the site. You will then receive e-mails when errors occur.
BizStream also uses Raygun on non-Xperience sites. Raygun is a software product that provides error monitoring, crash reporting, user monitoring, and real-time performance metrics. We have it set up to send us Slack messages whenever errors occur on our sites, so we never miss one.
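To show the general idea of pushing an error into a chat channel, here’s a small Python sketch that posts a message to a Slack incoming webhook, which accepts a JSON body with a `text` field. This is not how Raygun’s Slack integration works internally. The message format and function names are assumptions, and the webhook URL would come from your own Slack workspace.

```python
import json
import urllib.request
from typing import Dict


def build_error_message(site: str, error: str, url: str) -> Dict[str, str]:
    """Build the JSON body for a Slack incoming webhook (format is our own choice)."""
    return {"text": f"Error on {site}: {error} ({url})"}


def send_to_slack(webhook_url: str, payload: Dict[str, str]) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Building the message needs no network access; sending it does.
payload = build_error_message("example.com", "NullReferenceException", "/checkout")
```

A monitoring service handles grouping, deduplication, and alert rules on top of this, which is why we rely on a dedicated product rather than a hand-rolled script.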
Toolkit for Kentico
The Xperience CMS provides many different tools out of the box for reporting and logging errors. However, some of our clients want a little more than what Kentico gives them. As a result, BizStream added the Toolkit for Kentico. The Toolkit contains many software tools that enhance the Xperience CMS, like Sitewide Search, Compare for Kentico Deployments, and Connect for integrating with CRM tools like Zoho and Dynamics. Two software packages in the Toolkit in particular can help reduce issues on your site.
Constant Care automates the review of over 100 points of performance on a Kentico Xperience site and gives easy-to-follow instructions to get a site performing at its fullest.
The Siteimprove add-on is an integration between Siteimprove and Kentico Xperience that streamlines a web team’s workflow. With this extension installed, a team can fix errors and optimize content directly within the editing environment. Once the detected issues have been addressed, they can be re-checked on the relevant page in real time to determine if further actions are needed.
Fixing Bugs When They Do Occur
Finally, we need to realize that bugs are unavoidable on live sites. It’s nearly impossible to prevent all of them. The key is to gather all the relevant information from the available sources and fix the issue as fast as possible. The more information we get from our clients and users in our support e-mails, the quicker we can solve the problem. After we collect the information and fix the bug, we follow the same process as outlined above in the Bug Prevention section of this post before updating the live site again with the fix. Ensuring quality is a team effort that starts with prevention, continues with the use of logging and error reporting tools, and finishes with quick response times from everyone on the team when errors do occur.