Improve delivery of data-driven projects from development all the way to production
Applying CI/CD methodologies in an environment using R and OpenCPU
In data-driven development, the value lies in the data beneath the software, and such projects are often left out of traditional software development schemes such as Agile. As those who work in these areas usually don’t have a formal education in good software practices, it’s easy to miss out on all the benefits they bring to the table. So how do we build quality into data-driven software using modern software processes and tools?
In a nutshell, Jenkins is an advanced scheduler. It triggers tasks on connected hardware by reacting to events, typically on source code changes or time events. Combining Jenkins with the following tools allows you to introduce and automate tasks, drastically increasing productivity and quality. Interested?
We’ll base the environment on R, one of the most popular programming languages among data analysts. It’s functional and dynamically typed, with heavy emphasis on data manipulation and projection of large datasets.
The aptly named rApache is a variant of Apache tailored to R. It’s a static file server used to host binary R packages in the CRAN repository format.
The OpenCPU API server is another popular tool; it maps R function calls to simple REST endpoints.
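As an illustration of that mapping, here is a sketch of calling a function over HTTP, assuming an OpenCPU server is running locally (the address and port are assumptions; `stats::rnorm` is a standard R function):

```shell
# Call stats::rnorm(n = 10) through OpenCPU's REST API; function arguments
# are passed as POST fields, and the /json suffix asks for a JSON result.
curl http://localhost:5656/ocpu/library/stats/R/rnorm/json -d n=10
```

The same pattern works for any installed package: `/ocpu/library/{package}/R/{function}`.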
In summary, R has all the building blocks required to build a modern software pipeline:
We want to increase awareness of code quality. By automatically gathering and displaying metrics, we bring code quality and testability into the daily life of the analyst. This is uncommon in data science, where the data itself cannot be tested directly.
We want to increase the speed at which our deliveries reach production. Jenkins can automate the tedious tasks of uploading packages and restarting the server after installation, if necessary.
We want to test whether our packages can be deployed successfully. By scripting the installation, it becomes trivial to automate a deployment to a production-like environment using Jenkins, even for every single change to the package.
We want to test our deployed package before it reaches production. This becomes easy, as we can now automatically deploy to a test environment and make real calls to our updated package.
So here’s what we delegate to Jenkins:
To increase awareness of quality of code, two types of analysis can be applied:
a linter, or syntax checker, which checks for irregularities in code, e.g. incorrect indentation, incorrect variable naming or overly long lines of code. A good example is the lintr package.
Check your code for FIXME or similar comments, and display those using the Warnings plugin.
Generate warnings for these during your builds to bring issues to light and to motivate fixing them and preventing new ones.
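As a minimal sketch of the linting step, assuming the lintr package is installed and the working directory is the package root:

```r
# Lint every R file in the package; findings (indentation, naming,
# long lines, ...) are printed one per line, which Jenkins can parse.
# Linters can be configured in a .lintr file at the package root.
library(lintr)
lint_package()
```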
Use the testthat R package for unit testing. It’s a great way to test your software, as the tests are written in a very declarative way. Use testthat’s built-in feature to convert the results to the TAP format and display them in Jenkins with the TAP plugin.
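A minimal sketch of such a test, assuming the testthat package is installed (the normalization logic is a made-up example):

```r
library(testthat)

# A declarative unit test: describe the expected behaviour, then assert it.
test_that("normalization scales values into [0, 1]", {
  x <- c(2, 4, 6)
  normalized <- (x - min(x)) / (max(x) - min(x))
  expect_equal(normalized, c(0, 0.5, 1))
})

# Running the suite with the TAP reporter produces output the Jenkins
# TAP plugin understands:
# test_dir("tests/testthat", reporter = "tap")
```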
Code coverage measures how much of your code is exercised during tests. Use the covr R package to measure coverage. It can output its results in the Cobertura coverage format, which allows the Cobertura plugin to display the results in Jenkins as pretty graphs.
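A sketch of the coverage step, assuming the covr package is installed and the working directory is the package root (the output file name is an assumption):

```r
library(covr)
cov <- package_coverage()                     # run the tests under coverage
to_cobertura(cov, filename = "coverage.xml")  # Cobertura XML for Jenkins
```

Point the Cobertura plugin at `coverage.xml` in the job configuration.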
The next part, deploying to a test environment, is solved with the Pipeline job type in Jenkins. Pipeline makes it easy to transfer the built package between steps with its stash function. This makes it a great choice, as it trivializes transferring the package from the test R server to a test OpenCPU server.
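A minimal declarative Pipeline sketch of this hand-over (the node labels and file patterns are assumptions):

```groovy
pipeline {
  agent none
  stages {
    stage('Build') {
      agent { label 'r-build' }        // assumed label of the R build node
      steps {
        sh 'R CMD build .'
        stash name: 'pkg', includes: '*.tar.gz'
      }
    }
    stage('Deploy to test') {
      agent { label 'opencpu-test' }   // assumed label of the test OpenCPU node
      steps {
        unstash 'pkg'
        sh 'R CMD INSTALL *.tar.gz'
      }
    }
  }
}
```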
After deploying to the test environment, we can run the functional tests that interact directly with our API server (OpenCPU). The functional tests are also written using testthat.
If the functional tests pass, we have a valid release candidate. Now we can publish our package to the local package server and CRAN package mirror, to be installed on the production server whenever we choose to.
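A sketch of the publishing step using only base R; the repository path and the tarball name are assumptions:

```r
# Copy the built tarball into a CRAN-style directory layout and regenerate
# the PACKAGES index that install.packages() reads.
repo <- "/var/www/cran/src/contrib"           # assumed repository root
dir.create(repo, recursive = TRUE, showWarnings = FALSE)
file.copy("mypackage_1.0.0.tar.gz", repo)     # hypothetical built package
tools::write_PACKAGES(repo, type = "source")
```

Clients can then install with `install.packages("mypackage", repos = "http://cran.example.com")` (hypothetical URL).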
Putting all this together, we roughly end up with the following flow:
Analysis > Unit tests > Deploy to test > Integration tests > Generate documentation > Create release candidate
To store our job configuration as code, and make it easy to spin up this pipeline, we created all of the jobs using the Job DSL plugin, and put the scripts in our repository.
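A sketch of such a seed script (the job name and repository URL are assumptions):

```groovy
// Job DSL seed script: the Job DSL plugin runs this to (re)create the
// pipeline job from code kept in version control.
pipelineJob('r-package-pipeline') {
  definition {
    cpsScm {
      scm {
        git {
          remote { url('https://example.com/analytics/mypackage.git') }
          branch('master')
        }
      }
      scriptPath('Jenkinsfile')
    }
  }
}
```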
Jenkins assists with the overall goal of improving software quality. We use Jenkins as the driver to showcase many of the concepts normally used in software development.
The developers now have a visual representation of the state of their software: the quality of the tests and the current progress are tracked on a dashboard for all to see. This also makes it easier to demonstrate progress to upper management from a developer standpoint.
Do you have a tendency to use the backlog as an eternal placeholder? If so, you probably have a lot of clutter that’s creating a lot of frustration for your end-users. In this post we’ll show you how to clean up your Jira issues and reduce the backlog with some basic JQL queries.
Tips to improve project management in the Atlassian suite
How to test Kubernetes artifacts like Helm charts and YAML manifests in your CI pipelines with a low-overhead, on-demand Kubernetes cluster deployed with KIND (Kubernetes in Docker).
Low overhead, on-demand Kubernetes clusters deployed on CI Workers Nodes with KIND
Had enough of sluggish polling? With instant Artifactory event triggers you can give responsiveness in Jenkins a real boost. Here’s an easy way to set it up.
A super easy configuration guide
With the arrival of microservices, code is becoming disposable. Does this mean that we no longer need maintainable code? Is it the end of refactoring?
Still relevant or increasingly redundant?
In software development, tight coupling is one of our biggest enemies. At the function level it makes our application fragile and hard to change. Unfortunately, tight coupling is like the entropy of software development, so we always have to work to reduce it.
How to safely introduce modular architecture to legacy software
I am an Atlassian certified trainer, and over the years I have spent much time with clients and their Jira instances. In this blog post, I have collected some small tips and tricks that will make your Jira usage better.
Jira Software is a powerful tool deployed in many organizations, yet in day-to-day usage people miss out on improvements, big and small.
In this post, I’ll take a closer look at the version of Jenkins X that uses Tekton, to give you an idea of what the general development, build, test and deploy flow looks like with Jenkins X. How does it feel to ship your code to production using a product from the Jenkins community that has very little Jenkins in it?
A crash course in Jenkins X and how to test it out on a local Kubernetes cluster
In this blog post I will show you how to create snapshots of persistent volumes in Kubernetes clusters and restore them again by talking only to the API server. This can be useful for backups, or when scaling stateful applications that need “startup data”.
Sneak peek at the CSI volume snapshotting alpha feature
When I read Fowler’s new ‘Refactoring’ book I felt sure the example from the first chapter would make a good Code Kata. However, he didn’t include the code for the test cases. I can fix that!
Writing tests for ‘Theatrical Players’
Nicole Forsgren and the Accelerate DORA team have just released the newest iteration of the State of DevOps report. The report investigates which practices make us better at delivering valuable software to our users, as measured by business outcomes. Read on for our analysis of the report and how it can best be put to use.
The latest drivers of software delivery performance