Improve delivery of data driven projects from development all the way to production
Applying CI/CD methodologies in an environment using R and OpenCPU
In data-driven development, the value lies in the data beneath the software, and such projects are often exempt from traditional software development schemes such as Agile. As those who work in these areas usually don’t have a formal education in good software practices, it’s easy to miss out on the benefits those practices bring to the table. So how do we build quality into data-driven software using modern software processes and tools?
In a nutshell, Jenkins is an advanced scheduler. It triggers tasks on connected hardware by reacting to events, typically on source code changes or time events. Combining Jenkins with the following tools allows you to introduce and automate tasks, drastically increasing productivity and quality. Interested?
We’ll base the environment on R, one of the most popular programming languages among data analysts. It’s functional and dynamically typed, with heavy emphasis on data manipulation and projection of large datasets.
The aptly named Rapache is a variant of Apache tailored to R. It’s a static file server used to host binary R packages using the CRAN format.
OpenCPU API server is another popular tool that can map R function calls to simple REST methods.
In summary, R has all the building blocks required to build a modern software pipeline:
We want to increase the awareness of code quality. By automatically gathering and displaying metrics, we introduce code quality and testability into the daily life of the analyst. This is uncommon in the art of data science, since you cannot test your data directly.
We want to increase the speed at which our deliveries reach production. Jenkins can automate the tedious tasks of uploading packages and restarting the server after installation, if necessary.
We want to test whether our packages can be deployed successfully. By scripting the installation, it becomes trivial to automate a deployment to a production-like environment using Jenkins, even for every single change to the package.
We want to test our deployed package before it reaches production. This becomes easy once we can automatically deploy to a test environment and make real calls to our updated package.
So here’s what we delegate to Jenkins:
To increase awareness of quality of code, two types of analysis can be applied:
A linter, or syntax checker, checks for irregularities in code, e.g. incorrect indentation, incorrect variable naming or long lines of code. A good example is lintr.
Check your code for FIXME or similar comments, and display those using the Warnings plugin.
Generate warnings for these during your builds to bring issues to light and motivate fixing and preventing further warnings.
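As a rough illustration of the comment check described above, the following base R sketch scans source files for FIXME or TODO markers and prints one warning line per hit. The helper name `find_markers` and the `file:line: warning: text` output shape are assumptions made for this example, and so is the choice of markers; how the Warnings plugin should parse the output is not covered here. The optional last step runs lintr on the same file if the package happens to be installed.

```r
# Hypothetical helper (not from the original setup): list FIXME/TODO comments.
find_markers <- function(paths, pattern = "(FIXME|TODO)") {
  hits <- lapply(paths, function(path) {
    lines <- readLines(path, warn = FALSE)
    idx <- grep(pattern, lines)
    data.frame(
      file = rep(path, length(idx)),
      line = idx,
      text = trimws(lines[idx]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, hits)
}

# Throwaway source file so the sketch runs on its own.
tmp <- tempfile(fileext = ".R")
writeLines(c(
  "add <- function(a, b) {",
  "  # FIXME: handle NA input",
  "  a + b",
  "}"
), tmp)

markers <- find_markers(tmp)

# One line per marker, in an assumed "file:line: warning: text" shape.
cat(sprintf("%s:%d: warning: %s\n", markers$file, markers$line, markers$text),
    sep = "")

# Optional: run the linter on the same file when lintr is available.
if (requireNamespace("lintr", quietly = TRUE)) {
  print(lintr::lint(tmp))
}
```

In a Jenkins job, a script like this would typically be pointed at the package's `R/` directory rather than a temporary file.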
Use the testthat R package for unit testing. It’s a great way to test your software, as the tests are written in a very declarative way. Use testthat’s built-in feature to convert the results to the TAP format and display them in Jenkins with the tap plugin.
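To give a feel for the declarative style, here is a minimal sketch of a testthat test run with the TAP reporter. The function `rescale` and the test file are made up for the example and written to a temporary directory so the snippet runs on its own; in a real package the tests would live under `tests/testthat/`.

```r
library(testthat)

# Throwaway test file; the function under test is inlined to keep the
# sketch self-contained (both are hypothetical).
path <- file.path(tempdir(), "test-rescale.R")
writeLines(c(
  "rescale <- function(x) (x - min(x)) / (max(x) - min(x))",
  "",
  "test_that('rescale maps values onto [0, 1]', {",
  "  expect_equal(rescale(c(2, 4, 6)), c(0, 0.5, 1))",
  "})"
), path)

# Run the file and report the results in TAP format on standard output.
results <- test_file(path, reporter = "tap")
```

How the TAP output is saved for the Jenkins TAP plugin, for instance by redirecting the build step's output to a file, depends on the job setup and is left open here.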
Code coverage measures how much of your code is used during tests. Use the covr R package to measure coverage. It can output its results in the Cobertura coverage format, which allows the Cobertura plugin to display the results in Jenkins as pretty graphs.
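The coverage step could be scripted roughly as follows. This is a sketch, not the original job configuration: the helper name `write_coverage_report`, the default package path and the report file name are assumptions, and the function expects to be pointed at the root of an R package that has tests.

```r
library(covr)

# Hypothetical helper: measure coverage for the package at `pkg_path`
# and write a Cobertura XML report to `out_file`.
write_coverage_report <- function(pkg_path = ".", out_file = "cobertura.xml") {
  cov <- package_coverage(path = pkg_path)  # runs the package's tests
  message(sprintf("Total coverage: %.1f%%", percent_coverage(cov)))
  to_cobertura(cov, filename = out_file)    # needs the xml2 package
  invisible(cov)
}
```

A build step would then call `write_coverage_report(".")` from the package root and hand the resulting XML file to the Cobertura plugin.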
The next part, deploying to a test environment, is solved with the Pipeline job type in Jenkins. Pipeline makes it easy to transfer the built package between steps with its stash function. This makes it a great choice, as it trivializes transferring the package from the test R server to a test OpenCPU server.
After deploying to the test environment, we can run the functional tests that interact directly with our API server (OpenCPU). The functional tests are also written using
If the functional tests pass, we have a valid release candidate. Now we can publish our package to the local package server and CRAN package mirror, to be installed on the production server whenever we choose to.
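A functional test against the deployed package might look something like the sketch below. Several things here are assumptions: the environment variable `OCPU_TEST_URL`, the helper `ocpu_url`, and the use of `stats::rnorm` as a stand-in for a function from our own package. testthat is used only because it is the unit test tool named earlier. The URL shape follows OpenCPU's convention of exposing functions under `/ocpu/library/{package}/R/{function}`, with a `/json` suffix to get the result back as JSON.

```r
library(testthat)

# Assumption: the Jenkins job exports the address of the test OpenCPU server.
base_url <- Sys.getenv("OCPU_TEST_URL", unset = "")

# Hypothetical helper: build the URL for calling a function and getting JSON.
ocpu_url <- function(base, pkg, fun) {
  sprintf("%s/ocpu/library/%s/R/%s/json", sub("/+$", "", base), pkg, fun)
}

test_that("deployed package answers over HTTP", {
  skip_if(base_url == "", "OCPU_TEST_URL not set")
  skip_if_not_installed("httr")

  # stats::rnorm stands in for a function from the package under test.
  resp <- httr::POST(ocpu_url(base_url, "stats", "rnorm"),
                     body = list(n = 3), encode = "json")
  expect_false(httr::http_error(resp))

  values <- jsonlite::fromJSON(
    httr::content(resp, as = "text", encoding = "UTF-8")
  )
  expect_length(values, 3)
})
```

The test skips itself when no server address is configured, so the same file can sit next to the unit tests without breaking local runs.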
Putting all this together, we roughly end up with the following flow:
Analysis > Unit tests > Deploy to test > Integration tests > Generate documentation > Create release candidate
To store our job configuration as code, and make it easy to spin up this pipeline, we created all of the jobs using the Job DSL plugin, and put the scripts in our repository.
The overall goal of improving quality of software is assisted by Jenkins. We use Jenkins as the driver to showcase many of the concepts normally used in software development.
The developers now have a visual representation of the state of their software. The quality of the tests and the current progress are tracked on a dashboard for all to see, which makes it easier to demonstrate progress to upper management from a developer standpoint.