Test Environment Management Best Practices: Using Containers

Containers have been gaining popularity since their inception in 2001, particularly in the last few years. According to the official Red Hat blog post on the history of containers, they were originally created in order to run several servers on a single physical machine. There are significant advantages to using containers. You may have either the systems under test or automated tests running in containers—or both! This post describes best practices for managing a test environment that runs containers.

Define Ownership

It's important to define ownership of the container environment. If your test environment management (TEM) team is separate and has its own budget, most of the ownership will fall to them. Most of the guidance given by Docker regarding ownership of clusters in production applies equally to TEM.

Employing Containers

Use Containers for Tasks

The test environment itself may use containers for running tests on demand.

You can use containers to:

  1. Run applications/services.
  2. Perform tasks.

Tasks you can perform include smoke testing, load testing, and other types of automated tests. Since task containers are throwaways, you benefit from being able to free resources immediately after the task is run.
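With Docker, for example, a task container can be started just for the duration of a test run and removed as soon as it exits. A minimal sketch; the image name and environment variable are hypothetical:

# Run a one-off load test and discard the container when it finishes (--rm).
docker run --rm \
  -e TARGET_URL="https://app.test.example.com" \
  my-registry/load-test:1.4.2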

Use Clusters

Containers have scaled beyond the original intent of running multiple independent servers in relative isolation on a single machine. Clusters of host servers are common these days. You deploy containers to the cluster without having to directly manage servers for each application's requirements. Instead, you define the requirements for the container and the cluster management system runs the instance appropriately. Some noteworthy cluster management systems include Docker swarm and Kubernetes.
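As a rough illustration, a Docker Swarm service declares how many replicas to run and what resources each may use, and the cluster decides which hosts actually run them (the image name is hypothetical):

# Ask the cluster to keep three replicas running within the declared limits.
docker service create \
  --name api-under-test \
  --replicas 3 \
  --limit-cpu 0.5 \
  --limit-memory 512m \
  my-registry/api:2.4.1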

Cloud Hosting Services

Cloud services offer container hosting as a service, which is yet another level of abstraction from managing servers. These clusters are fully managed by the cloud provider. Results may vary based on your environment, but I've found that using cloud services to run tasks is beneficial, as it reduces the amount of scaling needed in a self-managed cluster. Also, hosting applications and services in self-managed clusters across your cloud VMs can lead to significant cost savings when running containers over longer periods of time.

Define Limits

Container memory, CPU, and swap usage limits should be set appropriately. If they are not set, the container instance is allowed to use unlimited resources on the host server. When the host runs low on memory, it will kill processes to free it up. Even the container host process can be killed; if that happens, all containers running on the host will stop.
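With Docker, for example, these limits can be set per container at run time. A minimal sketch (the values and image name are illustrative, not recommendations):

# Cap memory at 512 MB, memory plus swap at 1 GB, and CPU at one core.
docker run -d \
  --memory 512m \
  --memory-swap 1g \
  --cpus 1.0 \
  my-registry/api:2.4.1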

Validate Configurations

Test for appropriate container runtime configurations. Use load testing in order to determine the appropriate limits for each application version and task definition. Setting the limit too low may cause application issues; setting it too high will waste resources; not setting it at all may result in catastrophic failure of the host.

Use a Source Control Management System

Container definitions are specified in a relatively straightforward text file. The host uses the text file to create and run the container instance. Often, container definitions will load and run scripts to perform more complex tasks rather than defining everything in the container definition (for Docker, that's the Dockerfile).

Use a source control management system such as Git to manage versions of container definitions. Using source control gives you a history of changes for reference and a built-in audit log. If a bug is discovered in production, the specific environment can be retrieved and rehydrated from source control. Because you can quickly recall any version of the environment, there is no need to keep a version active when it's not under test.
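For example, if the container definition is tagged alongside each application release, rehydrating an environment is a matter of checking out the tag and rebuilding. A sketch, assuming a hypothetical tag and image name:

# Recreate the environment that shipped as release 2.4.1.
git checkout v2.4.1
docker build -t my-registry/api:2.4.1 .
docker run -d my-registry/api:2.4.1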

Create a New Container Per Version

It's best to create a new container for each version of each application. Containers are easy to run, stop, and decommission. Deploy a new container instance rather than updating in place. By deploying a new instance, you can ensure that the container has only the dependencies specific to that version of the application. This frees the running instances from bloat and conflicts.

Running instances have names for reference; use the application and version in the instance name, if possible.
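For instance (with a hypothetical application called orders-api):

# The instance name encodes the application and its version.
docker run -d --name orders-api-2.4.1 my-registry/orders-api:2.4.1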

If dependencies haven't changed, the container definition itself doesn't need to change. The container definition is for specifying the container itself (the OS and application dependencies). Specify versions of dependencies and base images rather than always using the latest ones. When dependencies change (including versions), create a new version of your container definition or script.

To clarify, here is a snippet from a Dockerfile for node 10.1.0 based on Linux Alpine 3.7:

FROM alpine:3.7

ENV NODE_VERSION 10.1.0

...

    && curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION.tar.xz" \
    && curl -SLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
    && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
    && grep " node-v$NODE_VERSION.tar.xz\$" SHASUMS256.txt | sha256sum -c - \

...

CMD [ "node" ]

If you were running node scripts, you might create your own Dockerfile starting with FROM node:10.1.0-alpine. This tells Docker to use this specific base image (node 10.1.0 running on Linux Alpine) from the public image repository, Docker Hub. You would then use the remainder of your Dockerfile to install your application-specific dependencies.
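A minimal sketch of such a Dockerfile, assuming a hypothetical Node application whose dependencies are listed in package.json and whose entry point is server.js:

# Pin the base image to a specific Node and Alpine version.
FROM node:10.1.0-alpine

WORKDIR /usr/src/app

# Install only the dependencies this version of the application needs.
COPY package.json package-lock.json ./
RUN npm install --production

# Copy the application source and define how to start it.
COPY . .
CMD [ "node", "server.js" ]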

Avoid Duplication

There should be a single source of truth for each container definition. All deployments in all environments should use that source of truth. Use environment variables to configure the containers per environment.

Design container definitions for reuse. If you find that only certain parts of a definition change, create a base file for the parts that stay stable and move the ever-changing parts into child files.
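For instance, the same image can be configured per environment with environment variables at run time, rather than baking environment-specific values into separate definitions (the variable names and hosts are hypothetical):

# One image, one definition, configured differently per environment.
docker run -d -e APP_ENV=test -e DB_HOST=db.test.internal my-registry/api:2.4.1
docker run -d -e APP_ENV=staging -e DB_HOST=db.staging.internal my-registry/api:2.4.1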

Monitor

Monitor your running container instances and the cluster environment. Monitoring allows you to flag events that could indicate defects in the system under test. These defects may go unnoticed without a way to measure the impact of the system on the environment.

When working with clusters, monitoring is essential for auto-scaling based on configurable thresholds. Similarly, you should set thresholds in your monitoring system to trigger alerts when a process is consuming more resources than expected. You can use these alerts to help identify defects.
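Even without a full monitoring stack, you can spot-check resource consumption per container; a monitoring system then turns the same measurements into thresholds and alerts. With Docker, for example:

# One-off snapshot of CPU, memory, network, and I/O usage for all running containers.
docker stats --no-stream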

Log Events

Logging events from your test environment can mean the difference between resolving an issue and letting it pass as a "ghost in the machine." Parallel and asynchronous programming are essential for boosting performance and reducing load, but they can cause timing issues that lead to odd defects that are not easily reproducible. Detailed event logs can give significant clues that will help you recognize the source of an issue. This is only one of many cases where logging is important.
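When chasing timing-related defects, per-entry timestamps matter. With Docker, for example, you can pull a container's recent output with timestamps attached (the container name is hypothetical):

# Show the last hour of logs from the container, with a timestamp on each line.
docker logs --timestamps --since 1h api-under-test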

Logs realize their value when they are accessed and utilized. Some logs will never make it to the surface, and that's OK. Having the proper tools to analyze the logs will make the data more valuable. Good log analysis tools make short work of correlating events and pay for themselves in time.

Use Alerting/Issue Tracking Strategically

Set up alerts for significant events only. A continual flood of alerts is almost guaranteed to be less effective. Raise the alarms when there is a system failure or a blocker. Batching alerts of lower priority is more efficient, as it causes less disruption to the value stream. Only stop the line when an event disrupts the flow. Checkpoints like gates and retrospectives are in place for a reason. Use them, along with issue tracking systems, to communicate non-critical issues.

Summary

Containers are being used nearly everywhere. They're continuing to gain traction, especially as cloud hosting providers are expanding their container hosting capabilities. Understanding how to manage containers and their hosting environment is important for your test environment management capabilities. You should now have a better idea of what to expect when using containers and how you can most effectively manage your container environment.

Author: Phil Vuollet

Phil uses software to automate processes to improve efficiency and repeatability. He writes about topics relevant to technology and business, occasionally gives talks on the same topics, and is a family man who enjoys playing soccer and board games with his children.

Orchestrate your Environments

Enterprise IT Intelligence & Orchestrating your Environments

What Is Enterprise IT Intelligence?

I have heard the phrase “Enterprise IT Intelligence” bandied around a few times recently, in particular when talking about concepts like Agile, DevOps at scale, or IT Environment Management.

It's quite a fancy set of words. But what does it mean?

Is it an oxymoron? Is it simply someone trying to create a new job title for themselves? 

> "Ahem, yes. I have decided I am an enterprise intelligence analyst." 

To be honest, I'm already suspicious of anyone who calls themselves a "data scientist," and I'm all the more so when it comes to the title above.

To put it in "Up Goer Five" terms (read: using only the ten hundred words people use most often), enterprise IT intelligence is the idea that seeing and learning things across an entire workplace can help people in power do new and good things.

This usually in turn leads to more value, better revenue, happier employees, or whatever else that decision-maker is targeting. And of course, we threw in the term "IT" because most of this data comes from employee and customer interaction with software applications.

Data vs. Information vs. Intelligence

Software applications are the main producers of this potential intelligence. But let's be clear: they usually only produce information and data, not actual intelligence. This is why enterprise IT intelligence gets its own moniker. It's not common. The way we spin up application teams isn't geared towards producing intelligence.

So, what's the difference between data, information, and intelligence? They're easy to mix up, and confusing them can lead to a lot of money spent collecting these three things with little gain.

Data means facts. If we're talking about human beings, we can consider our height, weight, and age to be data. Information means we're using snapshots of these facts to answer simple questions. In our example, that means taking our data to project what our current risk of a heart attack is. Intelligence, the glorious topic of this post, is the story that this information tells. "Why do people with a high risk of heart attack love popcorn?" Intelligence often leads to decisions like, "Let's partner our movie theaters with the American Red Cross. We'll offer a discount on our weekend showings if they do a health evaluation."

User-Focus Doesn't Always Lead to Intelligence

As I said, most applications produce data and information, not intelligence. Why is this?

Well, it's because applications in an enterprise are usually either user-centric or data-centric. A data-centric application may be something like a service that stores recommendations for loans to which people may apply. A user-centric application might be something like a tool that helps mortgage lenders approve a new mortgage. In organizationally challenged enterprises, a team may even be defined as just the database-management piece of the seven teams that help mortgage lenders approve a new mortgage.

It's healthy to focus on the main users of your application, and it will lead to much success for you and them. But what the user does or needs in order to interact with you is just a small part of the story. Think about a mortgage enterprise. A user applying for and receiving a loan are just two steps in the overall value stream. The loan has to be underwritten, someone has to check if there's money in the bank for the loan, and the risk of the loan must be audited. I'm not a mortgage banker, so there are probably dozens of steps I'm missing. No one user-focused application team can handle all of this at any decent-sized company.

Value Streaming

Instead, intelligence has to be focused on the value stream of the enterprise. This is a fancy way of saying we should focus on how the company goes from wanting to fulfill a customer desire to creating a fully formed application: how things go from a twinkle in the eye to reality. So if I'm an automotive company, I'll map all the steps of a new car design being turned into a car that the customer drives off the lot.

Since I can see the entire stream, like water, I can shift it. I can put a metaphorical rock in the right place to steer the stream where it needs to go. No single user-centric or data-centric application can do this on its own, but each has a little bit of data and information it can give to the effort, creating intelligence.

What Is Its Power?

Gaining actual intelligence over multiple aspects of your enterprise is powerful. I mean, this is the stuff that drives companies to the top—and the lack thereof can drive a company out of business. Netflix has spun up entire genres of original content based upon the intelligence they gathered about what people like to watch. Amazon uses intelligence to groom their product recommendation engine to outsell their competition. Until you use intelligence across your enterprise, you're only seeing a small part of the story.

Bad analogy time: think about a musical composition. You have notes, the beat, the scale, and the volume. This is your data. You have the transitions between notes: sometimes you go up smoothly, sometimes you go down in staccato fashion, sometimes you pause. This is your information. Then you bring in the instruments. You have your strings, brass, and percussion. There may also be singers. You bring in a group of altos, tenors, and basses (sorry, sopranos). They each get their own set of notes. These groups are your applications.

However, the true beauty of the music cannot be understood until you hear all of it come together. You can get an idea of how the song may sound with just one section, but you won't truly appreciate it until everyone works together. That is intelligence. And the conductor takes these elements and guides them, raising or lowering the volume, slowing down a section or speeding them up. This is why intelligence is valuable. And only the conductor, the one who has this intelligence at their disposal, can do it.

Where To Go From Here

So let's make some beautiful music. We live in a great age where there are many, many tools that can help you gather intelligence for your enterprise. In fact, there are enough tools that it may overwhelm you if you're dipping into the space for the first time. It may be helpful for you to look into tools that simplify the process.

I hope this helped clear up the nature of the intelligence behind "enterprise IT intelligence."

Author: Mark Henke

Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function. Every developer is a leader of something on their team, and he wants to help them see that.

 

Data: Environments' Evil Twin

Data! Environments' Evil Twin – Top Breaches of 2017

Preamble

Data is without doubt the evil twin when it comes to managing your Environments. It is just so complicated, both internally, within an individual database, and at a more holistic level, i.e., where data and its inherent relationships span the organization.

This complexity ultimately exposes the organization to all kinds of challenges, one of the main ones being the likelihood that unwanted information (for example, customer Personally Identifiable Information) will appear in, or leak into, the wrong places: your Development, Integration, and Test Environments*, for example, or worse still, the public internet. That is a sub-optimal situation when one considers that 70% of breaches (source: Gartner) are committed internally.

Tip*: Don’t ignore Non-Production, as that’s where projects spend 95% of their time.

Anyhow, here's a post from TJ Simmons on some of the top breaches from last year.

Top 5 Data Breaches of 2017

Back in September 2017, news broke about a group called OurMine hacking Vevo. OurMine got away with 3.12TB of Vevo's data and posted it online. That's when I came across this comment on Gizmodo:

[Image: Gizmodo reader comment on the data hacking]

That comment says it all. Every other day, it seems like another company is on the receiving end of a data breach. Most security experts will tell you that the data breaches we know of so far are merely the tip of the iceberg. One expert noted in 2015 that we should expect larger hacks to keep happening. And larger hacks have kept happening, leading to bigger and messier data breaches.

The following is a compilation of the worst data breaches of 2017. Usually, companies discover data breaches long after the actual breach happens. Therefore, I will organize these breaches in chronological order based on when the data breach was made known, not based on the date the breach actually happened. Just as I did when selecting the top IT outages of 2017, I chose the following five breaches based on their impact on the customers and on the affected businesses' reputations.

  1. Xbox 360 ISO and PSP ISO (Reported: Feb. 1, 2017 | Breach Happened: ~2015)

The forum websites Xbox 360 ISO and PSP ISO host illegal video game download files. They also house sensitive user information such as email IDs and hashed passwords. According to HaveIBeenPwned, a website that helps users check if their personal data has been compromised, Xbox 360 ISO and PSP ISO had a combined 2.5 million compromised user accounts. The attacks happened way back in September 2015 and no one discovered them until February 2017. The compromised information consisted of stolen email IDs, IP addresses of the users, and salted MD5 password hashes.

The biggest takeaway for consumers: avoid shady websites like Xbox 360 ISO. Trusting websites that host illegal material with your personal information is dangerous. On the plus side, at least both websites hashed their users' passwords. If your website is holding onto users' passwords, please implement the most basic of security measures by hashing those passwords.

  2. Deloitte (Reported: September 2017 | Breach Happened: March 2017)

In the same year that Gartner named Deloitte "number one in security consulting for the fifth consecutive year," Deloitte faced an embarrassing data breach. Deloitte, a multinational professional services firm, saw its clients' confidential emails exposed when a hacker gained unauthorized access to its company email system. The affected clients included blue-chip companies, well-known firms, and US government departments. What caused the hack? Apparently, the main administrator account required a single password and Deloitte had not instituted two-step verification for that account.

The Deloitte hack is a great example of how even the most security-conscious firms can make security missteps. We need to learn from Deloitte by identifying and eliminating all possible loopholes in our own companies' IT setups, the sooner the better.

  3. Equifax (Reported: September 2017 | Breach Happened: July 2017)

If you asked a regular Joe whether they could recall any major data breach from 2017, chances are they would cite this one. Equifax, one of the top three US credit agencies, suffered a breach affecting nearly 143 million consumers. Given the sensitivity of the stolen data and the number of people affected, this breach has been considered "one of the worst data breaches ever." The stolen data included Social Security numbers, driver's license numbers, full names, addresses, dates of birth, and credit card numbers.

In response to the breach, Equifax set up a website called equifaxsecurity2017.com to which it directed consumers who wanted to know if their data had been stolen. Some users reported that the website did not work. And many were angry to find out that in order to use the website, they would have to agree to an arbitration clause stating that they would waive their rights to a class-action lawsuit. Some users even tried entering fake names and fake Social Security numbers, and the website's response—informing them they "may be affected by the breach"—increased skepticism about the website's validity.

The bigger your organization, the more information you have. If the information you have is sensitive, you will become a bigger target for data breaches. Equifax holds a great volume of highly sensitive information. This should lead to a corresponding increase in security measures, but clearly there is a gap between what should be done and reality. Learn from Equifax and be honest when assessing your existing security measures. Is there a gap between where they are and where they ought to be?

  4. Yahoo! Update (Reported in 2016 | Updated on October 9, 2017 | Breach happened: ~2013)

Yahoo! had already reported this breach back in December 2016, revealing that "one billion user accounts were compromised" in a 2013 hack. It turns out they underestimated the impact of the breach in the original report. Former CEO Marissa Mayer subsequently told Congress that the 2013 data breach had affected all three billion of their user accounts. In other words, every single Yahoo! user account from popular services such as email, Tumblr, Fantasy, and Flickr suffered from the breach. After extensive investigations, the culprits' identities are still unknown.

Yahoo! is a classic case of a company with so many interdependent services that the complexity gives hackers opportunities to exploit. Notice how Yahoo! is still unable to identify the culprit? That speaks volumes to the challenges facing companies with a wide range of systems. You cannot plug the loopholes that you don't know exist. In other words, you need to start by knowing exactly how your internal IT systems work together.

  5. Uber (Self-Reported: November 21, 2017 | Breach happened: ~2016)

On November 21, Uber CEO Dara Khosrowshahi published a post revealing that he had become aware of a late-2016 incident in which "two individuals outside the company had inappropriately accessed user data stored on a third-party cloud-based service" that Uber uses. This breach affected 57 million users and further damaged Uber's already faltering brand. The stolen information included names, email addresses, and phone numbers.  Approximately 600,000 of the 57 million affected users were drivers who had their license numbers accessed by hackers.

What made things worse was that Uber paid the hackers 100,000 dollars to destroy the stolen data. As a result, Uber got even more bad press. Uber is an ambitious startup chasing high growth rates that emphasizes scale and agility above everything else. But in hindsight, they clearly needed to emphasize security as well. With consumers becoming more data-aware, security-conscious companies can gain an edge on their competitors.

Conclusion: Manage Data Security? Get Clarity on Your Company's Systems First

Given the complexity of enterprise IT systems, hackers can now find more loopholes to get past your company's security measures. Therefore, having a clear big picture on how your systems work together is a security priority. We've seen how even the biggest and most security-conscious firms (remember Deloitte?) can fall prey to data breaches, precisely because their complexity makes it much harder for them to identify and prevent hacks. With that in mind, consider an enterprise-level dashboard that can show you that big picture vision that will help both your security and your productivity.

As Peter Drucker once said about executive effectiveness, "you can't manage what you can't measure." Most organizations will have some way to measure security readiness. But do they have a way to measure and make visible how their systems work together? Knowing how your systems work together makes you better prepared to identify the root causes of potential hacks. If you can plug the gaps before they can be exploited, you can reach zero outages and zero breaches.

Author: TJ Simmons

TJ Simmons started his own developer firm five years ago, building solutions for professionals in telecoms and the finance industry who were overwhelmed by too many Excel spreadsheets. He’s now proficient with the automation of document generation and data extraction from varied sources.