Technology is evolving at a rapid pace. To keep up, we as professionals need to change how we interact with the new methodologies and technologies being released on such a frequent basis. Against that background, it seems crazy that a large chunk of work in most places is still done through manual processes. Automation, and the tools that assist with it, are well established and have begun to change how technology is run in more robust, scalable, reliable, reproducible, and verified environments, thanks in part to the principles of DevOps being implemented in more organizations. To get there, changes need to be made. This is my personal manifesto on how I will be approaching IT.
Technology enables companies to do amazing things, saving time and bringing efficiency to areas that used to be time sinks. With the advent of more open environments (public APIs, scripting modules and hooks, vendor documentation that is complete and thorough), automating processes is going from something of a pipe dream to a reality. It is now possible to automate infrastructure, allowing not only for the creation of new servers but also for ensuring that existing infrastructure complies with configured, peer-reviewed, documented standards. This relies on all the other items below already being in place and working; it is the end goal of everything else: a seamless, documented, reliable, and repeatable process that automates what used to be manual work. It also relies on those manual processes first being identified and documented.
The current practice of pushing code from the editor straight to live without testing must change before any other goal can be attained. This is the linchpin on which all other items rely. Without a commitment to test (and peer review) all code before any change touches a live production environment, the rest of these items may as well be ignored. To truly transform the way things are run, using code to document and define infrastructure, multiple layers of testing must be in place to ensure a consistent level of operation. This does not have to be strict at first; it may, to ease the transition, be phased in slowly. Nor does it call for a single testing method mandated organization-wide; to fit into a corporate culture, this requirement is by its very nature flexible. The call is for testing. Period. At first this may be something as small as testing against a test file server, but down the line it may evolve into unit testing, as well as shipping the code through a build server to make sure everything runs without issues. One key component: this cannot be done by one or two people on the team. EVERYONE involved in modifying and maintaining the infrastructure must be engaged in testing their own and others' code.
Ultimately the gold standard is unit testing, and everyone involved should be proficient in writing and running tests against code. This will take time, training, and buy-in to make sure everyone is on the same page and has roughly the same capabilities.
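In the PowerShell world, the unit testing called for here is commonly done with Pester, PowerShell's test framework. A minimal sketch of what such a test file might look like (the function name and file layout are illustrative assumptions, not taken from any real environment):

```powershell
# Get-DiskReport.Tests.ps1 -- a minimal Pester test sketch. Assumes the
# function under test lives in Get-DiskReport.ps1 alongside this file.
. "$PSScriptRoot\Get-DiskReport.ps1"

Describe 'Get-DiskReport' {
    It 'returns one object per drive letter passed in' {
        $result = Get-DiskReport -DriveLetter 'C', 'D'
        $result.Count | Should -Be 2
    }

    It 'throws on an invalid drive letter' {
        { Get-DiskReport -DriveLetter '9' } | Should -Throw
    }
}
```

Even two small tests like these, run with `Invoke-Pester` before every check-in, are enough to start building the testing habit this section calls for.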
Testing is the linchpin that makes this work, but learning is the delivery mechanism by which all members are brought up to speed, empowered to be effective, and enabled to perform the tasks being called for. We need to remove the silos of information that have traditionally built up between departments and allow information to be shared more freely. While some information must remain fully segregated (we would not want the server people to see HR data about who is being laid off soon), information on policies and procedures, especially around technology and how to accomplish these tasks, should be as open as possible. This includes making all documentation on wikis searchable and readable (excluding privileged information, which should be kept to a minimum), regular learning events offered during lunch or before/after work, collaboration and knowledge sharing between departments, and acknowledgment and support of continuing-education opportunities taken by team members. This education may start as simply as PowerShell 101 to bring everyone up to speed, and end up at advanced PowerShell 5 debugging techniques when deemed necessary. The goal should not be one or two people who know how to work with, debug, and correct PowerShell code, but for ALL people who support these systems to know enough to be proficient: able to work with the code when a colleague is out, and able to review others' code and point out issues and errors.
Code review is important. In order to define infrastructure through code, and to maintain and update it the same way, peer reviews will need to be performed to ensure quality, reliability, and best practices, and to prevent collisions between multiple teams implementing code at once. Peer review is not testing; they are two independent activities. Peer review makes sure the code looks sane, does nothing malicious, and includes all the appropriate tests with the checked-in code. While it is important to test code before it goes into production, before it is tested by someone other than the author, other team members should look over the code to make sure that:
- There are no inefficiencies in the code that may cause performance issues if run against a production environment
- Appropriate error handling has been done
- Appropriate tests (unit and functional) have been written for every function in the code and are included in the repository
- No code is included that works on anything outside the intended purpose of the change

This will increase the time between code being written and code going into production at first, but ultimately it will lead to cleaner code, better documentation, greater understanding all around of what is changing, and a more engaged and informed team. Over time, as "oops" mistakes are reduced, we will end up saving time.
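Some of the mechanical checks above can be automated before a human reviewer ever looks at the code. In the PowerShell world, PSScriptAnalyzer is the standard linter for this; a sketch of using it as a review gate (the folder path and severity threshold are illustrative assumptions):

```powershell
# Fail the check-in if any script violates analyzer rules, so human
# reviewers can focus on logic rather than mechanical issues.
$findings = Invoke-ScriptAnalyzer -Path .\scripts -Recurse -Severity Warning, Error

if ($findings) {
    $findings | Format-Table RuleName, Severity, ScriptName, Line
    throw "Review gate failed: $($findings.Count) analyzer finding(s)."
}
```

Automating the mechanical pass does not replace peer review; it frees reviewers to concentrate on the items in the checklist above that a tool cannot judge.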
Manually updating documents is a process that is rarely followed. Typically a build document is produced when the environment is first spun up, but it is rare to find documents or knowledge bases that are updated every time the infrastructure changes. True, each change should be documented through a change request, and that data is (usually) searchable through whatever application assists with the change process, but this does not let one quickly see what the current configuration of a piece of infrastructure SHOULD be. To determine that, you would need to take the base configuration document from when it was last updated and apply all the deltas from the change requests since, cobbling together what the current configuration should be.
Using code as documentation, however, resolves this issue. This is done in conjunction with DSC (Desired State Configuration). DSC does not run just part of a script; it applies an entire configuration that is sent to it. This means that every time a change to a piece of infrastructure is needed where DSC is deployed, the entire file is updated and re-deployed to the server. As part of this process, the configuration script goes through the testing, peer review, and version-control processes set up to ensure reliable and consistent results. Now, when a server's configuration must be produced for an audit, the most recently checked-in and approved configuration script can be pulled from source control and used to prove that the server is configured exactly as intended. There is an up-front investment in time to get everyone on the same page and to define all these processes. The end result, however, is a document that not only keeps your infrastructure within a known configuration but is also an always-up-to-date build doc for the server.
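As a concrete illustration of this whole-file model, a DSC configuration declares the complete desired state of a node, and re-deploying it reasserts all of that state rather than a delta. A minimal sketch (the node name, feature, and output path are illustrative assumptions):

```powershell
# WebServer.ps1 -- a minimal DSC configuration sketch. Because the whole
# file is compiled and applied each time, it doubles as the build document.
Configuration WebServer {
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    Node 'WEB01' {
        WindowsFeature IIS {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }

        Service W3SVC {
            Name        = 'W3SVC'
            State       = 'Running'
            StartupType = 'Automatic'
            DependsOn   = '[WindowsFeature]IIS'
        }
    }
}

# Compile the configuration to a .mof document, then push it to the node.
WebServer -OutputPath .\WebServer
Start-DscConfiguration -Path .\WebServer -Wait -Verbose
```

Pulling this file from source control answers the audit question directly: the checked-in configuration *is* the statement of how the server should be configured.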
The culture within IT does not allow people to fail on a regular basis. We expect people to fail once in a great while, or only on large projects; regular failure we view in a negative light. We should be looking at it in exactly the opposite way, provided people are trying to do something productive. If we are willing to fail, and to fail quickly, we are able to do a few things:
- We fail before we impact any systems
- We learn something new
- We can take solace in the fact that we were trying to improve our environment or resolve an issue

Failure is not something to be ashamed of, or something to be afraid of. Instead it should be embraced. Failure and persistence are the vehicles through which we learn the critical skills that lie outside of book knowledge.
Stagnation is the enemy of the IT world. Being confined to the same tasks, performed over and over again, leads not only to boredom but to missed opportunities for growth. When presented with the chance to try something in a new way, whether to reduce complexity, save time, or automate, we often shy away and rely on old trusted methods. This can mean we keep performing a task inefficiently: because we are afraid of new things, we continue to function in a reduced-functionality mode. Instead, we need to be willing to try new approaches to the problems we encounter on a regular basis. To do this, we have to be willing not only to fail, but to interact with the community and build upon the ideas others are having. Rather than being people siloed in their jobs, we need a more collaborative approach between coworkers, departments, and companies (yes, even companies) to share ideas on HOW to tackle problems. Armed with this, we can experiment in a test environment. Fail. Learn. Repeat. And eventually succeed.
If employees are not encouraged to seek out new ways of resolving recurring issues, those issues will continue to pop up on a regular basis without ever being fully resolved. Tackling an issue from a new angle may not surface a solution immediately, but it could inspire others to think of a new way of approaching the problem.
The days of the lone IT technician are over. Surviving and thriving in the current IT environment means not only being willing to work and cooperate with coworkers, but also actively engaging with others in the communities dedicated to topics of interest. This allows for large-scale collaboration, the bouncing around of ideas, and assistance when needed. Having a community of professionals who can assist each other, regardless of company boundaries, when issues come up is vital for survival in this fast-paced IT environment. It means current IT personnel must be not only willing and wanting to learn, but willing to give back, communicate, and share. It is from this network of professionals that new ideas can be born, whether through a discussion on a web forum or over a drink at lunch.
This seems counterintuitive at first. Automating oneself out of a job does not aim to take away anyone's job. Instead, it aims to automate tasks that are consuming time and resources, so those resources can be redirected at new products, services, and non-automatable work. The goal is to always be thinking of the next steps, ensuring that what is being worked on now and what is scheduled next makes sense. The question to ask of any automation task is: "Does this free up a resource and/or time, and what will the next steps be after this one is done?" With the time freed up from automating manual tasks, more things can be tackled at once: the automated items run in the background when they need to, while staff focus on the non-automated tasks and projects. And anything that is automated is done in a concise, reproducible way, with code that has been properly source controlled, tested, and peer reviewed.
If the item being automated does not contribute to freeing up resources and/or time, then the reason for automating it needs to be looked at closely, as it is not fulfilling the call to automate oneself out of a job.
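A small example of the kind of recurring manual task worth automating, sketched in PowerShell. The path, retention window, and task itself are illustrative assumptions; the point is the pattern: a tested, reviewed function that can then run unattended.

```powershell
# A sketch of automating one recurring manual chore: clearing old log files.
# SupportsShouldProcess gives a free -WhatIf switch for safe testing.
function Remove-StaleLog {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $DaysToKeep = 30   # retention window; an illustrative default
    )

    $cutoff = (Get-Date).AddDays(-$DaysToKeep)
    Get-ChildItem -Path $Path -Filter *.log -File |
        Where-Object LastWriteTime -lt $cutoff |
        ForEach-Object {
            if ($PSCmdlet.ShouldProcess($_.FullName, 'Remove')) {
                Remove-Item -Path $_.FullName
            }
        }
}

# Dry-run first while testing, e.g.:
#   Remove-StaleLog -Path 'D:\Logs' -WhatIf
# Once tested and peer reviewed, schedule it to run in the background.
```

The payoff described above comes from the last step: once the chore runs on its own, the time it used to consume is redirected to the next task.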
Without buy-in, all the training and culture of learning in the world is useless. Buy-in ensures that all parties involved, both vertically and horizontally in the organization, are backing the way forward; only if all parties are in agreement can any of this be effective. This will not happen immediately, as buy-in takes time. Principal players need to be identified and brought on board as quickly as possible. Like most things in IT today, this is a team effort requiring everyone to be on the same page in order to move the company and its processes forward. It is essential that management be on board with these changes, especially if at first they cause staffing strain because more time is being spent getting the environment ready to handle a DevOps workflow. None of these changes will occur overnight, and all appropriate parties need proper expectations set: everyone involved must realize that implementing DevOps in an existing infrastructure will take not only time but resources as well, along with the training required to get everyone up to speed on the new methodologies.