Commit 7b59133

Convert to snippets
* As an Author
* I want to convert the document to snippets
* So that it is easier to maintain, iterate and add to in a structured manner.
1 parent 31876ea commit 7b59133

14 files changed: +959 / -264 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
.idea/
node_modules/
package-lock.json

README.md

Lines changed: 622 additions & 264 deletions
Large diffs are not rendered by default.

build.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
#!/bin/bash
set -euo pipefail
guide="README.md"

build_guide() {
  docker run --rm \
    --volume "$(pwd):/data" \
    --user "$(id -u):$(id -g)" \
    pandoc/core source/*.md -o "${guide}"
}

generate_toc() {
  docker run --rm \
    --volume "$(pwd)":/app \
    peterdavehello/npm-doctoc doctoc "/app/${guide}"
}

main() {
  build_guide
  generate_toc
}


main "$@"
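`build.sh` passes `source/*.md` to pandoc in the shell's sorted glob order, so the numeric prefixes on the snippet file names are what keep the sections in order. As a sketch, assuming the three-digit `NNN_` prefix seen in this commit is the intended convention, a hypothetical `check_snippet_order` helper (not part of the repository) could print that order and flag any snippet that lacks a prefix:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of build.sh: print the snippets in the order
# the `source/*.md` glob will hand them to pandoc, and flag any file without a
# three-digit ordering prefix (an assumed convention based on 000_, 010_, ...).
check_snippet_order() {
  local dir="${1:-source}" file name status=0
  for file in "$dir"/*.md; do
    [[ -e "$file" ]] || continue        # the glob matched nothing
    name="${file##*/}"                  # strip the directory part
    if [[ "$name" =~ ^[0-9]{3}_.+\.md$ ]]; then
      printf '%s\n' "$file"
    else
      printf 'missing NNN_ prefix: %s\n' "$file" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_snippet_order source && ./build.sh` would refuse to build if a new snippet was added without a prefix. The check only looks at naming; it does not validate the Markdown itself.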

source/000_introduction.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# So you want to Onboard a DevOps Practitioner

Author: Martin Jackson - [@actionjack](https://twitter.com/actionjack)

## Introduction

Currently everyone seems to be very interested in recruiting DevOps practitioners, but I feel the process of onboarding them and giving them a supportive environment in which to succeed and thrive is still a bit of a hit-and-miss affair, especially in busy organisations.

Nobody (at least nobody I know…) *wants* to work in a difficult environment:

* Bad environments (and [broken cultures](https://julesx.com/toxic-work-culture-forcing-best-employees-quit/)) neither attract nor retain top talent; they do the exact opposite.

> “Suffering increases in proportion to knowledge of a better way.”
>> Jim Hickstein

source/010_making_it_easy.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
## Making it easy to get work done from day one

<summary>Simplify, simplify and after that simplify some more</summary>

> “Everything should be made as simple as possible, but no simpler.”
>> Albert Einstein

Reduce the time spent learning environments by building them to be easy to understand, with a focus on making it possible for every developer (new or old) to become effective in the shortest possible time.

Here is some guidance on how to make your environment easier to onboard onto and keep the people working on it happy.

source/020_the_basics.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
## The Basics

<summary>The raw basics</summary>

> "Without a solid foundation of raw basics, any structure built upon it is liable to crumble and fall."
>> Unknown

* Have internet access sorted out for new starts, or let them know if there isn't any.
* Locker access (if you supply lockers for hot-desk environments).
* Let security know that they are coming.
* Let people know whether they are required to use their own equipment or will be supplied with specific equipment, and which operating system they will be using.
* If you haven't already done so, adopt some group chat software like [Slack](https://slack.com/), [Microsoft Teams](https://products.office.com/en-us/microsoft-teams/group-chat-software) or [Rocket Chat](https://rocket.chat/). This kind of software benefits everyone and reduces pressure on key individuals, because questions go out to a group of people rather than targeting specific individuals who may be busy and constantly interrupted.
* If you do the above, try to implement some [communications etiquette](https://hiverhq.com/blog/slack-etiquette/); for example, when you answer someone, reply in a thread so the question, context, conversation and (possibly) solution are kept in the same place rather than being strewn throughout the chat history.
* Provide a high-level environment overview so new starts know what they are working on and which technologies they need to get up to speed on.

source/030_culture.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
## Culture

<summary>Aim to create a culture of empathy and psychological safety</summary>

> “It's possible for good people, in perversely designed systems, to casually perpetrate acts of great harm on strangers, sometimes without ever realising it.”
>> [Ben Goldacre](http://www.badscience.net/), [Bad Pharma](https://www.amazon.co.uk/dp/0865478007?tag=contindelive-20), p. xi

* Embrace the standard of [The Humble Learner](https://www.linkedin.com/pulse/myth-sufficiently-smart-engineer-aaron-blohowiak/): the Humble Learner accepts the limits of human capacity while seeking to grow their technical and empathetic skills.
* Do not create or foster a [Blame, Shame and Train](https://www.ehstoday.com/safety/your-safety-strategy-blame-shame-and-train) culture, where mistakes are handled by openly blaming and shaming the employee (and sometimes terminating their employment) and then training other employees using the incident as an example.
* Instead, recognise each failure for what it is, a lesson: identify what went wrong and how we can ensure it does not go wrong again (and no, this is not an excuse to produce lots more documentation :stuck_out_tongue_winking_eye:).
* Try to foster a culture of improvement: benchmark your organisation against some form of [maturity model](https://devopsadoptmeth.wordpress.com/method-description/devops-maturity-model/) to identify the gaps, and attempt to close them.
* Introduce the new engineer(s) to the relevant people within the organisation.
* Remember, not everyone may be as smart as you are; they may be missing:
  * Context / situational awareness (how did we get from there to here?)
  * Tribal knowledge (this is where our ancestors' bodies are buried)
  * Cultural awareness (how we do things around here)
  * [Technical expertise in that specific problem domain](https://team-manual.cloud.service.gov.uk/team/orientation/#avoid-assuming-expertise)
  * The local taxonomy: concepts and language vary from workplace to workplace, e.g. "pre-approved changes" and "standard changes" may not mean the same thing from job to job.
  * The preferred practices or ["Design Principles"](https://www.gov.uk/design-principles)
* Listen to their point of view. Bringing in a new person is a prime opportunity to find out where the code or process needs improvement.
* Test your mentoring and onboarding process to flush out any shortfalls by getting the last person who joined to mentor the new joiner.
* Make your documentation inclusive, e.g. this document is checked with [alex](http://alexjs.com/) to catch insensitive and inconsiderate writing.
* Be careful not to overload new starts with too much information. There is often quite a lot to learn (often more than you think); instead, provide a set of useful links so people can research at their own pace.
* Write code that takes into account how future maintainers will feel reading it; let your code be [empathetic](https://www.benjaminjohnson.me/empathetic-code/).

source/040_documentation.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
## Documentation

<summary>Make it easy to understand and do the things</summary>

> “Stale documentation is not only misleading, it is positively harmful.”
>> Riona MacNamara (@rionam)

It's important to either have or do the following:

* Regularly tidy your documentation: remove old documents and update outdated ones; if you touch it, update it.
* Consolidate your documentation; nothing is as disheartening as searching your wiki for "Password Management Policy" and getting 40+ search results :-1:
* Have a high-level logical architecture, ideally written in a Git-friendly format, e.g.:
  * [SVG](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics) diagrams in [GitHub](https://github.com/blog/1902-svg-viewing-diffing) so you can see how the infrastructure changes over time
  * [Graphviz description language](http://www.graphviz.org/content/dot-language)
  * [Gravizo](http://gravizo.com/)
* An overview of the company’s infrastructure.
* Systems integration points and their third-party dependencies.
* An intranet/wiki or enterprise social network where people can learn about different teams and key members, with pictures. On day one it is easy to get overwhelmed by lots of new names and faces.
* Have documentation for your alerts. If something is important enough to disturb the on-call person about, it's important enough to have a runbook entry about it. If you alert because _foo queue is too long_, there should be a [runbook](https://web.archive.org/web/20191005043445/http://holyhandgrenade.org:80/blog/2011/08/runbooks-are-stupid-and-youre-doing-them-wrong/) entry describing how to fix it.
  * At one client I worked with, we configured the monitoring system so the alerts themselves linked to the relevant runbook entry :+1: :clap:
* Create a glossary of terms (e.g. a Minipedia) describing any organisation-specific acronyms or terms.
* [Create an onboarding wiki page (e.g. Confluence/Google Docs)](https://wiki.mozilla.org/Devops/onboarding)
* :+1: for open, online and easy-to-reach [checklists](https://github.com/annahsgraves/onboarding-documents-1/tree/master/Checklists/team-based-checklists)
* One cool thing I have seen recently is [acronym decoder chatbots](https://wonderus.app/) for Slack that watch for team acronyms and explain them in real time in the chat room.
* Write your documentation as if it's going to be [open](https://www.gov.uk/design-principles#tenth) to public scrutiny someday.
* Have an easy-to-use, easy-to-set-up collection of shared resources, e.g. a bookmark file of URLs or `.ssh/config` files.
* If possible, keep your documentation as close to the code as possible (e.g. as [Markdown](https://www.markdownguide.org/)) rather than in external resources like wikis, or use a [static site generator](https://www.markdownguide.org/getting-started#documentation). This way you are more likely to have up-to-date documentation, since you get immediate feedback when reviewing code changes rather than having to review a PR and a wiki page separately. Some options are:
  * [mkdocs](https://www.mkdocs.org/)
  * [hugo](https://gohugo.io/)
  * [sphinx](https://docs.readthedocs.io/en/stable/intro/getting-started-with-sphinx.html)
  * [Jekyll](https://jekyllrb.com/)
* If you have to work around problems in your code, then in the comments link to some sort of permanent record (e.g. the URL of a Jira story or an [ADR](https://github.com/joelparkerhenderson/architecture_decision_record)) explaining why. The following code comment caused me to do a lot of running around (`git blame` gave me a commit that led to a PR with zero details in it, authored by someone who could not remember why they had put it in the code):

```yaml
instance_type: m4.4xlarge # Larger than this currently causes issues on our AMIs…
```

A more helpful comment would have been:

```yaml
instance_type: m4.4xlarge # Larger than this type causes issues, see REF-2019
```
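The advice above about linking workaround comments to a permanent record can also be nudged along by a check in CI. Below is a rough heuristic sketch, not an existing tool: `check_workaround_refs` is a hypothetical name, the trigger words (`issue`, `workaround`, `hack`) are assumptions, and a reference is assumed to look like a Jira-style key such as `REF-2019`.

```bash
#!/usr/bin/env bash
# Hypothetical lint (not an existing tool): report comments that describe a
# workaround but carry no ticket/ADR key. Assumptions: a comment starts at the
# first '#' on a line, "workaround" wording is approximated by the words below,
# and references look like Jira-style keys such as REF-2019.
workaround_words='issue|workaround|hack'
ticket_key='[A-Z][A-Z0-9]*-[0-9]+'

check_workaround_refs() {
  local file line comment lineno status=0
  for file in "$@"; do
    lineno=0
    while IFS= read -r line || [[ -n "$line" ]]; do
      lineno=$((lineno + 1))
      [[ "$line" == *"#"* ]] || continue   # no comment on this line
      comment="${line#*#}"                 # text after the first '#'
      if grep -qiE "$workaround_words" <<<"$comment" &&
         ! grep -qE "$ticket_key" <<<"$comment"; then
        printf '%s:%d: workaround comment without a ticket reference\n' \
          "$file" "$lineno"
        status=1
      fi
    done <"$file"
  done
  return "$status"
}
```

Because it treats everything after the first `#` as a comment, it will produce false positives for `#` inside strings or URLs, so treat its output as a prompt for review rather than a hard gate.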

source/050_Operations.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
## Operations

<summary>Make it easy to get stuff done</summary>

> [“Complexity exacts a staggering tax on your humans. Good Ops engineers attempt to pay down that tax.”](https://twitter.com/bridgetkromhout/status/647333814411358208)
>> [Charity Majors](https://twitter.com/mipsytipsy)

* Have all relevant user accounts and access set up and ready.
* Create [Operations Checklists](http://atulgawande.com/book/the-checklist-manifesto/) for your key processes.
* Structure your work so people can see what needs to be done, e.g. a Kanban board, backlog or to-do lists.
* Provide information about the applications maintained by the team and how to operate them.
* Have sample dummy applications that can be deployed safely to your infrastructure, so new starts can learn how the deployment process works without fear of impacting key applications.
* Make it difficult to make mistakes, e.g.:
  * [Protected branches, e.g. to prevent force pushes to master](https://github.com/blog/2051-protected-branches-and-required-status-checks)
  * If you have code standards, don't __just document them__; back them up with [automated code standards](https://medium.com/@biratkirat/step-4-automate-your-coding-standard-filip-van-laenen-5b1c486e4883) triggered by [CI checks](https://en.wikipedia.org/wiki/Continuous_integration) or [pre-commit hooks](https://githooks.com/)
  * [Avoid committing secrets and credentials into Git repositories](https://github.com/awslabs/git-secrets)
* If you have policies on how to handle certain tasks, e.g. doing spikes, document them and link to them in your stories, e.g. "here's the link to how we handle spikes".
* Ensure your naming conventions are consistent and make sense:
  * If something is called build_X and it actually deploys_X, rename it to deploys_X if possible to reduce confusion and prevent [information hiding](https://en.wikipedia.org/wiki/Information_hiding).
  * If your environment structure is env-productgroup-application, make sure the naming is consistent across all environments, e.g.:
    * Development-Acme-Bomb
    * Test-Acme-Bomb
    * PreProduction-Acme-Bomb
    * Production-Acme-Bomb
* Nobody should be able to do something catastrophic to an environment unless they are determined to do so, i.e.:
  * Make doing the right thing easy by creating safety harnesses with build or scripting tools, like the ones listed below, that perform the most common tasks safely without the worry of screwing up:
    * [Bash Scripts](http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_02_01.html)
    * [Gradle](https://gradle.org/)
* If you use configuration management tools, run them repeatedly and/or test them. Try to avoid one-shot configuration management, i.e. running an operation only once to configure a resource, even one you do not expect to change, because it will change, it will break, and you will be rushing around trying to figure out what happened.
* Use the **Guard Rail Pattern**: put safe conditionals in your configuration management so you can do test runs without the worry of screwing up, e.g. Ansible tasks:

```yaml
- name: "Do something really Dangerous"
  command: /sbin/something --could --be --dangerous --if --run --it --in --prod
  when: testmode == "Off"
```
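The same Guard Rail Pattern carries over to plain shell scripts. A minimal sketch, assuming a hypothetical `TESTMODE` environment variable that mirrors `testmode` in the Ansible task above; `guarded_run` is likewise an illustrative name, not an existing tool:

```bash
#!/usr/bin/env bash
# Guard Rail Pattern for plain shell scripts (a sketch, not a standard tool).
# TESTMODE is a hypothetical variable mirroring `testmode` in the Ansible
# example. The default is the safe option: unless TESTMODE is exactly "Off",
# commands are only printed, never executed.
guarded_run() {
  if [[ "${TESTMODE:-On}" == "Off" ]]; then
    "$@"
  else
    printf 'TESTMODE is not Off, would run:'
    printf ' %q' "$@"   # shell-quote each argument so the output can be copy-pasted
    printf '\n'
  fi
}
```

`guarded_run /sbin/something --could --be --dangerous` only prints the command, while `TESTMODE=Off guarded_run /sbin/something --could --be --dangerous` runs it. The design choice is that the default is the safe path: forgetting to set the variable results in a dry run, not a production change.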

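Naming consistency is easier to keep when it is checked rather than remembered. A sketch assuming the env-productgroup-application scheme from the Acme-Bomb examples above, with the four environment names taken from that list and product group and application names assumed to be single alphanumeric words (`check_env_names` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Sketch: check names follow the Environment-ProductGroup-Application scheme.
# Assumptions: the four environment names come from the Acme-Bomb example list;
# product group and application are single alphanumeric words.
env_name_regex='^(Development|Test|PreProduction|Production)-[A-Za-z0-9]+-[A-Za-z0-9]+$'

check_env_names() {
  local name status=0
  for name in "$@"; do
    if [[ ! "$name" =~ $env_name_regex ]]; then
      printf 'non-conforming environment name: %s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_env_names Test-Acme-Bomb Prod-Acme-Bomb` reports `Prod-Acme-Bomb` and returns non-zero, which is enough to fail a CI step.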
source/060_Processes.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
## Processes

<summary>How should we be doing the stuff</summary>

> “If you can't describe what you are doing as a process, you don't know what you're doing.”
>> W. Edwards Deming

* Everyone seems to have their own particular spin on Agile, Scrum or Kanban, so explain up front what your process is and refine it when and if necessary.
* Have [shovel-ready](https://en.wikipedia.org/wiki/Shovel_ready) work for new starters, i.e. a backlog of work that can easily be done by a new starter:
  * Ideally [work that](https://www.visual-paradigm.com/scrum/write-user-story-smart-goals/):
    * is well defined,
    * is easily explained,
    * requires some research,
    * adds value, and
    * is __not__ grunt work, e.g. documentation.
* Assign your new starter an [onboarding buddy/mentor](https://hbr.org/2019/06/every-new-employee-needs-an-onboarding-buddy)
  * Ensure that this "buddy" has enough free cycles to be there for the new start if needed.
* [Pair](https://www.agilealliance.org/glossary/pairing/) with the new start as soon and as often as possible; depending on the complexity of the environment this could go on for weeks (if not months). Don't be afraid to pick up this pairing again at a later date if the engineer has never touched that code before.
* When (and if) you do a retro, base it against a known good baseline, e.g.:
  * If you are doing production deploys in the early hours of the morning and they go successfully, remember that this does not necessarily reflect a **good** deployment.
* Put as much detail into tasks/stories as possible, including:
  * Assumptions
  * Reference information and existing implementations
  * Narrowly defined acceptance criteria, to prevent [unnecessary research or rework](https://idioms.thefreedictionary.com/go+down+the+rabbit+hole)
  * Diagrams
* Ideally make your [tasks/stories as small and atomic as possible](https://www.leadingagile.com/2014/01/small-stories-reduce-variability-velocity-improve-predictability/), for a number of reasons, including:
  * They are easier to handle and to get your head around.
  * You are less likely to have to [context switch](https://simpleprogrammer.com/context-switching/) within a story if it has a narrow [problem domain](https://en.wikipedia.org/wiki/Problem_domain).
  * You are more likely to actually finish that story, rather than having to pick up a new one and come back to the original later, since the smaller it is, the less likely it is to run into some unforeseen blocker.
* Avoid (if possible) onboarding during crunch times (important or critical planned releases).
* Ideally have your accounts linked to a central or shared directory, e.g. GitHub/Google/LDAP, so your new starters don't have to create and remember 101 username/password combinations or request access to multiple applications separately.
* Use configuration management that has a [dry run feature](https://en.wikipedia.org/wiki/Dry_run_(testing)), e.g. a `--testing_mode on` style switch such as Ansible's `--check` or Puppet's `--noop`.
* Use blocking infrastructure tests or linters to catch mistakes early, e.g.:
  * [Yamllint](https://github.com/adrienverge/yamllint)
  * [Testinfra](https://github.com/philpep/testinfra)
  * [Inspec](http://inspec.io/)
  * [Serverspec](http://serverspec.org/)
  * [Ansible --syntax-check](https://raymii.org/s/tutorials/Ansible_-_Playbook_Testing.html)
  * [cfn_nag](https://github.com/stelligent/cfn_nag)
  * [terratest](https://github.com/gruntwork-io/terratest)
* Add or invite the individual to any relevant [Slack](https://slack.com/), [IRC](https://en.wikipedia.org/wiki/Internet_Relay_Chat) or [Microsoft Teams](https://products.office.com/en-us/microsoft-teams/group-chat-software) channels or mailing lists.
* Provide information on relevant processes, e.g.:
  * Incident, problem and change management
  * Deploying changes/releases to the different environments
  * Ordering infrastructure/tools
  * Authorization for tools and applications
  * Use of test environments, and creating and using test data
* Have [clean code](https://blog.goyello.com/2013/01/21/top-9-principles-clean-code/). It really helps if your code is good, sensibly organised and well structured. If the code base is large, it should be broken down into smaller, understandable segments.
* Create a [Papercuts.md](https://gist.github.com/actionjack/ee8408733b756fc101aa22488bb464a1) in your repos. This is a log of things that have hurt us in the current environment; they may not be actual [technical debt](https://en.wikipedia.org/wiki/Technical_debt), but they could be things for us to discuss and possibly fix in the future.
* If you have adopted a particular [coding style guideline](https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style) on your project, document or reference it so new joiners can easily find and adopt it.
* [Story kickoffs](https://elabor8.com.au/how-to-introduce-story-kickoffs-to-your-team/) can be extremely useful to new starters: they help them get into the mindset of the team, identify areas that aren't immediately visible in the code base and generally reduce rework due to poor or missing acceptance criteria.
* Embed your processes in your code. If your process requires you to hand off to another team to get something done, e.g. after raising a pull request you need to notify another team to run a Jenkins pipeline, then put the team and their contact information (e.g. a Slack channel) in the documentation.
* Use code formatters to standardise the structure of your code, e.g. `terraform fmt`. This can make reading diffs a lot easier, since you don't have to deal with things like differing indentation.
* Encourage [swarming](https://www.jrothman.com/mpd/project-management/2016/07/pairing-swarming-and-mobbing/) on difficult issues or development blockages (e.g. blocked pipelines), where the entire team works together on a single task.
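The blocking tests, linters and `terraform fmt` advice above can be combined into a single CI gate. A minimal sketch: `run_checks` is a hypothetical helper, and the commented-out example wiring assumes `yamllint`, `ansible-playbook` and `terraform` are installed on the runner and that `site.yml` is your playbook.

```bash
#!/usr/bin/env bash
# Hypothetical CI gate: run every check, report each failure, and return
# non-zero if any check failed. Each argument is one check command as a
# string, executed with `bash -c`.
run_checks() {
  local check status=0
  for check in "$@"; do
    printf '==> %s\n' "$check"
    if ! bash -c "$check"; then
      printf 'FAILED: %s\n' "$check"
      status=1
    fi
  done
  return "$status"
}

# Example wiring; uncomment and adjust to the tools and paths in your repo:
# run_checks \
#   'yamllint .' \
#   'ansible-playbook --syntax-check site.yml' \
#   'terraform fmt -check -recursive'
```

Running every check rather than stopping at the first failure means a new starter sees the full list of problems in one pass, instead of fixing them one CI run at a time.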
