Rawkode Academy | Complete Guide to Komodor

David Flanagan
Founder at Rawkode

hello and welcome back to the Rawkode

Academy

I am your host David Flanagan although

you may know me from across the internet

at Rawkode today I’m going to guide

you through Komodor

if you’re not familiar with Komodor

it is a SaaS-based product to help you

troubleshoot and debug your kubernetes

clusters

something

I have a few opinions about

and I don’t often cover SaaS-based

products however Komodor just this

week

announced their new free tier meaning

you don’t need to pay to get started

with Komodor

not only that Komodor sponsored my

time to produce the advanced scheduling

demo and lastly

they’ve also committed to being on an

episode of Klustered where I’m going to

put them to the test I’m going to

give them a broken cluster and they’re

convinced they can use Komodor to fix

it

so thank you Komodor for sponsoring

the advanced scheduling video

and thank you for joining me

on Klustered very very soon

but today let’s focus on the tutorial

I’ll show you how you can get started

with Komodor

we’re going to start from the beginning

but also showcase some advanced use

cases for Komodor

but the first thing we need to do is go

to komodor.com

from here you can feel free to read the

marketing material go to resources

pricing documentation whatever you want

I’m going to start by logging in

for which I use my Google account

so immediately we’re presented with the

service list this is a list of all of

the microservices or maybe huge Services

who knows deployed to your kubernetes

cluster

now you won’t see this list right away

this is because I’ve already added my

first kubernetes cluster but let me walk

you through the process for doing that

yourself

down at the bottom left you will see the

Integrations button

when you select this you can click on

add a cluster

you can give it whatever name you want

and hit next

this will give you the command that you

copy and run in your terminal it will

add a Helm Repository

deploy the Helm chart with the Komodor

agent

and then you click next where it will

wait for the connection and confirm it

from there go back to the home page and

you’ll see your kubernetes services from

your cluster

now there’s a couple of nice things on

this page right off the bat first

all my services are healthy yay

but secondly

we get a good overview of the workloads

running in this cluster

you can see I have a bunch of Prometheus

stuff

I’ve got 1Password Connect with

githubs cert-manager shop I

lots of cool stuff

now if things weren’t all healthy we

could either exclude the healthy ones

or we could filter on the unhealthy ones

if you have more than one cluster

you can filter by that too

and if you only want to take a look at

a particular namespace

in my case let’s just take a look at my

community namespace

you’ll see that I’m only running a

single service

if I want to view the platform and the

community namespace I can do so as well

if you want to filter by workload type

we can click on DaemonSet and see

DaemonSets just the basic things that you

would expect from a service overview of

your cluster

the last thing I’ll point out on this

page is at the top right

here we can sort by a few options by

default it’s on health which makes sense

if there’s something that is unhealthy

in your cluster you want to see that

first the other view that I’ve been

enjoying over the last few days is

namespace

it’s a good way to break it down by

namespace without specifically filtering

on a namespace itself

and if you’re only worried about things

that have changed recently go to last

modified and you’ll see the most recent

resources that have been modified within

your cluster

I deployed Prometheus today so we can see

Prometheus front and center

and that’s the service overview it’s not

life-changing

but it’s very valuable

with just enough functionality

to maybe pry kubectl out of your

hands when things go wrong

so let’s see what else we can do with

Komodor

so we also have the jobs option on the

left although I have no jobs in my

cluster however this is just the same as

services if you are using the Job object

or the CronJob object you will see

them listed here

next we have the events this will show

you all the events from your kubernetes

cluster now this is something that can

be typically quite overwhelming to do

from the kubectl command line

because events come fast and furious in

a kubernetes cluster

and when we have an abundance of

information

bringing a visual layer

to that information is how we develop

understanding

so let’s see how we can understand the

events within a kubernetes cluster with

Komodor

much like the service page we have the

ability to filter these events on

cluster and namespace

however now we can filter by individual

service

we can filter by the event type

we have the ability to filter on the

status of the event as well as deploy

details and availability reasons

and we’ll get into more of these in just

a moment

but first let’s take a look at my

platform namespace

now here we can see all the events as

Komodor was deployed to my cluster and

went through the discovery phase that

is discovering all the workloads and

resources within my cluster from here we

can click on the service name

so that slides out as a nice kind of

popover modal dialog meaning we don’t

really lose our original context when

we’re debugging which I think is really

important for a debugging tool so very

nice addition

we have the service name the health

status you can see all the events for

the service as well as the pods

the nodes they’re scheduled on and some

additional information which gives us

access to the labels and annotations on

the service

okay let’s pop into the monitoring

namespace

and we’ll select our grafana service

currently we only see information about

grafana which we’d expect

we can see the events again the nodes

pods

and our labels and annotations

now before we take a look at the best

practice recommendations let’s pop back

over to events and see here we have the

related resources button

now this is quite nice because it allows

us to select other resources within the

same namespace

if we want to be able to collectively

group them and view their events together

so I’ll pick on kubernetes services and

I’ll mark this as related

I’ll pop over to ConfigMaps where I’ll

select the API server one and I’ll pick

one more which is to pop over to secrets

and secret finder config

we apply the selection and there you’ll

see that the events listed for this

resource include the related resources

now I think this feature could be

improved I’d love to see Komodor scan

the YAML for referenced ConfigMaps

secrets and services with matching

selectors

and hook this up for me however doing it

manually if there’s a few resources that

I do want to group collectively isn’t

exactly the end of the world so it’s

a cool feature and one that could have

some really interesting improvements

over time

so let’s go back to the information

screen and we’ll see these best practice

warnings

so when we click this we have a bunch of

checks and here we can see that our

deployment has one replica now this is a

warning just because if we lose that

replica we’ve lost our service so maybe

you want to run two or three however you

know your services better than any tool

can so feel free to use the ignore

button

for grafana maybe we determined that we

do only ever want one and we’re happy

for that to be offline if something goes

wrong

we can just say ignore for 14 days 30

days 90 days or forever

so perhaps I’m not ready to make a

decision on whether this is good or bad

yet and I’ll ignore it for a couple of

weeks

next we have a critical warning telling

us that this workload has no liveness

probe

if we expand it it tells us

that liveness probes are designed to

ensure that an application stays in a

healthy State when a liveness probe

fails the Pod will be restarted

this is a pretty neat Behavior the

kubelet monitors our workloads and if it

needs to kick them it kicks them

so you should always try and have some

of these best practices whenever

possible and Komodor brings that front

and center

so I’m not going to ignore that one

because you know what I should have a

liveness probe on that workload

now we’ve got some passed ones here where

we have a Readiness probe we have CPU

and memory constraints

and the last one is just a pull policy

it’s not good practice to have an

image pull policy of Always

it’s usually preferred to set it to

IfNotPresent it just means when the workload

restarts you don’t need to go to an

image registry and see if it can be

pulled down and it usually means you’re

using some sort of Alias tag system

again we want to kind of get away from

that as much as possible

so it’s not critical but it is a

warning that maybe you need to

update this

and I think this is a nice way to gain

more insights and understanding of the

services within our cluster
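
To make those checks a bit more concrete, here is a minimal Python sketch of the same idea. It is not how Komodor implements its best practice checks; the function name, the severities and the sample Grafana manifest are all assumptions made up for illustration.

```python
# Illustrative only: a toy version of the checks described above.
# NOT Komodor's implementation; names and severities are assumptions.

def best_practice_findings(deployment: dict) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for a Deployment manifest given as a dict."""
    findings = []
    spec = deployment.get("spec", {})

    # Kubernetes defaults to a single replica when none is specified.
    if spec.get("replicas", 1) < 2:
        findings.append(("warning", "deployment has fewer than 2 replicas"))

    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        if "livenessProbe" not in container:
            findings.append(("critical", f"{name}: no liveness probe"))
        if "readinessProbe" not in container:
            # Severity here is a guess; the video only shows this check passing.
            findings.append(("critical", f"{name}: no readiness probe"))
        if container.get("imagePullPolicy") == "Always":
            findings.append(("warning", f"{name}: imagePullPolicy is Always"))
    return findings


# Hypothetical manifest loosely mirroring the Grafana example in the video.
grafana = {
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{
            "name": "grafana",
            "image": "grafana/grafana:latest",
            "imagePullPolicy": "Always",
            "readinessProbe": {"httpGet": {"path": "/api/health", "port": 3000}},
        }]}},
    }
}

for severity, message in best_practice_findings(grafana):
    print(severity, message)
```

Keeping each finding as a plain (severity, message) pair makes it easy to filter or ignore individual checks, much like the ignore button does in the UI.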

so let’s see what else we can do from

the events page

so I’m not going to filter on an

individual service I don’t think that

shows us anything new but if we scroll down

we can filter on the event type

so let’s filter by one of these event

types and see what information we get

back

let’s start with one of the most common

ones which is config change

this is going to tell you when a

ConfigMap is created modified or

deleted within your cluster

so

let’s create one

here I have cm.yaml which is a config

map called rawkode

if we go to the terminal

we can apply this to our monitoring

namespace

let’s make a quick change to that

ConfigMap and say that we no longer

want key value instead we want

name

David

go to our terminal

apply this one more time

and let’s go visualize this with

Komodor

so right away we can see

that a config map was created in the

monitoring namespace called rawkode

we click on this we have all green

because it was the first time this

ConfigMap was created

we then have our change and this time we

can see that the key value was removed

and name David was added

and if you want to view this in more

detail you can expand the diff

we can see the data changed

along with some metadata about the

resource as well
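
If you want to picture what that diff is doing, here is a small Python sketch that compares the data of a ConfigMap before and after the change. It only illustrates the idea; it says nothing about how Komodor computes its diff, and the function name is made up.

```python
# Illustrative sketch: compare the data section of a ConfigMap before and after.
# The function name is hypothetical and unrelated to Komodor's internals.

def diff_data(old: dict, new: dict) -> dict:
    """Split the keys into removed, added and changed groups."""
    removed = {k: old[k] for k in old.keys() - new.keys()}
    added = {k: new[k] for k in new.keys() - old.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"removed": removed, "added": added, "changed": changed}


# The change made in the video: key=value is replaced by name=David.
before = {"key": "value"}
after = {"name": "David"}

print(diff_data(before, after))
```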

and this is one of these really simple

but very valuable features when things

go wrong on a kubernetes cluster it’s not

because the resources haven’t changed

it’s because of our changes that things

sometimes go wrong human error is

probably still the biggest cause of

problems in a kubernetes cluster

so it’s crucial

that you understand when config

changes in your cluster and how that can

have a cascading effect on the workloads

within your cluster

and your ability to see those changes as

they happen

will substantially lower your mean time

to recovery

so beyond config change

let’s filter on availability issues

now availability issues give us an

understanding of when a workload was

unavailable

perhaps because the pod was being

restarted or the probes were failing

if we take a look at the grafana one

here

you can see that this pod was unhealthy

and why it was unhealthy well because it

was container creating of course it’s

not healthy if it’s still creating

also what’s nice here is it shows you

each of the containers and the status

for them too

if you want you can click the live pods

and logs button

this will show us the current pod in

our cluster for that selector

where we could pop it open and go to

logs so it’s nice having the logs right

front and center when required

if we pop back to details we can see the

conditions

that tell us if our pod is healthy

we have the containers the images they’re

running the pull policy the ports the

mounts and the arguments all the useful

information that you need

we have the ability to see the

tolerations

the volumes

and of course the events associated with

this workload

if you’re a fan of the kubectl

describe command you can click

describe and get the exact output on the

screen like so

so to reiterate from the events page

we’ve seen a pod with availability issues

we went to the current instance of this

pod we’ve seen there is no problem and

we had all the information we need to

debug a problem

to debug if there was something wrong

now the rest of the Komodor UI is

pretty much more of the same

we can break down all of the resources

we can see nodes we can click on a node

we can see all of its conditions the

capacity

and allocatable resources across CPU

memory storage Etc

we have the ability to cordon and drain

a node if we wish

for workloads it’s the same we have

deployments we can click we can edit

we can scale we can restart

you can do this for most of the

resources within your cluster

for storage we can see storage classes

or config we go to config Maps

and we can see them

pretty much you’re getting a visual

representation of everything you can do

with the kubectl command

we can even list the custom resource

definitions within our cluster

and search

so I’m not going to spend any more time

going through this

because these web pages are dashboards

and as we all know dashboards are not to

be looked at until something goes wrong

so how do we get information from

Komodor to give us alerts when our

attention is needed

and for that we have monitors

we can expand our cluster

and we can see that we have some rules

already in place these are shipped by

default by Komodor we have an

availability monitor this will let us

know if any of our workloads are less

than 80 percent available for more than 10 seconds if we

need 10 pods in our deployment

and for more than 10 seconds we have 7

or less we’ll get an alert
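
Here is a rough Python sketch of that rule, just to make the arithmetic concrete. The 80 percent threshold and the 10 second window come from the default monitor described above; everything else (the function names, the sample format) is an assumption and not Komodor's actual logic.

```python
# Illustrative sketch of an availability rule: alert when fewer than 80% of the
# desired replicas are available for more than 10 seconds. Names are hypothetical.

def is_below_threshold(desired: int, available: int, threshold_percent: int = 80) -> bool:
    """True when available replicas are under threshold_percent of desired."""
    if desired <= 0:
        return False  # nothing requested, nothing to alert on
    # Integer arithmetic avoids floating point surprises at the boundary.
    return available * 100 < desired * threshold_percent


def should_alert(samples, desired, threshold_percent=80, duration_seconds=10):
    """samples: (timestamp_seconds, available_replicas) pairs in time order."""
    breach_started = None
    for timestamp, available in samples:
        if is_below_threshold(desired, available, threshold_percent):
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started > duration_seconds:
                return True
        else:
            breach_started = None  # recovered, so the clock resets
    return False


# 10 desired pods: 8 available is exactly 80%, so only 7 or fewer is a breach.
print(is_below_threshold(10, 8))
print(is_below_threshold(10, 7))
print(should_alert([(0, 7), (5, 7), (11, 7)], desired=10))
```

Resetting the clock on recovery means a brief dip during a rollout doesn't page anyone, which matches the "for more than 10 seconds" part of the rule.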

if our CronJobs are failing we get

another

we can get alerts for when deployments

are updated

and we can also get alerts for when our

nodes are not healthy

so let’s take a look at one of these

alerts and then configure our own

here is the deployment alert this is

going to let us know whenever a workload

is modified within our cluster

using the Integrations that you have

configured with Komodor you can use

these as destinations

we can use a standard webhook or publish

a message to Slack

I have a channel called SRE

and I’m going to click save

so now if we modify a deployment we

should get a notification to my Slack

Channel

so let’s test it

I have my slack here but first we need a

deployment change

so I’m just going to do this through the

Komodor UI

I’m going to go to deployments

and we’ll modify the cert-manager

deployment

we can click on edit YAML

where I’m just going to add a new label

we hit apply

we can see that we have new events

and we can see our manual action to

edit a deployment

we click on it we see the change so even

though we can see the deployment changed

here this will not trigger a Slack

notification

because it uses the resource’s generation

rather than its resource version

which is good because

do we really need a notification that

a label changed on a workload when the

workload itself was not restarted

rescheduled or modified

now let’s make one more change to a

resource

this time we’re going to add an

environment variable

which is going to have a value

of hi

now this will trigger

a new generation

and if we pop over to Slack

we’ll see the notification in the SRE

channel

and if we click on this it takes us

directly to the event with the change

we can see that the revision and

generation of this resource went from

one to two because of an environment

variable addition
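
The difference between those two fields is worth spelling out. Kubernetes bumps metadata.generation only when the desired state (the spec) changes, while metadata.resourceVersion changes on every write, including a label edit. Here is a minimal Python sketch of a notifier keyed on generation; the function name and the sample objects are made up, the resourceVersion values are arbitrary, and how Komodor's monitor decides internally is not shown in the video.

```python
# Illustrative sketch: decide whether a Deployment update is worth a notification
# by comparing metadata.generation. All objects below are hypothetical examples.

def is_workload_change(old: dict, new: dict) -> bool:
    """True only when the spec changed, which is when generation is bumped."""
    return old["metadata"]["generation"] != new["metadata"]["generation"]


original = {"metadata": {"generation": 1, "resourceVersion": "100", "labels": {}}}

# Adding a label is a metadata-only write: resourceVersion moves, generation does not.
label_edit = {"metadata": {"generation": 1, "resourceVersion": "101",
                           "labels": {"team": "sre"}}}

# Adding an environment variable changes the pod template, so generation is bumped.
env_edit = {"metadata": {"generation": 2, "resourceVersion": "102",
                         "labels": {"team": "sre"}}}

print("label edit notifies:", is_workload_change(original, label_edit))
print("env var edit notifies:", is_workload_change(label_edit, env_edit))
```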

this workload

change is also denoted here by the new

deploy event which tells us that the

image didn’t change but other aspects

did so given that we have a pretty

sophisticated troubleshooting and

debugging tool here

it’s also worth noting that Komodor

has

pretty elevated privileged access to

your cluster

and as such we need to be able to trust

it

luckily Komodor provides the ability to

protect some of our more sensitive

information from being leaked through

the Komodor UI

let’s go to resources workloads and pods

if I select the default namespace I have

a new super secret workload

if we click on this there’s not a lot to

see here but if we go to the logs we

have

a password

how do we prevent Komodor from leaking

such information now it doesn’t have to

be just in our standard out logging

although when applications crash

sometimes they do dump the environment

revealing some very sensitive

information

but also how do we redact this from

ConfigMaps and other sources of

sensitive information fortunately by

default Komodor hashes all the

information that it pulls from Secret

resources

but we do need to put in a couple of

extra steps to protect standard out and

or config Maps although I hope you’re

not storing too much sensitive

information in the config map

and I’m going to configure this through

the Komodor UI

so the first thing we want to do is to

go to configuration and config Maps

where I can select the Komodor

namespace here we have the kubernetes

Watcher config

and I’m going to go straight to the edit

page

you’ll see we have two settings here on

lines 20 and 21 called redact and redact

logs

these take a list of Expressions to

redact from kubernetes resources and

from log data

now this accepts regex pattern matching

like you can do with any sophisticated

logging library

but I’m going to keep it very simple for

this demo

I’m going to explicitly say that I want

my password one two three omitted

and we’ll add one more

this time we’ll do a regex match for

anything like password equals

let me open the matcher to dot star

and we’ll stick a space on the end

so now we will have to kick the

Komodor agent

so that it reloads its configuration

we can just delete

and already we can see we have a new one

running

so let’s go back to our pod and the

default namespace

where we have our super secret secret

workload

and if we view the logs our password one

two three has been redacted

so let’s modify this

at the deployment level

or we can say edit yaml

and we have password one two three but

let’s also add password equals

hello

not secret like so

which causes this pod which we can see

here to be terminated with a new one

running

and if we pop open the logs for here

you can see that both values have been

properly redacted
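
To get a feel for what those two redaction rules do to a log line, here is a minimal Python sketch. It is only an approximation: the placeholder token, the exact literal, and the simplified password=\S+ pattern are my assumptions, and Python's re module may not behave identically to whatever regex engine the Komodor agent uses.

```python
import re

# Illustrative sketch of pattern-based log redaction.
# The token and patterns below are assumptions, not Komodor's actual config.
REDACTED = "<REDACTED>"


def redact(line: str, patterns: list[str]) -> str:
    """Replace every match of every pattern with a placeholder token."""
    for pattern in patterns:
        line = re.sub(pattern, REDACTED, line)
    return line


patterns = [
    r"mypassword123",   # an explicit literal, as in the first rule
    r"password=\S+",    # anything after password= up to the next whitespace
]

print(redact("login ok mypassword123 done", patterns))
print(redact("starting with password=hello-not-secret mode=dev", patterns))
```

A literal rule only protects the one value you already know about; the password= rule is the more useful shape because it keys on a marker rather than on the secret itself.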

so this is a very important feature

but also a very cumbersome feature

because security is hard it’s never easy

it would probably be worthwhile for your

team or organization to have a convention

to well not log sensitive values but if

you do always make sure there’s some

marker in place so you can configure

tools like Komodor and other logging

systems to redact that information as

fast as possible

and you can even run the container

locally to test your redactions before

pushing them to your kubernetes cluster

if I go to my terminal I have a

justfile

it’s like a Makefile however it allows

positional arguments on targets and it’s

generally just a little bit nicer to

work with

from here we can say redact and you can

already see the autocomplete here and

the documentation from the justfile but

we provide a redaction phrase and a log

line

so I’m going to say let’s redact

rawkode

and then the log line that I want to

test is to say I am rawkode hear me

type

just pulls down a container image does a

little bit of plumbing and then shows

you the input log and the output and you

can see what it was before and after the

redaction

this means that you can test your regex

patterns all you want

say you want to do password equals

dot star

question mark

rawkode because maybe we got it wrong

and then in our test

with the password

equals

rawkode without an e

so this won’t redact but we have a

problem we can fix it

so let’s run that again with the E on

the end

and oh it’s still broken

well clearly I don’t know how to spell

rawkode

there we go when you do it right it

works that string was redacted because

we were able to test the regex

now this is really cool you can actually

plumb into a shell script a whole bunch of

example log lines that you have from

your application with redactions that you

know always have to be satisfied and

hook this into your CI system and that

way you know right away whenever you’ve

got secrets leaking that should be

redacted
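
As a sketch of what that CI check could look like, here is a small Python harness. It does not use the Komodor container from the video; it just applies each regex with Python's re module, so treat it as an approximation of the idea. The cases, the placeholder token and the function name are all made up for illustration.

```python
import re

# Illustrative CI-style check: every sample log line is paired with the pattern
# meant to protect it and the secret that must not survive redaction.
# All cases are hypothetical.
CASES = [
    # (pattern, sample log line, secret that must not survive)
    (r"password=\S+", "db connect password=hunter2 retry=3", "hunter2"),
    (r"rawkode", "I am rawkode hear me type", "rawkode"),
]


def leaking_cases(cases):
    """Return the (pattern, line) pairs whose secret is still visible."""
    failures = []
    for pattern, line, secret in cases:
        redacted = re.sub(pattern, "<REDACTED>", line)
        if secret in redacted:
            failures.append((pattern, line))
    return failures


print("leaks:", leaking_cases(CASES))

# A pattern that never matches, like the misspelling in the video, shows up as a leak.
print("leaks:", leaking_cases([(r"password=.*?rawkode", "password=rawkod", "rawkod")]))
```

In a pipeline you would exit with a non-zero status whenever the returned list is not empty, so a broken pattern fails the build before it reaches the cluster.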

so that’s a quick overview of Komodor

there’s an awful lot to love and it

gives you great visibility into the

wonderfully complex system that is

kubernetes running our wonderfully

complex applications which are our

microservices

this was just part one there’s going to

be a part two of this video dropping

early next week in part two we’ll be

taking a look at more of the Komodor

integrations

we’ll be seeing how to integrate your

Source control via GitHub we’ll also

take a look at hooking up to Sentry for

exception tracking

and Grafana and Alertmanager giving you

full visibility across all of your

observability stack

and then at the end of next week part

three will drop where we take a look at

two final features one the humble web

hook and how we can get information from

Komodor to do whatever the hell we

please

and then one of my favorite features

the vcluster integration

deploying Komodor to all your virtual

clusters

and multi-tenant kubernetes environments

we’ll be back next week with the next

video Until then have a wonderful day

and I’ll see you all soon