You can also view the full presentation deck here.
*This is an auto-generated transcript
At Nemlig we have built a multi-cluster on-prem platform that we call Jupiter, mainly because platforms need names. Today we're going to talk about DNS, more specifically multi-cluster DNS, and how we've tried to solve it.

It's amazing how you named the product and the internal platform that lets you put a great solution in place. I think on-prem is always something that's interesting for people to hear about, because most of the knowledge out there is mostly about public clouds. That's very interesting. So before we deep dive, maybe explain a little bit about the challenges that you had.
For sure. I joined about two years ago, on this DevOps journey, whatever that really is, and I tried to look at what the pain points of the company were at the time. One thing I saw was that the way we managed and operated IT infrastructure, both the software we developed ourselves and what we had to operate to run the business in general from third-party vendors, was very scattered: virtual machines, bare-metal servers, a bit of containers. But the containers were running on something we call Docker hosts, I don't even know if that's a term out there, containers on the Docker engine running on single hosts for some environments, so no really robust uptime and so on.

So I thought: what can I do to streamline the way we run things? I set out to create a platform based on Kubernetes, which as we know is a distributed system for orchestrating containers. For one thing, all new software we developed at Nemlig was being built for containers anyway; that had already been the theme for some years before I came, and the older stuff was also being developed to run in a container. So we needed to orchestrate this, and also create a platform that can host and run the general components and software we need to run. And there you go: Kubernetes at Nemlig.

One of the challenges was then to actually get DNS records registered in an automated way. When I started, a user was supposed to send an email and then wait, and I say up to four days, but I actually experienced that it took up to four days, because there's some first-level handling involved, you know, the old ticket ping-pong or whatever we call it these days. So it could take many days, totally unproductive days where you figure out something else to do while you're still waiting for this DNS record, which is a really simple thing, just to connect to your service running somewhere, right?

And besides that, as you pinpoint here, it was very manual, no automation at all.

Yes, and in a container-orchestrated world with Kubernetes, you really have to be in control and be able to know what's going on when it comes to DNS, because with Kubernetes it has always been like this, right? There are so many layers: you have the kube-dns service, you have node-local DNS, you have the name servers outside the cluster, and if you have traffic coming in from the public internet you also have the public DNS infrastructure. So there are a lot of layers of DNS, and I wanted a robust, performant and manageable way of operating a DNS service for the different needs we had at the time.
I think it's really amazing, because DNS is usually one of those things where, back in the day, someone would open a ticket and someone else would manually edit the configuration file to add the records you wanted. When you come to Kubernetes or Docker, where everything is faster, everything spawns and goes away, and you need this service discovery ability quickly, it becomes necessary to make it shorter and faster. On the other side, we know that DNS problems are, I think, the second most common cause of infrastructure downtime, because when you don't have DNS you don't have anything. It's such a critical component that when you don't have it, your whole infrastructure goes down.

Yeah, and maybe if we have time I can get into it: we had the perfect storm of a DNS failure that really cascaded into certificates not renewing. So yeah, that was a nice incident, in hindsight at least.

Yeah, if you learn from them they're really good, I agree. So let's maybe talk a little bit about the goals, what you were trying to achieve after you discovered all these challenges and problems.
Yes. So with the horizon we just talked about, the scene I elaborated on, we set out to achieve these goals: automating DNS, making it an out-of-the-box platform service on Kubernetes at Nemlig, pretty much something that is just there. Of course we have to update the components of the DNS stack we have, but we shouldn't have to push it along every day and nurture it; it should be pretty robust and live by itself in most cases.

Also, we found later on that, okay, we started out with fully qualified domain names of service X dot cluster, plus the part of the name for whatever subdomain we're in control of, and we wanted to abstract away from the developers which cluster your service is actually running on. Because, one, they don't care; two, it's extra characters to type into your browser's search field; and it doesn't give any meaning to them, this whatever-cluster part of the name. It was like: okay, I know my service is in test, but I don't care which cluster specifically, I just want to hit this thing and work with it, right? So that was also a part of our journey later on.

And some of these decisions had to be very pragmatic, because starting out it was me and two others: one student at a university, so only 15 hours a week, and another one that was pretty green, green and new in the world of cloud native and Kubernetes. So I had to make some decisions: okay, this really has to be able to live on its own and be hands-off.
As you write here, I think that's interesting, because sometimes when people think about cloud native, an organization that started out cloud native has it pretty nice. But when you look at an organization that didn't start there, you need to create everything on your own, and you end up changing a lot of things within the organization that are not related to technology. That's amazing, and I think the challenges you broke through in terms of bringing a new platform to the organization, and solving each part of it on your own and with low resources, that's really amazing.
I agree. And one thing we haven't mentioned here, but which I'm thinking of now that we talk about it, is that low resources also applies in the sense of finance, like budget. There wasn't really any budget for buying all kinds of software, so I quickly saw, and concluded, that what we do needs to be open source. And yes, we want to give back to certain things, and I've been active in different projects, giving feedback or maybe a pull request or whatever. But it needed to be something we could start out with, and then maybe buy the enterprise option or whatever later.
And I think DNS is a good example, because CoreDNS is one of the most commonly adopted open source DNS solutions. I think most or almost all of the Kubernetes deployments I know of, with a lot of our customers, rely on CoreDNS. It's amazing that you picked this open source solution, a core infrastructure component, where ten years ago I'm not sure people would have intended to build this infrastructure on an open source solution, and you were able to bring it in. By the way, if anyone has questions, feel free to drop them in the chat; we'll be happy to answer them and help you understand much better Nemlig's goals, challenges, and what we are going to discuss later in this talk.
so maybe let’s jump in for the people that are not DNS expert DNS is like it’s
12:52
complex and as you said you have multiple configuration multiple layers within the node that side of the node in
12:59
the cluster outside of the cluster so maybe we will make sure that everyone
13:05
gets the same basic of understanding about DNS and then we will be able to show what solution did you put in place
13:13
in order to get this multi-cluster DNS and it’s not only about that it’s more
13:21
cluster I think what you mentioned in here the the fact that it’s fast the fact that it’s Dynamic and the fact that
13:28
developers just don’t care whether it’s run on that’s give the power to people
13:35
that don’t need to understand and learn and put in a lot of like tribal knowledge
13:42
so let’s jump in about kubernetes and DNS at its basic
13:49
So why do we even need DNS within the cluster? I would start with the fact that any Kubernetes cluster has some sort of DNS. Why is that? First of all, you need internal dynamic resolution: pods are going up and down, services are distributed, pods are distributed, and you need fast resolution because it's so dynamic; you need something that helps your services resolve, even just internally, where the relevant pods are and what the relevant IPs are. It also relates to service discovery, which lets you say: I don't care how many pods are behind this service, I just want to hit the service and be sure I reach it. But all of that is just inside the cluster; when we're talking about outside the cluster, it's a different level, and Lars is going to explain the challenges there later. The next thing is caching: you want to cache your requests, because it makes everything in the system flow faster. And upstream forwarding: if you don't have the right answer, you definitely want to forward the query to some other DNS server that should have it. There are many reasons, but those are the four main reasons why you should have DNS within the cluster. I would also say that in some architectures, as Lars referenced before, you have a DNS local agent on every one of your nodes; we see that when you go into a large-scale architecture, where you have a lot of DNS traffic within the cluster, you want to cache it at the node level, which is interesting.
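As a hedged illustration of that node-level caching pattern (following the upstream NodeLocal DNSCache convention; the address below is its usual default, but treat this as a sketch rather than the setup from the talk):

```yaml
# KubeletConfiguration excerpt: pods get pointed at a link-local address
# served by a node-local DNS cache DaemonSet, so most lookups are answered
# on the same node and only cache misses travel to the cluster DNS service.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 169.254.20.10   # conventional NodeLocal DNSCache address
```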
Jumping a little bit deeper, let's talk about the flow: what actually happens when one of our services, one of our pods, needs another service in the cluster. We have a billing service and a credit card service, both running in the same cluster, but the pods of the credit card service are spread around; you don't know where they are, you just want to hit the service endpoint and get the results. What happens is that in the cluster we have the cluster CoreDNS, which always keeps up with the cluster state (backed by etcd, via the Kubernetes API) to know the IPs of the pods and where they are running. So when a request flows to the service, the cluster CoreDNS can hand the billing service the right IP, and everything flows over the internal networking of the cluster; it doesn't matter if it's an overlay network or the cloud provider's network, everything flows on the network of the cluster itself. This is the internal, very basic flow: the pod asks for the service, CoreDNS replies, and then you hit the right service.
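To make that concrete, here is a minimal sketch (the names are hypothetical, not from the talk) of the kind of Service object that gives those pods one stable in-cluster DNS name:

```yaml
# CoreDNS will answer credit-card.payments.svc.cluster.local with this
# Service's ClusterIP, regardless of which nodes the matching pods are on.
apiVersion: v1
kind: Service
metadata:
  name: credit-card
  namespace: payments
spec:
  selector:
    app: credit-card
  ports:
    - port: 443        # port the billing service calls
      targetPort: 8443 # port the credit card pods listen on
```

A billing pod can then simply call credit-card.payments and let cluster DNS and kube-proxy do the rest.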
But what happens when our pod needs to go outside the cluster? Our billing service doesn't only need the other services we have; it may also need some banking service on another cluster, or an external service, maybe from a different company that you pay in order to use their API. Then your CoreDNS needs to forward the request, or tell you which DNS servers can actually help you find the answer, and the Kubernetes cluster makes sure your pods have this configuration in place. When the billing service's query goes out, it flows to the organization's DNS, or Google's DNS, whatever name server you picked for that. And that's basically the external cluster resolution flow, when our pods aim to reach an external service.
So in the cluster we have the internal requests and the external requests, when our service requests some information from another service. But what happens if we look at it the other way around? We need some way for the billing service of someone else to be able to get an answer from our cluster, from our service, from our pods, and sometimes you don't even know which cluster the pods and the service are running on. That's a big problem at scale, and this is what you wanted to achieve, so maybe you can elaborate a little bit on how you achieved it.

Maybe you could show the slide of the stack and how we solved it, or more specifically how I solved it, yeah.
Just to give an overview of the components at the stack level: we have a Kubernetes cluster, and we're running the k3s distribution. We have Active Directory outside the Kubernetes platform; to a large extent we're a Microsoft house. So we have Active Directory, and we have CoreDNS running as an authoritative DNS instance, and we're exposing it outside the cluster via a Service object. Then we have a very important service workload here, k8s_gateway, by a company called Ori, a British, aka UK, company. And then we have of course the internal Kubernetes kube-dns service, which is also a CoreDNS instance; so it's CoreDNS all over the place.

And then we are considering, and I'm reading up on it, whether we actually need the node-local DNS cache service as well, to make resolving more performant, so a pod doesn't need to go to another node to reach the internal Kubernetes DNS to get something resolved. But we might actually not need it, because we're running the kube-dns service as a DaemonSet on the highly available clusters, so we actually have a CoreDNS pod on all nodes, and it's likely not needed. So yeah, that was the stack.
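As a hedged sketch (the talk only says a Service object, so the LoadBalancer type and all names here are assumptions), exposing that authoritative CoreDNS outside the cluster could look like this:

```yaml
# DNS wants both UDP and TCP on port 53. The resulting external IP is the
# address the Active Directory zones can delegate to.
apiVersion: v1
kind: Service
metadata:
  name: coredns-authoritative
  namespace: dns
spec:
  type: LoadBalancer
  selector:
    app: coredns-authoritative
  ports:
    - name: dns-udp
      port: 53
      protocol: UDP
      targetPort: 53
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 53
```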
With this stack, let me go through the flow of a DNS packet over the UDP protocol; I think that's the best and easiest way to do it. Let's say we have some service called formyservice on the production tier, so the name is formyservice.tld.prod plus our domain, where tld stands for the top-level domain we have assigned to the Kubernetes platform. For every environment we have on the production tier, there is an authoritative DNS instance servicing DNS resolution from the outside.

In the Active Directory setup we have at Nemlig, there is a zone per environment, delegated to the authoritative DNS service on the management cluster for that environment. The management clusters run some specific workloads that host the different services the entire tier or environment needs from Kubernetes, to service the downstream, that is, worker clusters.

So a packet for formyservice.tld.prod from, let's say, a guy's computer goes first to Active Directory, because the Active Directory DNS servers are the resolvers registered or configured on your laptop. The Active Directory setup says: yeah, that's cool, I know where this domain is, but I'm not authoritative for it, because it's delegated over here, and over here is the production management cluster. Now we are hitting the authoritative DNS instance of the management cluster, and this is a CoreDNS instance configured to be the authority over everything under tld.prod on whatever domain we're in control of.

It says: thank you, I'm receiving a DNS request to resolve this. It goes through its CoreDNS configuration, and if you read up in the CoreDNS documentation you can figure out how it parses this configuration: it finds the most significant hit, the best-matching zone among its server blocks, for formyservice.tld.prod. Now we hit that one, and in there we have configured an external CoreDNS plugin; in the CoreDNS world we have internal plugins and external ones.
This external plugin is made by a company, an open source plugin given back to the CoreDNS project, and it's called fanout. What it does is very basic but very nice. Let's say I have a DNS server instance running on downstream cluster Y, and another downstream DNS server running on cluster X. Someone trying to reach formyservice hits Active Directory, then the production management cluster's authoritative DNS server, and that DNS request is now fanned out to all the downstream clusters on, in this case, the production tier, so both the X and Y downstream clusters receive the request. The fanout plugin is configured with the external IPs of the k8s_gateway CoreDNS instances, by this company called Ori, o-r-i.
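A hedged sketch of what that could look like in the authoritative instance's Corefile (the zone and IPs are hypothetical; fanout is an external plugin, so it has to be compiled into the CoreDNS image):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-authoritative
  namespace: dns
data:
  Corefile: |
    # Authoritative for the platform's production zone: every query is
    # fanned out to the k8s_gateway instances on the downstream clusters,
    # and the first positive answer wins.
    tld.prod.example.com:53 {
        errors
        fanout . 10.0.20.53 10.0.30.53   # external IPs on clusters X and Y
    }
```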
And what this does is really cool: it takes the request and then looks up, against the Kubernetes API, whether there is a Service of type LoadBalancer, or an Ingress, with this name. You can resolve the name in different ways: you can put a CoreDNS hostname annotation on your LoadBalancer Service or Ingress, it can be the actual hostname if it's an Ingress, or it can be a combination of service name and namespace and so on. If there is no Service of type LoadBalancer or Ingress with that name on the downstream cluster, let's say in this case it's cluster X that doesn't have the formyservice workload, the answer will be NXDOMAIN, which is the standard RFC "no such domain" DNS response. But the other downstream cluster, Y, says: hey, this k8s_gateway CoreDNS service found a Service with the IPv4 address for formyservice, and that response travels back to the authoritative DNS server on the management cluster for the production tier, which then returns it to the end user. So that's the journey of a DNS packet trying to be resolved to an IPv4 address on the production tier; it is a long journey to explain.
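On the downstream side, a hedged sketch of the k8s_gateway server block (options per the plugin's documented syntax, but the zone and values are hypothetical for this setup):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-gateway
  namespace: dns
data:
  Corefile: |
    # k8s_gateway watches the Kubernetes API and answers queries in this
    # zone with the external IPs of matching Ingresses and LoadBalancer
    # Services; names it cannot find come back as NXDOMAIN.
    tld.prod.example.com:53 {
        errors
        k8s_gateway tld.prod.example.com {
            resources Ingress Service
            ttl 300
        }
    }
```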
But it happens really quickly. That was one of the things I was concerned about with the fanout plugin, because you're sending this query to potentially a lot of downstream clusters. We have maybe four on one of the tiers, so it's not that heavy, and it's pretty fast because it's DNS over UDP, a very small packet. And of course there's caching involved when it comes to DNS, so these queries are not hitting the clusters too hard; we're okay. But the fanout plugin is actually the plugin that made it possible to totally abstract away which cluster formyservice is running on, because now you just query formyservice.tld.prod, every downstream cluster gets queried, and the one actually having the service responds. It's nice for the end user, because you don't have the cluster-specific part in the name and you can just reach your service, and it's automated. And it's nice for us as platform operators, because we are more free to move a service to another cluster. For example, if downstream cluster Y is having a bad day, we can move it to cluster X; if we really think it's important we can invalidate the cache (the TTL is 300 on our records) or wait for the cache to invalidate itself, and then the responses will point to cluster X. So that's also a nice feature for us to have.
feature for us to have that’s that’s really impressive like the
28:33
ability to use this plugin in order to get response for where where the actual
28:39
service is really is is a key feature in this solution but how it looks from the
28:45
developer perspective like when they trigger the CI CD they don’t care where it runs
28:51
no um so we’re using uh we like to go by
28:56
the githubs paradigm so there’s an um part OCD that’s the product we ended
29:03
up with on um all clusters and yeah it’s the usual you know you
29:09
configured to to hit your integrate with your versioning management system
29:14
and so on and and they get a cluster assigned to do that
29:19
and then then they hit the and then Argo CD will just you know deploy the workload a part of that workload is
29:26
either low balancer type service mostly it increased because most of the services it’s restful apis
29:34
um and that interest will just you know have a sqdn resulting in an English
29:41
object being created and that’s it because now user whoever
29:48
and remember that students type it in your browser whatever and it will follow this
29:54
packet this journey of the package and now it’s on will reach Downstream cluster X if
30:02
the service occurring elephants with the QQ if creating an elephant but proper
30:08
TLD you get it after after it’s deployed and the increases object is created
30:15
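A hedged sketch of such a deployed Ingress (hypothetical names; the host field is the cluster-agnostic FQDN that k8s_gateway picks up and serves):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: formyservice
  namespace: shop
spec:
  rules:
    - host: formyservice.tld.prod.example.com  # no cluster name anywhere
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: formyservice
                port:
                  number: 80
```

Once Argo CD syncs this object, the name starts resolving from anywhere in the organization, with no DNS ticket involved.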
it’s working wow wow in just a matter of seconds from the deploy time to the time that
30:21
you get your rally they can click on it basically exactly yeah amazing we have
30:26
one question from the audience so the question is what was the main
30:33
Okay, we actually have more than two clusters; we have as many as we want. It was basically a capability I wanted: not to be limited by the number of clusters, because I like the capability of shifting things away from one cluster to another. There might be security reasons, there might be performance reasons, or other isolation reasons. This fundamental capability now gives me the option to say: maybe I should have a StatefulSet-only cluster, because we all know stateful workloads are a little more challenging, right? If I dedicate a cluster to stateful workloads, at least that's the only cluster where I know things are extra cumbersome, because you're running these designs with their specific workloads and all that. So those are some of the reasons.

And also, if you want to scale, and we want to scale with nodes as well as pods, HPA, VPA: if you need to scale with huge nodes, say 100 gigabytes of memory and 30 CPUs per node, it's going to take a longer time and it's going to be a heavier cluster. So we try to have smaller nodes, also because if you have huge nodes you have more pods per node, and it will take longer to drain and longer to scale. With smaller nodes, yes, you have more nodes in one cluster, but you have more wiggle room for moving workloads around when you have this feature of running as many clusters as you need.
I think that when you're talking about high availability and the flexibility of your environment, this is where multi-cluster comes into play. But I would say that when you're building a platform, and we see that it was not very common before, but now even for development and staging, teams use multiple clusters, because you don't want to get that message on Slack or Teams: "please do not merge, the staging cluster is a little bit broken, I will fix it and then you can merge in the afternoon." You want velocity, and the flexibility of moving one service, or testing it on another cluster without affecting anything in your testing system or CI/CD pipeline, gives you this availability. That's super interesting, and I think we see that trend of multi-cluster even in staging and lower environments as well.

Yeah, and then you could go totally crazy with a project like vcluster, where you have clusters within the cluster.

Yeah. A few months ago we did a webinar just about virtual clusters and the ability to spin up ephemeral environments, and using Komodor to give people access to them if you want; it's on the YouTube channel. And just before we wrap up, anything else?
Yeah, I mean, I can talk about this very beautiful incident, if there aren't any more questions; I could elaborate on that incident and what we learned from it.

We don't have any at the moment. If someone has a question, drop it in the chat as before.

So, a little while ago, we were fortunate that I had been so wise as to choose to have some extra caching on that thing. We had another external plugin called redisc, r-e-d-i-s-c, which basically gives you the opportunity, as an external CoreDNS plugin, to store the DNS cache in a Redis cluster. When it worked it was pretty cool, because queries would just be answered from the cache of the authoritative DNS server, so they wouldn't even go to the downstream clusters.
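From its description, the plugin was presumably wired in roughly like this (a sketch only; the exact option names are per the plugin's README, and the Redis endpoint here is hypothetical):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-authoritative
  namespace: dns
data:
  Corefile: |
    # redisc keeps the DNS cache in an external Redis instead of in-process
    # memory, so every authoritative replica shares one cache.
    tld.prod.example.com:53 {
        redisc {
            endpoint redis.dns.svc.cluster.local:6379
        }
        fanout . 10.0.20.53 10.0.30.53
    }
```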
Then one day something happened, and that something was the OOM killer. At some point it killed the CSI manager, the manager of the CSI driver we use, and that killed the storage for the Redis. Now you have no cache. But I had read the documentation, and I thought that even in this case it would be a no-op, as they say, no operation: it would just go on and query the downstream clusters, like really query them. Okay, that's okay, that's a pretty robust system. But it turns out it's only a no-op if Redis is not available when CoreDNS starts up. If CoreDNS started up with the Redis cache available, it would keep trying to hit the Redis cache, and only find out, after a nice timeout of many hundreds of seconds, that Redis is not available.

That made life really miserable, because this incident played out over some hours, and we have very short-lived TTLs on our certificates (we can come back to that another day if that's a subject we want to talk about). This very short TTL of 24 hours means that if something goes on for too long and the CA we're running can't be resolved, your certificate is not renewed, resulting in certificates not being renewed for some specific domains, and people not being able to put items in their basket on nemlig.com. So that was pretty great.

What we learned from that was to not run this plugin, because it's pretty shitty to have the cache go down with these prolonged timeouts. And when I dug into the code of the plugin, I found that it was using some Go libraries to integrate with Redis that had been deprecated like one and a half years ago. So I reached out to the Slack community, the CoreDNS channel, I think it's in the CNCF community on Slack, asked what other people do here, and reached out to some of the maintainers, and it was pretty much: yeah, you don't really need this cache thing. So now we're okay: we have several instances, several replicas, of the authoritative DNS, and we just use the internal cache plugin, so each of the replicas has its own internal cache. And it works beautifully. Of course, the thing is that maybe the answer is cached in this replica, but because of load balancing another replica was hit, and that pod did not have the DNS record in cache, so it goes and asks. But we're not seeing any performance issues with that, so far at least.
So I guess it's: test things really well, in all cases. And also: don't over-engineer when it's not needed. I think those are some of the learnings we took from that. We went away from it, and yeah, I'm all happy now.
yeah I think I’m all happy now I think it’s a it’s an amazing lesson
39:40
learned with the fact that first of all when you’re using some open source
39:46
plugins the downside they can get deprecated or not useful and you need to keep in Pace when you’re using Cloud
39:51
native software but on the other side it’s really nice that when you ask something the community it’s actually
39:58
there to help you unfortunately it was after the incident and you learned a lot from it but it’s
40:04
good that you have someone to go ask and get a real response in action yeah sure
40:10
so before we are done we have one last question from the audience
40:15
They ask about the performance impact of this kind of multi-cluster architecture. So what I'm saying is: multiple clusters, the DNS, the fanout plugin, it's not always the obvious setup. Maybe you can explain a little bit whether there is a performance impact, and what it is, for this kind of architecture.
Yeah. So, very clearly we are using more resources, because we have several control planes. We have two options: we have the highly available cluster, and we have what I call the app cluster, because it would only be running low-risk workloads that don't need high performance, and those app clusters only have one control plane node. I don't think we actually have that with other Kubernetes distributions than k3s, not that I know of at least; it's an option where you can run a single control plane node: API server, controllers, the control plane. So we have these two options, and we only have one cluster running the app-cluster version of things, to save resources, and because it's only running a specific workload that runs three times a day.

For the others, the more regular ones, where we want the uptime, the robustness, the full promises of the distributed system that Kubernetes is: yes, we are using some more resources, because you need at least three nodes on the control plane side, so we have these extra nodes. But besides that, because of the advantages the capabilities give us, spreading out load, spreading out the failure domain, the blast radius, so something going totally bonkers on some cluster won't touch the workers on other clusters, I think this isolation capability it gives us is pretty nice. And also, because it's k3s, you can have etcd embedded on the control plane nodes, and k3s in general is really lightweight.
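For reference, a hedged sketch of how a k3s server with embedded etcd is typically brought up via its config file (options per the k3s docs; the values are hypothetical):

```yaml
# /etc/rancher/k3s/config.yaml on the first server node. cluster-init
# starts embedded etcd instead of the default SQLite backend; additional
# servers join with `server: https://k8s-api.example.com:6443` instead.
cluster-init: true
token: "example-shared-secret"   # hypothetical join token
tls-san:
  - "k8s-api.example.com"        # hypothetical stable API endpoint
```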
But I am looking into other distributions, and also into ways of having multiple clusters integrated in a more tight way, so you could have one master control plane and some more lightweight downstream clusters, where each thinks it has a control plane but is more or less controlled by this puppet master up here. There are different projects out there, and I'm looking into different ones; you could also do something like Cluster Mesh from Isovalent, the creators of Cilium. So I think I've answered the question, to some extent at least.
Yeah, that's amazing. So before we are done, Lars, I really want to thank you for joining us and sharing your knowledge, the challenges that you had and the solution that you put in place. We're really trying to bring in people who will share their own experience, knowledge that is not commonly shared on the internet and not easy to find. So thank you very much for your time, and thank you everyone for joining us.

Thank you for the opportunity. I hope you learned something.

I learned a lot, and I know that everyone who was live, and we're going to upload it to YouTube, will be able to watch it again and learn more from your DNS journey. Thank you, bye everyone.