Expiration
Context
During a CTF, we do not want players to manipulate the infrastructure at will: starting instances is costly, requires computational resources, etc. It is mandatory to control this while still giving players the power to manage their own instances.
For this reason, one goal of chall-manager is to provide ephemeral (or not) scenarios. Ephemerality implies lifetimes, expirations and deletions.
To implement this, for each Challenge the ChallMaker and Ops can set a timeout, in seconds, after which the Instance will be deleted once it is up and running, or an until date after which the instance will be deleted regardless of the timeout. When an Instance is deployed, its start date is saved, and every update is stored for traceability. A participant (or a dependent service) can then renew an instance on demand for additional time, as long as it stays under the until date of the challenge. This is based on the hypothesis that a challenge should be solved after \(n\) minutes.
Note
The timeout should be evaluated from an expert's point of view on the complexity of the challenge, taking the participants' skill sets into account (an expert can be expected to solve an introductory challenge in seconds, while a beginner may take several minutes).
There is no “rule of thumb”, but we recommend double-testing the challenge with both a domain expert, for technical difficulty, and another ChallMaker unrelated to this domain.
Deleting instances once they are outdated then becomes a new goal of the system. We cannot extend chall-manager to do so, as it would break the Separation of Concerns principle: it is the goal of another service, chall-manager-janitor. This is also justified by the frequency model applied to the janitor, which is unrelated to the chall-manager service itself.
With such an approach, other players could use the resources. Nevertheless, it requires a mechanism to wipe out infrastructure resources after a given time.
Some tools exist to do so.
Tool | Environment |
---|---|
hjacobs/kube-janitor | Kubernetes |
kubernetes-sigs/boskos | Kubernetes |
rancher/aws-janitor | AWS |
Although such tools exist, they are context-specific and thus limited: each one has its own mechanism and considers only one environment. For the sake of genericity, we want an approach able to handle all ecosystems without the need for specific implementations. For instance, if a ChallMaker decides to cover a unique, private and offline ecosystem, how could they do it?
That is why the janitor must have the same level of genericity as chall-manager itself. Although it is not optimal for specific providers, we expect this genericity to be a better tradeoff than covering a limited set of technologies. This modular approach enables covering new providers (vendor-specific, public or private) without involving CTFer.io in the loop.
How it works
By using the chall-manager API, the janitor looks up the expiration dates of the instances.
Once an instance is expired, it simply deletes it.
Using a cron, the janitor can then monitor the instances frequently.
flowchart LR
    subgraph Chall-Manager
        CM[Chall-Manager]
        Etcd
        CM --> Etcd
    end
    CMJ[Chall-Manager-Janitor]
    CMJ -->|gRPC| CM
If two janitors trigger in parallel, the API maintains consistency. Error codes are to be expected, but no data inconsistency.
As it does not plug into any provider-specific mechanism or requirement, it guarantees platform agnosticism. Whatever the scenario, the chall-manager-janitor will be able to handle it.
The instance until date is determined from the challenge configuration for both until and timeout, following the algorithm in the flowchart below.
Renewing an instance re-executes this algorithm to ensure consistency with the challenge configuration.
Based on the instance until date, the janitor determines whether to delete it or not (\(instance.until < now() \Rightarrow delete(instance)\)).
flowchart LR
    Start[Compute until]
    Start-->until{"until == nil ?"}
    until---|true|timeout1{"timeout == nil ?"}
    timeout1---|true|out1["nil"]
    timeout1---|false|out2["now()+timeout"]
    until---|false|timeout2{"timeout == nil ?"}
    timeout2---|true|out3{"until"}
    timeout2---|false|out4{"min(now()+timeout,until)"}
What’s next?
Listening to the community, we decided to improve further with a Software Development Kit.