CK: Collective Knowledge
This blog post gives a brief introduction to CK and its basic concepts. There is a ton of existing documentation in the CK wiki on GitHub, and all of it can easily feel overwhelming. This is why I wrote this deliberately short and lightweight introduction to some of the fundamental concepts of CK, which helped me a lot in understanding it.
I assume that you have the CK tool installed on your machine, which you can easily check by running ck version. If this returns an error, install CK by running pip install ck.[1]
So what is CK?
To put it generically, CK is a tool that helps you organise and work with the stuff you care about. Stuff can be a lot of different things, such as research data, programs or scripts analysing this data, and the results obtained by the analysis, to take a typical research workflow as an example.
CK helps you organise this stuff by assigning a unique identifier (a so-called UID) to every entry registered with CK. Entries are stored in repositories, which facilitate sharing. Modules are a special type of entry: they implement the functionality of CK. CK comes with a set of built-in modules, but you can also write custom modules yourself.
Entries, repositories, and modules are the basic vocabulary of CK. Let’s talk about each of them in more detail.
CK Entries
CK tracks entries by assigning them unique identifiers. Each entry is stored in a separate directory, and CK stores additional metadata for each entry in the form of a few JSON files. These files live in the .cm subdirectory of the entry. There are three metadata files:
- .cm/info.json stores information such as the author or the license of the entry.
- .cm/meta.json stores arbitrary meta information about the entry, which the CK modules use to process it. One important example is tags: keywords that can be used to find entries that have something in common.
- .cm/desc.json is intended for a documentary description of the entry, but is currently mostly empty.
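To make this layout concrete, here is a minimal Python sketch (my own illustration, not CK code) that writes the three metadata files for an entry and then filters entries by tag. The helper names write_entry and entries_with_tag are hypothetical, and so are the keys inside the JSON files (author, license, tags); only the .cm file names come from the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def write_entry(root: Path, module: str, entry: str, tags: List[str]) -> Path:
    """Create <root>/<module>/<entry>/.cm/ with the three metadata files (hypothetical helper)."""
    cm_dir = root / module / entry / ".cm"
    cm_dir.mkdir(parents=True, exist_ok=True)
    # Assumed key names: author/license in info.json, tags in meta.json.
    (cm_dir / "info.json").write_text(json.dumps({"author": "me", "license": "BSD-3-Clause"}))
    (cm_dir / "meta.json").write_text(json.dumps({"tags": tags}))
    (cm_dir / "desc.json").write_text(json.dumps({}))  # mostly empty, as noted above
    return cm_dir.parent


def entries_with_tag(root: Path, module: str, tag: str) -> List[str]:
    """Return the names of entries in <module> whose meta.json lists <tag>."""
    found = []
    for entry_dir in sorted((root / module).iterdir()):
        meta_file = entry_dir / ".cm" / "meta.json"
        if meta_file.is_file() and tag in json.loads(meta_file.read_text()).get("tags", []):
            found.append(entry_dir.name)
    return found


# Usage in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_entry(root, "dataset", "my-awesome-dataset", ["image", "jpeg"])
    write_entry(root, "dataset", "other-dataset", ["text"])
    print(entries_with_tag(root, "dataset", "image"))  # ['my-awesome-dataset']
```

Keeping the metadata in plain JSON files next to the data means any tool, not just CK, can read it.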
CK Repositories
In CK, a repository is a collection of entries that is meant to be shared with other people. CK uses git, which makes it incredibly easy to share repositories among team members or to make them publicly available. Websites such as GitHub or Bitbucket can be used to host CK repositories online.
CK stores all repositories in one central folder. On Linux and macOS, this is $HOME/CK_REPOS by default.
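As a rough mental model of that central folder, the Python sketch below treats every subdirectory of $HOME/CK_REPOS as one repository, approximating what ck list repo and ck find repo:<name> report. This is an assumption for illustration only: CK keeps its own record of installed repositories, so a plain directory scan is not guaranteed to match its output. The function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def default_repos_dir() -> Path:
    """Default location on Linux and macOS, as stated above: $HOME/CK_REPOS."""
    return Path.home() / "CK_REPOS"


def list_repos(repos_dir: Path) -> List[str]:
    """Rough analogue of 'ck list repo': treat every subdirectory as a repository."""
    if not repos_dir.is_dir():
        return []
    return sorted(p.name for p in repos_dir.iterdir() if p.is_dir())


def find_repo(repos_dir: Path, name: str) -> Optional[Path]:
    """Rough analogue of 'ck find repo:<name>': the repository's path, if present."""
    path = repos_dir / name
    return path if path.is_dir() else None


# Usage against a throwaway stand-in for CK_REPOS.
with tempfile.TemporaryDirectory() as tmp:
    repos = Path(tmp)
    (repos / "ck-env").mkdir()
    (repos / "ck-autotuning").mkdir()
    print(list_repos(repos))                              # ['ck-autotuning', 'ck-env']
    print(find_repo(repos, "ck-autotuning") is not None)  # True
```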
CK Modules
Modules in CK group entries together with the actions that operate on these entries. CK entries that are operated on by a particular module are put in a directory with the same name as the module. For example:
- Programs, which are compiled and run by the program module, are put in a directory called program.
- Datasets, which are extended by the dataset module, are put in a directory called dataset.
- Experiments, which are added, browsed, and rerun by the experiment module, are put in a directory called experiment.
This leads to a familiar directory structure: the top-level directories are named after CK modules, e.g., program, dataset, and experiment. The second-level directories hold the actual programs, datasets, and experiments you care about, e.g., program/my-awesome-program, dataset/my-awesome-dataset, and experiment/my-awesome-experiment. These are themselves CK entries with their own metadata and UIDs.
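The two-level layout is easy to explore programmatically. Here is a small Python sketch (a hypothetical helper, not part of CK) that maps each top-level module directory of a repository to the entry directories inside it. It skips hidden names such as .cm, on the assumption that those hold bookkeeping rather than entries.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def entries_by_module(repo_dir: Path) -> Dict[str, List[str]]:
    """Map each top-level module directory to the names of the entry directories inside it."""
    layout: Dict[str, List[str]] = {}
    for module_dir in sorted(repo_dir.iterdir()):
        # Skip plain files and hidden directories (assumed to be bookkeeping).
        if not module_dir.is_dir() or module_dir.name.startswith("."):
            continue
        layout[module_dir.name] = sorted(
            e.name for e in module_dir.iterdir() if e.is_dir() and not e.name.startswith(".")
        )
    return layout


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    for module, entry in [("program", "my-awesome-program"),
                          ("dataset", "my-awesome-dataset"),
                          ("experiment", "my-awesome-experiment")]:
        (repo / module / entry).mkdir(parents=True)
    print(entries_by_module(repo))
    # {'dataset': ['my-awesome-dataset'], 'experiment': ['my-awesome-experiment'], 'program': ['my-awesome-program']}
```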
Actions in CK are functionalities offered by modules to operate on CK entries. Here are a few concrete examples:
- The program module offers actions for compiling (compile) and running (run) programs.
- The dataset module offers an action for adding new files to an existing dataset (add_file_to).
- The experiment module offers actions for adding new experiments (add), browsing existing ones (browse), and rerunning experiments (rerun).
Every CK command has the same basic form for performing an action of a particular module:
ck action module
Therefore, we write: ck compile program, ck add_file_to dataset, ck rerun experiment, and so on.
This style is deliberately designed so that the commands read like sentences. I call this ck action module structure the grammar of CK.
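The grammar maps naturally onto a two-level lookup: first find the module, then the action it offers. The toy dispatcher below illustrates that idea in Python. The module table and the functions in it are hypothetical stand-ins that only return a description; this is not how CK is actually implemented.

```python
from typing import Callable, Dict

# Hypothetical module table: module name -> {action name -> function}.
MODULES: Dict[str, Dict[str, Callable[[], str]]] = {
    "program": {
        "compile": lambda: "compiling a program",
        "run": lambda: "running a program",
    },
    "dataset": {
        "add_file_to": lambda: "adding a file to a dataset",
    },
    "experiment": {
        "add": lambda: "adding an experiment",
        "browse": lambda: "browsing experiments",
        "rerun": lambda: "rerunning an experiment",
    },
}


def ck(command: str) -> str:
    """Dispatch a command of the form 'action module', e.g. 'compile program'."""
    action, module = command.split()  # raises ValueError unless exactly two words
    actions = MODULES.get(module)
    if actions is None:
        raise KeyError(f"unknown module: {module}")
    if action not in actions:
        raise KeyError(f"module '{module}' has no action '{action}'")
    return actions[action]()


print(ck("compile program"))   # compiling a program
print(ck("rerun experiment"))  # rerunning an experiment
```

Because the lookup goes through the module first, the same action name (say, add) can mean something different for each module.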
CK commands that refer to particular entries specify them using the following notation:
ck action module:entry
Sometimes CK needs help to distinguish between entries in different repositories. In these cases, we write:
ck action repository:module:entry
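The three forms (module, module:entry, and repository:module:entry) can be told apart simply by counting the colon-separated parts. Here is a minimal Python sketch of that rule; Target and parse_target are hypothetical names, not CK functions.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    repo: Optional[str]
    module: str
    entry: Optional[str]


def parse_target(text: str) -> Target:
    """Split 'module', 'module:entry' or 'repository:module:entry' into its parts."""
    parts = text.split(":")
    if any(not p for p in parts):
        raise ValueError(f"empty component in {text!r}")
    if len(parts) == 1:
        return Target(None, parts[0], None)
    if len(parts) == 2:
        return Target(None, parts[0], parts[1])
    if len(parts) == 3:
        return Target(parts[0], parts[1], parts[2])
    raise ValueError(f"too many components in {text!r}")


print(parse_target("repo:ck-autotuning"))
# Target(repo=None, module='repo', entry='ck-autotuning')
```

Note how repo:ck-autotuning is just the two-part form: repo is the module and ck-autotuning is the entry it operates on.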
Many modules accept additional options as command-line flags. You can get a full list of the actions a particular module supports by calling:
ck help module
CK modules for managing repositories and modules
There are CK modules for managing repositories and modules themselves. They are called repo and module and are briefly described here.
repo
As we have seen above, repositories are a central concept in CK. They are managed by the repo module.
Here are some things one can do with this module:
- ck info repo lists information about the repo module itself.
- ck help repo lists all possible actions one can perform with a CK repository.
- ck list repo lists all installed repositories.
There are a number of things one can do with a particular repository. We take the ck-autotuning repository as an example:
- ck pull repo:ck-autotuning installs or updates the ck-autotuning repository to the latest version on the remote server (it performs a git pull on the GitHub repository https://github.com/ctuning/ck-autotuning).
- ck info repo:ck-autotuning lists information about the ck-autotuning repository.
- ck find repo:ck-autotuning prints the path where the ck-autotuning repository is installed.
module
Modules are managed by a module called module.
Similarly to the actions on repositories one can:
- ck info module lists information about the module module itself.
- ck help module lists all possible actions one can perform with a CK module.
- ck list module lists all installed modules, across all installed repositories.
To list only the modules of a particular repository, for example ck-autotuning, one can execute:
ck list module --repo_uoa=ck-autotuning
The --repo_uoa=ck-autotuning part is an input argument passed to the list action of the module module. To list all the possible input arguments of an action call:
ck action module --help
So for example: ck list module --help. This will print a description of the action and which input arguments it will process and what output it will return.
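Behind the scenes, a command line like ck list module --repo_uoa=ck-autotuning has to be turned into named inputs for the action. The Python sketch below shows one plausible way to do that; it is my own simplification, not CK's parser. The keys action and module_uoa are assumptions modelled on the repo_uoa flag above, treating a bare --flag as True is my choice, and only the simple "action module [--flags]" form is handled.

```python
from typing import Dict, List, Union


def parse_command(argv: List[str]) -> Dict[str, Union[str, bool]]:
    """Turn e.g. ['list', 'module', '--repo_uoa=ck-autotuning'] into an input dict.

    Assumptions: the first two positional words are the action and the module
    (stored under 'action' and 'module_uoa'); '--key=value' becomes key -> value;
    a bare '--flag' becomes flag -> True.
    """
    inp: Dict[str, Union[str, bool]] = {}
    positional: List[str] = []
    for arg in argv:
        if arg.startswith("--"):
            key, has_value, value = arg[2:].partition("=")
            inp[key] = value if has_value else True
        else:
            positional.append(arg)
    if len(positional) > 2:
        raise ValueError(f"unexpected arguments: {positional[2:]}")
    if positional:
        inp["action"] = positional[0]
    if len(positional) == 2:
        inp["module_uoa"] = positional[1]
    return inp


print(parse_command(["list", "module", "--repo_uoa=ck-autotuning"]))
# {'repo_uoa': 'ck-autotuning', 'action': 'list', 'module_uoa': 'module'}
```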
Common CK actions
There are some actions which can be used on every module. These are called common actions. You can list all common actions by running: ck help.
Furthermore, you can always call ck action module --help to learn about the input arguments and return values of an action.
Many of the common actions are for managing CK entries. The most important ones are:
- ck add module:entry adds a new CK entry called entry to the module named module.
- ck cp module1:entry1 module2:entry2 copies the CK entry called entry1 from module1 to entry2 in module2.
- ck find module:entry prints the path of the CK entry named entry from the module named module.
- ck mv module1:entry1 module2:entry2 moves the CK entry called entry1 from module1 to entry2 in module2.
- ck rm module:entry removes (deletes) an existing CK entry called entry from the module named module.
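Since every entry is a directory under its module's directory, the file-system side of cp, mv, rm, and find boils down to ordinary directory operations. The Python sketch below shows only that side, using hypothetical helper names. Real CK also has to keep each entry's UID and alias bookkeeping consistent, which this sketch ignores entirely.

```python
import shutil
import tempfile
from pathlib import Path


def entry_path(root: Path, target: str) -> Path:
    """'module:entry' -> <root>/<module>/<entry>, roughly what 'ck find' reports."""
    module, sep, entry = target.partition(":")
    if not sep or not module or not entry:
        raise ValueError(f"expected 'module:entry', got {target!r}")
    return root / module / entry


def cp_entry(root: Path, src: str, dst: str) -> Path:
    """Copy an entry directory, including its .cm metadata."""
    dst_path = entry_path(root, dst)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(entry_path(root, src), dst_path)  # fails if dst already exists
    return dst_path


def mv_entry(root: Path, src: str, dst: str) -> Path:
    """Move (or rename) an entry directory."""
    dst_path = entry_path(root, dst)
    if dst_path.exists():  # shutil.move would otherwise nest src inside dst
        raise FileExistsError(dst_path)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(entry_path(root, src)), str(dst_path))
    return dst_path


def rm_entry(root: Path, target: str) -> None:
    """Delete an entry directory and everything in it."""
    shutil.rmtree(entry_path(root, target))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "program" / "my-awesome-program" / ".cm").mkdir(parents=True)
    cp_entry(root, "program:my-awesome-program", "program:my-copy")
    mv_entry(root, "program:my-copy", "program:my-renamed-copy")
    rm_entry(root, "program:my-awesome-program")
    print(sorted(p.name for p in (root / "program").iterdir()))  # ['my-renamed-copy']
```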
Where to go from here?
I have only scratched the surface of CK. I haven’t talked about the metadata format (which is JSON) or about implementing your own custom modules (which is commonly done in Python).
As I said in the beginning, there is plenty of documentation available on the CK wiki. It is incredibly useful to keep the vocabulary (entries, repositories, modules) and the grammar (ck action module) of CK in mind while reading these documents and when you start playing around with CK.
The two most appropriate starting points are the Getting Started Guide and the Portable Workflows page.
To see how to implement your own workflow with CK by following an example, read the Getting Started Guide.
To learn how to implement portable workflows with CK by
- Describing and detecting existing software
- Setting up the software environment
- Automating the installation of missing software
- and more …
read the corresponding sections in the Portable Workflows page.
Also, ask questions on the CK mailing list. The community is very open to answering your questions!
[1] If you have trouble installing CK this way, you can find more information in the CK wiki.