To complement our Human Profiler, CHAMP™, Kepler uses a host-agnostic curated database to process samples beyond the human host, including environmental, animal, soil, and food samples, among others.
Kepler extracts optimal value from metagenomic data by combining the precision of k-mer exact-matching with the versatility of probabilistic alignment. Through this method, Kepler achieves robust identification and enumeration of bacteria, viruses, fungi, and protists by leveraging a meticulously curated biomarker database in which over 30,000 species are arranged in a phylogenetic tree-like structure.
The core of Kepler’s technology is patented with both the US Patent Office (US10108778B2, US20200294628A1) and the European Patent Office (ES2899879T3).
How does Kepler work?
The Kepler multi-kingdom taxonomic profiler is divided into three parts:
1. Leveraging a Curated Database of Microbial Genomes
The Kepler database of high-quality microbial genomes is built by selecting genomes with a high completeness-to-contamination ratio and good assembly quality, prioritizing intra-species diversity whilst limiting phylogenetic redundancy. The genome assemblies are then scrubbed clean of low-complexity sequences, prophages, plasmids and host-contaminated regions to maximize the taxonomic signal-to-noise ratio. The final database encompasses multiple microbial kingdoms and >30,000 species.
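As an illustration only, the selection step described above can be sketched in Python. This is not Kepler's actual curation pipeline: the `Genome` record, the `select_genomes` function, the thresholds (90% completeness, 5% contamination), the ranking score and the per-species cap are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Genome:
    accession: str
    species: str
    completeness: float   # percent, hypothetical quality metric
    contamination: float  # percent, hypothetical quality metric


def select_genomes(genomes, min_completeness=90.0, max_contamination=5.0,
                   max_per_species=3):
    """Keep high-quality genomes and cap how many are kept per species.

    All thresholds are illustrative assumptions, not Kepler's real settings.
    """
    # Quality filter: high completeness and low contamination.
    passed = [g for g in genomes
              if g.completeness >= min_completeness
              and g.contamination <= max_contamination]
    # Rank best-first using a simple, assumed quality score.
    passed.sort(key=lambda g: g.completeness - g.contamination, reverse=True)
    # Limit redundancy: keep at most max_per_species genomes per species.
    kept, per_species = [], {}
    for g in passed:
        n = per_species.get(g.species, 0)
        if n < max_per_species:
            kept.append(g)
            per_species[g.species] = n + 1
    return kept
```

Allowing several genomes per species preserves some intra-species diversity, while the cap keeps any one species from dominating the database.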
2. Identifying Relevant Biomarkers
3. Searching the Biomarker Database
The second, per-sample computational phase searches the millions of short sequence reads or contigs in your data against the phylogenetic tree-like database build:
A. The first comparator splits the sequencing reads into k-mer sets that are then queried across the different branches and leaves of the phylogenetic tree to identify the different taxa present in the query k-mer sets. It looks for exact matches between query k-mers and reference biomarkers, and classification sensitivity and accuracy are maintained through composite k-mer/biomarker aggregation statistics and coverage depth estimation.
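The general idea of exact k-mer matching can be sketched as follows. This is a simplified illustration, not Kepler's implementation: it flattens the phylogenetic tree into a plain lookup table of taxon-specific k-mers, and the function names, the toy sequences and the `min_hits` threshold are assumptions made for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer found in exactly one taxon to that taxon."""
    owners = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            owners.setdefault(km, set()).add(taxon)
    # Keep only k-mers that are unique to a single taxon (toy "biomarkers").
    return {km: next(iter(t)) for km, t in owners.items() if len(t) == 1}


def classify_reads(reads, index, k, min_hits=2):
    """Count reads per taxon using exact k-mer matches.

    A read is assigned to its best-supported taxon only if at least
    min_hits of its k-mers match that taxon (an assumed threshold).
    """
    counts = Counter()
    for read in reads:
        hits = Counter(index[km] for km in kmers(read, k) if km in index)
        if hits:
            taxon, n = hits.most_common(1)[0]
            if n >= min_hits:
                counts[taxon] += 1
    return counts


# Toy data, made up for illustration.
references = {"taxon_a": "ACGTACGTTGCA", "taxon_b": "TTTTGGGGCCCCAA"}
index = build_index(references, k=4)
read_counts = classify_reads(["ACGTACGT", "GGGGCCCC", "NNNNNNNN"], index, k=4)
```

Requiring several matching k-mers per read is one simple way to aggregate evidence before calling a taxon; a production profiler would also weigh coverage depth across the reference.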
Evaluation of Kepler with Standardized Community Controls
To benchmark Kepler, real-world community standards were used to compare its efficacy against leading profilers such as Kraken2/Bracken and MetaPhlAn4. For these comparisons, five community standards from ATCC and Zymo were employed, covering both even and staggered (log-distributed) abundance profiles.
Kepler distinguished itself by achieving a superior F1-Score (a balanced measure of precision and sensitivity). It also showed an exceptional ability to detect low-abundance taxa (Bacteria and Fungi) and to differentiate closely related taxa at the sub-species level, for example Bifidobacterium longum subsp. longum and Bifidobacterium longum subsp. infantis.
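For readers unfamiliar with the metric, the F1-Score is the harmonic mean of precision and sensitivity. A minimal sketch of how it can be computed for a community standard is shown below, comparing the taxa a profiler reports with the taxa known to be in the standard. The function name and the taxon labels are made up for illustration and are not benchmark results.

```python
def f1_score(expected, detected):
    """Harmonic mean of precision and sensitivity for a set of detected taxa."""
    expected, detected = set(expected), set(detected)
    true_positives = len(expected & detected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detected)    # fraction of calls that are correct
    sensitivity = true_positives / len(expected)  # fraction of true taxa recovered
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical example: 4 taxa in the standard, 3 found plus 1 false positive.
score = f1_score({"sp_a", "sp_b", "sp_c", "sp_d"},
                 {"sp_a", "sp_b", "sp_c", "sp_x"})
```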