Descriptors
===========

The ``Descriptors`` class (``pySAR/descriptors.py``) calculates a comprehensive set of physicochemical, biochemical, and structural protein descriptors. These 33 descriptors span composition, autocorrelation, CTD, conjoint triad, sequence order, and pseudo amino acid composition, and together produce over 10,000 features when all are calculated. Descriptors are calculated via `protpy `_, a purpose-built open-source package for protein feature engineering.

Input sequences must contain only the 20 canonical amino acids; gaps are stripped automatically on initialisation.

.. code-block:: python

    from pySAR.descriptors import Descriptors

    desc = Descriptors(config_file="config/thermostability.json")

    # calculate a single descriptor
    aa_comp = desc.get_amino_acid_composition()  # shape: (N, 20)

    # calculate all descriptors at once
    all_desc = desc.get_all_descriptors()  # shape: (N, 10572+)

In the ``# shape`` comments throughout this page, ``N`` is the number of input sequences. In the formulas, :math:`N` is the length of an individual sequence.

----

Instantiation
-------------

``Descriptors.__init__(config_file, protein_seqs=None, **kwargs)``

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Parameter
     - Default
     - Description
   * - ``config_file``
     - —
     - Path to the JSON configuration file. The ``.json`` extension is appended automatically if omitted.
   * - ``protein_seqs``
     - ``None``
     - Protein sequences as a ``pd.Series`` or a single string. If ``None`` or empty, sequences are loaded from the dataset path specified in the config.
   * - ``**kwargs``
     - —
     - Keyword arguments (``dataset``, ``descriptors_csv``) that override the corresponding config file values.

On construction the class:

1. Parses the config JSON and loads the dataset and descriptor parameters.
2. Reads protein sequences from the dataset CSV if they are not supplied directly.
3. Strips gaps and validates all sequences against the 20 canonical amino acids.
4. Attempts to import pre-calculated descriptor values from the ``descriptors_csv`` path, if it exists.
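
The gap stripping and validation in step 3 can be sketched in plain Python as below. This is not pySAR's code. The helper name ``clean_sequence`` is hypothetical, and treating ``-`` and ``.`` as the gap characters is an assumption.

.. code-block:: python

    # Hypothetical sketch of step 3; not the pySAR implementation.
    CANONICAL_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
    GAP_CHARS = "-."  # assumed gap symbols

    def clean_sequence(seq: str) -> str:
        """Strip the assumed gap characters, then reject any non-canonical residue."""
        cleaned = "".join(ch for ch in seq if ch not in GAP_CHARS)
        invalid = sorted(set(cleaned) - CANONICAL_AA)
        if invalid:
            raise ValueError(f"Non-canonical residues found: {invalid}")
        return cleaned

For example, ``clean_sequence("MK-TA.Y")`` returns ``"MKTAY"``. ``clean_sequence("MKXB")`` raises ``ValueError`` because X and B are not among the 20 canonical amino acids.
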

Importing pre-calculated descriptors is strongly recommended for large datasets. Set ``all_desc: 1`` in the ``[descriptors]`` config section on the first run to generate the CSV. Subsequent runs will then load from it directly without recalculating.

----

Descriptor Groups
-----------------

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Group
     - Descriptors
   * - Composition
     - ``amino_acid_composition``, ``dipeptide_composition``, ``tripeptide_composition``, ``gravy``, ``aromaticity``, ``instability_index``, ``isoelectric_point``, ``molecular_weight``, ``charge_distribution``, ``hydrophobic_polar_charged_composition``, ``secondary_structure_propensity``, ``kmer_composition``, ``reduced_alphabet_composition``, ``motif_composition``, ``amino_acid_pair_composition``, ``aliphatic_index``, ``extinction_coefficient``, ``boman_index``, ``aggregation_propensity``, ``hydrophobic_moment``, ``shannon_entropy``
   * - Autocorrelation
     - ``moreaubroto_autocorrelation``, ``moran_autocorrelation``, ``geary_autocorrelation``
   * - CTD
     - ``ctd``, ``ctd_composition``, ``ctd_transition``, ``ctd_distribution``
   * - Conjoint Triad
     - ``conjoint_triad``
   * - Sequence Order
     - ``sequence_order_coupling_number``, ``quasi_sequence_order``
   * - Pseudo Composition
     - ``pseudo_amino_acid_composition``, ``amphiphilic_pseudo_amino_acid_composition``

----

Composition Descriptors
-----------------------

Composition descriptors capture the amino acid content and physicochemical properties of a sequence without considering positional information.

Amino Acid Composition
~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_amino_acid_composition()`` | **Features:** 20

The fraction of each of the 20 canonical amino acid types within a sequence:

.. math::

   \text{AAComp}(t) = \frac{AA(t)}{N}

where :math:`AA(t)` is the count of amino acid type :math:`t` and :math:`N` is the total sequence length.

.. code-block:: python

    aa_comp = desc.get_amino_acid_composition()  # shape: (N, 20)

Dipeptide Composition
~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_dipeptide_composition()`` | **Features:** 400 (20²)

The fraction of each of the 400 possible dipeptide types:

.. math::

   \text{DPComp}(s,t) = \frac{AA(s,t)}{N - 1}

where :math:`AA(s,t)` is the count of dipeptide type :math:`(s, t)` and :math:`N - 1` is the total number of dipeptides in the sequence.

.. code-block:: python

    dp_comp = desc.get_dipeptide_composition()  # shape: (N, 400)

Tripeptide Composition
~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_tripeptide_composition()`` | **Features:** 8000 (20³)

The fraction of each of the 8,000 possible tripeptide types. This descriptor is computationally expensive on large datasets, so pre-calculating it and caching it to CSV is recommended.

.. code-block:: python

    tp_comp = desc.get_tripeptide_composition()  # shape: (N, 8000)

GRAVY
~~~~~

**Method:** ``get_gravy()`` | **Features:** 1

The Grand Average of Hydropathy (GRAVY) is the mean Kyte-Doolittle hydropathy score across all residues. Positive values indicate overall hydrophobicity; negative values indicate hydrophilicity.

.. code-block:: python

    gravy = desc.get_gravy()  # shape: (N, 1)

Aromaticity
~~~~~~~~~~~

**Method:** ``get_aromaticity()`` | **Features:** 1

Fraction of aromatic residues (F, W, Y, H) in the sequence.

.. code-block:: python

    arom = desc.get_aromaticity()  # shape: (N, 1)

Instability Index
~~~~~~~~~~~~~~~~~

**Method:** ``get_instability_index()`` | **Features:** 1

Computed from dipeptide instability weight values (DIWV). A value below 40 predicts a stable protein; a value of 40 or above predicts an unstable one.

.. code-block:: python

    ii = desc.get_instability_index()  # shape: (N, 1)

Isoelectric Point
~~~~~~~~~~~~~~~~~

**Method:** ``get_isoelectric_point()`` | **Features:** 1

The estimated pH at which the protein carries no net charge, calculated iteratively using standard pK\ :sub:`a` values for ionisable residues.

.. code-block:: python

    pi = desc.get_isoelectric_point()  # shape: (N, 1)

Molecular Weight
~~~~~~~~~~~~~~~~

**Method:** ``get_molecular_weight()`` | **Features:** 1

Average molecular weight (Da) calculated from residue masses, corrected for the water lost at each peptide bond.

.. code-block:: python

    mw = desc.get_molecular_weight()  # shape: (N, 1)

Charge Distribution
~~~~~~~~~~~~~~~~~~~

**Method:** ``get_charge_distribution()`` | **Features:** 3

Positive, negative, and net charge contributions of ionisable residues at a specified pH, calculated with the Henderson-Hasselbalch equation.

Output columns: ``PositiveCharge``, ``NegativeCharge``, ``NetCharge``.

Config parameter: ``charge_distribution.ph`` (default 7.4).

.. code-block:: python

    charge = desc.get_charge_distribution()  # shape: (N, 3)

Hydrophobic/Polar/Charged Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_hydrophobic_polar_charged_composition()`` | **Features:** 3

Percentage of residues belonging to each of three physicochemical groups:

- **Hydrophobic:** A, C, F, I, L, M, V, W, Y
- **Polar:** G, N, Q, S, T
- **Charged:** D, E, H, K, R

Output columns: ``Hydrophobic``, ``Polar``, ``Charged``.

.. code-block:: python

    hpc = desc.get_hydrophobic_polar_charged_composition()  # shape: (N, 3)

Secondary Structure Propensity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_secondary_structure_propensity()`` | **Features:** 3

Average Chou-Fasman propensity values for alpha-helix, beta-sheet, and random-coil conformations across all residues.

Output columns: ``Helix``, ``Sheet``, ``Coil``.

.. code-block:: python

    ssp = desc.get_secondary_structure_propensity()  # shape: (N, 3)

k-mer Composition
~~~~~~~~~~~~~~~~~

**Method:** ``get_kmer_composition()`` | **Features:** 20\ :sup:`k` (default 400)

Frequency of every possible k-length residue subsequence, expressed as a percentage of the total number of k-mers.

Config parameter: ``kmer_composition.k`` (default 2, producing 400 features).

.. code-block:: python

    kmer = desc.get_kmer_composition()  # shape: (N, 400) with k=2

Reduced Alphabet Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_reduced_alphabet_composition()`` | **Features:** ``alphabet_size`` (default 6)

Amino acid composition after mapping residues to a reduced set of physicochemical groups. Supported alphabet sizes are 2, 3, 4, and 6.

Config parameter: ``reduced_alphabet_composition.alphabet_size`` (default 6).

.. code-block:: python

    rac = desc.get_reduced_alphabet_composition()  # shape: (N, 6)

Motif Composition
~~~~~~~~~~~~~~~~~

**Method:** ``get_motif_composition()`` | **Features:** number of motifs (default 8)

Count of occurrences (including overlapping ones) of predefined biological sequence motifs, matched by regular expression. Eight built-in motifs are used by default; a custom ``name → pattern`` dict can be supplied via ``motif_composition.motifs`` in the config.

.. code-block:: python

    motif = desc.get_motif_composition()  # shape: (N, 8)

Amino Acid Pair Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_amino_acid_pair_composition()`` | **Features:** 400

Frequency of all 400 residue-pair combinations, with column names annotated by the physicochemical class of each residue.

.. code-block:: python

    pair = desc.get_amino_acid_pair_composition()  # shape: (N, 400)

Aliphatic Index
~~~~~~~~~~~~~~~

**Method:** ``get_aliphatic_index()`` | **Features:** 1

Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu). Higher values are associated with greater thermostability.

.. code-block:: python

    ai = desc.get_aliphatic_index()  # shape: (N, 1)

Extinction Coefficient
~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_extinction_coefficient()`` | **Features:** 2

Molar extinction coefficient at 280 nm, derived from the number of Trp (W), Tyr (Y), and Cys (C) residues and reported for both the reduced and oxidised states.

Output columns: ``ExtCoeff_Reduced``, ``ExtCoeff_Oxidized``.

.. code-block:: python

    ec = desc.get_extinction_coefficient()  # shape: (N, 2)

Boman Index
~~~~~~~~~~~

**Method:** ``get_boman_index()`` | **Features:** 1

Sum of residue solubility values divided by sequence length. It is used as an estimate of a protein's potential to interact with other proteins.

.. code-block:: python

    boman = desc.get_boman_index()  # shape: (N, 1)

Aggregation Propensity
~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_aggregation_propensity()`` | **Features:** 2

Identifies aggregation-prone regions with a sliding window that combines Kyte-Doolittle hydrophobicity and charge neutrality.

Output columns: ``AggregProneRegions`` (count of qualifying windows) and ``AggregProneFraction`` (fraction of the sequence covered).

Config parameters: ``aggregation_propensity.window`` (default 5), ``.hydrophobicity_threshold`` (default 2.0), ``.charge_threshold`` (default 1).

.. code-block:: python

    agg = desc.get_aggregation_propensity()  # shape: (N, 2)

Hydrophobic Moment
~~~~~~~~~~~~~~~~~~

**Method:** ``get_hydrophobic_moment()`` | **Features:** 2

Mean and maximum hydrophobic moment across sliding windows, computed with the Eisenberg hydrophobicity scale and a helical-wheel projection to capture amphipathicity.

Output columns: ``HydrophobicMoment_Mean``, ``HydrophobicMoment_Max``.

Config parameters: ``hydrophobic_moment.window`` (default 11), ``.angle`` (default 100, in degrees).

.. code-block:: python

    hm = desc.get_hydrophobic_moment()  # shape: (N, 2)

Shannon Entropy
~~~~~~~~~~~~~~~

**Method:** ``get_shannon_entropy()`` | **Features:** 1

An information-theoretic measure of amino acid diversity:

.. math::

   H = -\sum_{i=1}^{20} p_i \log_2 p_i

A value of 0 indicates a sequence made up of a single amino acid type. The theoretical maximum of :math:`\log_2 20 \approx 4.322` bits corresponds to a perfectly uniform distribution across all 20 amino acids.

.. code-block:: python

    se = desc.get_shannon_entropy()  # shape: (N, 1)

----

Autocorrelation Descriptors
----------------------------

Autocorrelation descriptors measure the correlation between the values of a physicochemical property at two positions in a sequence separated by a lag distance :math:`d`. The three variants differ in their mathematical formulation. By default, 8 physicochemical properties are used with a lag of 30, generating 240 features per descriptor.

**Default properties (8):**

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - AAIndex Accession
     - Property
   * - CIDH920105
     - Normalised Average Hydrophobicity
   * - BHAR880101
     - Average Flexibility Indices
   * - CHAM820101
     - Polarizability Parameter
   * - CHAM820102
     - Free Energy of Solution in Water (kcal/mol)
   * - CHOC760101
     - Residue Accessible Surface Area in Tripeptide
   * - BIGC670101
     - Residue Volume
   * - CHAM810101
     - Steric Parameter
   * - DAYM780201
     - Relative Mutability

Config parameters common to all three descriptors: ``lag`` (default 30), ``properties`` (list of AAIndex accession numbers), ``normalize`` (bool).

**Feature count formula:** ``lag × len(properties)`` → default 30 × 8 = **240**.

MoreauBroto Autocorrelation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_moreaubroto_autocorrelation()`` | **Features:** lag × properties (default 240)

Uses the raw property values of two residues separated by lag :math:`d`:

.. math::

   \text{MBAuto}(d) = \sum_{i=1}^{N-d} P_i \cdot P_{i+d}

Config section: ``[moreaubroto_autocorrelation]``.

.. code-block:: python

    mb = desc.get_moreaubroto_autocorrelation()  # shape: (N, 240)

Moran Autocorrelation
~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_moran_autocorrelation()`` | **Features:** lag × properties (default 240)

Uses normalised deviations from the mean property value:

.. math::

   \text{MAuto}(d) = \frac{\frac{1}{N-d}\sum_{i=1}^{N-d}(P_i - \bar{P})(P_{i+d} - \bar{P})}{\frac{1}{N}\sum_{i=1}^{N}(P_i - \bar{P})^2}

Config section: ``[moran_autocorrelation]``.

.. code-block:: python

    moran = desc.get_moran_autocorrelation()  # shape: (N, 240)

Geary Autocorrelation
~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_geary_autocorrelation()`` | **Features:** lag × properties (default 240)

Uses squared differences between residue property values:

.. math::

   \text{GAuto}(d) = \frac{\frac{1}{2(N-d)}\sum_{i=1}^{N-d}(P_i - P_{i+d})^2}{\frac{1}{N-1}\sum_{i=1}^{N}(P_i - \bar{P})^2}

Config section: ``[geary_autocorrelation]``.

.. code-block:: python

    geary = desc.get_geary_autocorrelation()  # shape: (N, 240)

----

CTD Descriptors
---------------

CTD (Composition, Transition, Distribution) describes how amino acids are distributed with respect to seven physicochemical properties: hydrophobicity, volume, polarity, polarisability, charge, secondary structure, and solvent accessibility. Each property divides the 20 amino acids into three classes (C1, C2, C3), from which the three sub-descriptors are computed. Using all 7 properties generates **147 features** (21 per property). A subset of properties can be specified via ``ctd.property`` in the config.

CTD (Combined)
~~~~~~~~~~~~~~

**Method:** ``get_ctd()`` | **Features:** 147 (all 7 properties)

All CTD sub-descriptors concatenated: Composition + Transition + Distribution.

.. code-block:: python

    ctd = desc.get_ctd()  # shape: (N, 147)

CTD Composition
~~~~~~~~~~~~~~~

**Method:** ``get_ctd_composition()`` | **Features:** 3 per property (21 total)

Fraction of residues in each of the three classes (C1, C2, C3) for each property.

.. code-block:: python

    ctd_c = desc.get_ctd_composition()  # shape: (N, 21)

CTD Transition
~~~~~~~~~~~~~~

**Method:** ``get_ctd_transition()`` | **Features:** 3 per property (21 total)

Frequency with which a residue of one class is immediately followed by a residue of a different class (C1↔C2, C1↔C3, C2↔C3).

.. code-block:: python

    ctd_t = desc.get_ctd_transition()  # shape: (N, 21)

CTD Distribution
~~~~~~~~~~~~~~~~

**Method:** ``get_ctd_distribution()`` | **Features:** 15 per property (105 total)

For each class, the positions (as percentages of sequence length) at which the first residue of that class, and 25%, 50%, 75%, and 100% of its residues, are located. This captures how each property class is spread along the sequence.

.. code-block:: python

    ctd_d = desc.get_ctd_distribution()  # shape: (N, 105)

----

Conjoint Triad
--------------

**Method:** ``get_conjoint_triad()`` | **Features:** 343 (7³)

Describes the local environment of each residue by considering triplets of adjacent residues, with each residue assigned to one of 7 physicochemical classes. The frequency of each of the 7³ = 343 possible class triplets is computed.

.. code-block:: python

    ct = desc.get_conjoint_triad()  # shape: (N, 343)

----

Sequence Order Descriptors
--------------------------

Sequence Order Coupling Number
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_sequence_order_coupling_number()`` | **Features:** ``lag`` or ``2 × lag``

Captures long-range interactions by summing, for each distance :math:`d` from 1 up to the specified lag, the squared physicochemical distances (taken from an amino acid distance matrix) between residues :math:`d` positions apart. If a single distance matrix is given in the config, ``lag`` features are produced. If no matrix is specified, both the Schneider-Wrede and Grantham matrices are used, producing ``2 × lag`` features.

Config section: ``[sequence_order_coupling_number]``, params: ``lag``, ``distance_matrix``.

.. code-block:: python

    socn = desc.get_sequence_order_coupling_number()  # shape: (N, lag) or (N, 2 × lag)

Quasi Sequence Order
~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_quasi_sequence_order()`` | **Features:** ``20 + lag`` or ``2 × (20 + lag)``

Extends amino acid composition with sequence-order correlation factors derived from pairwise residue distance matrices. Feature count: ``20 + lag`` with one distance matrix, or ``2 × (20 + lag)`` when both the Schneider-Wrede and Grantham matrices are used.
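
To make the two sequence-order formulas concrete, the following sketch computes the coupling numbers :math:`\tau_d = \sum_{i=1}^{N-d} D(R_i, R_{i+d})^2` and Chou's quasi-sequence-order factors for a single sequence. It is an illustration only, not the protpy implementation. ``toy_distance`` is a hypothetical placeholder (0 for identical residues, 1 otherwise) standing in for the Schneider-Wrede or Grantham matrix. The QSO weight of 0.1 is an assumed value, because no ``weight`` parameter is listed for ``[quasi_sequence_order]``.

.. code-block:: python

    from collections import Counter

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def toy_distance(a: str, b: str) -> float:
        """Hypothetical placeholder for a Schneider-Wrede/Grantham matrix lookup."""
        return 0.0 if a == b else 1.0

    def socn(seq: str, lag: int, dist=toy_distance) -> list[float]:
        """tau_d = sum over i of dist(R_i, R_{i+d}) ** 2, for d = 1..lag."""
        return [
            sum(dist(seq[i], seq[i + d]) ** 2 for i in range(len(seq) - d))
            for d in range(1, lag + 1)
        ]

    def qso(seq: str, lag: int, weight: float = 0.1, dist=toy_distance) -> list[float]:
        """20 composition terms plus `lag` coupling terms over a shared denominator."""
        taus = socn(seq, lag, dist)
        counts = Counter(seq)
        freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]
        denom = sum(freqs) + weight * sum(taus)  # weight is an assumed default
        return [f / denom for f in freqs] + [weight * t / denom for t in taus]

For example, ``socn("ACACA", 2)`` gives ``[4.0, 0.0]``. ``qso("ACACA", lag=2)`` returns 22 values that sum to 1, because the composition terms and the weighted coupling terms share one denominator. Running the sketch once per matrix and concatenating the outputs would mirror the ``2 × (20 + lag)`` feature count described above.
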

Config section: ``[quasi_sequence_order]``, params: ``lag``, ``distance_matrix``.

.. code-block:: python

    qso = desc.get_quasi_sequence_order()  # shape: (N, 20 + lag) or (N, 2 × (20 + lag))

----

Pseudo Amino Acid Composition
------------------------------

Pseudo Amino Acid Composition (Type 1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_pseudo_amino_acid_composition()`` | **Features:** ``20 + lambda``

Augments amino acid composition (20 features) with ``lambda`` sequence-order correlation factors (correlation along the chain at lags 1 through ``lambda``), capturing both composition and sequence-order information.

Config section: ``[pseudo_amino_acid_composition]``, params: ``lambda``, ``weight``.

.. code-block:: python

    paac = desc.get_pseudo_amino_acid_composition()  # shape: (N, 20 + lambda)

Amphiphilic Pseudo Amino Acid Composition (Type 2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Method:** ``get_amphiphilic_pseudo_amino_acid_composition()`` | **Features:** ``20 + 2 × lambda``

Extends PseAAC Type 1 by computing separate hydrophobicity and hydrophilicity correlation factors at each lag, producing ``20 + 2 × lambda`` features. It is designed to capture amphipathic sequence patterns.

Config section: ``[amphiphilic_pseudo_amino_acid_composition]``, params: ``lambda``, ``weight``.

.. code-block:: python

    apaac = desc.get_amphiphilic_pseudo_amino_acid_composition()  # shape: (N, 20 + 2*lambda)

----

All Descriptors Summary
-----------------------

.. list-table::
   :header-rows: 1
   :widths: 38 12 50

   * - Descriptor
     - Features
     - Method
   * - Amino Acid Composition
     - 20
     - ``get_amino_acid_composition()``
   * - Dipeptide Composition
     - 400
     - ``get_dipeptide_composition()``
   * - Tripeptide Composition
     - 8000
     - ``get_tripeptide_composition()``
   * - GRAVY
     - 1
     - ``get_gravy()``
   * - Aromaticity
     - 1
     - ``get_aromaticity()``
   * - Instability Index
     - 1
     - ``get_instability_index()``
   * - Isoelectric Point
     - 1
     - ``get_isoelectric_point()``
   * - Molecular Weight
     - 1
     - ``get_molecular_weight()``
   * - Charge Distribution
     - 3
     - ``get_charge_distribution()``
   * - Hydrophobic/Polar/Charged Composition
     - 3
     - ``get_hydrophobic_polar_charged_composition()``
   * - Secondary Structure Propensity
     - 3
     - ``get_secondary_structure_propensity()``
   * - k-mer Composition
     - 20\ :sup:`k` (default 400)
     - ``get_kmer_composition()``
   * - Reduced Alphabet Composition
     - alphabet_size (default 6)
     - ``get_reduced_alphabet_composition()``
   * - Motif Composition
     - len(motifs) (default 8)
     - ``get_motif_composition()``
   * - Amino Acid Pair Composition
     - 400
     - ``get_amino_acid_pair_composition()``
   * - Aliphatic Index
     - 1
     - ``get_aliphatic_index()``
   * - Extinction Coefficient
     - 2
     - ``get_extinction_coefficient()``
   * - Boman Index
     - 1
     - ``get_boman_index()``
   * - Aggregation Propensity
     - 2
     - ``get_aggregation_propensity()``
   * - Hydrophobic Moment
     - 2
     - ``get_hydrophobic_moment()``
   * - Shannon Entropy
     - 1
     - ``get_shannon_entropy()``
   * - MoreauBroto Autocorrelation
     - lag × props (default 240)
     - ``get_moreaubroto_autocorrelation()``
   * - Moran Autocorrelation
     - lag × props (default 240)
     - ``get_moran_autocorrelation()``
   * - Geary Autocorrelation
     - lag × props (default 240)
     - ``get_geary_autocorrelation()``
   * - CTD
     - 147
     - ``get_ctd()``
   * - CTD Composition
     - 21
     - ``get_ctd_composition()``
   * - CTD Transition
     - 21
     - ``get_ctd_transition()``
   * - CTD Distribution
     - 105
     - ``get_ctd_distribution()``
   * - Conjoint Triad
     - 343
     - ``get_conjoint_triad()``
   * - Sequence Order Coupling Number
     - lag or 2×lag
     - ``get_sequence_order_coupling_number()``
   * - Quasi Sequence Order
     - 20+lag or 2×(20+lag)
     - ``get_quasi_sequence_order()``
   * - Pseudo Amino Acid Composition
     - 20+λ
     - ``get_pseudo_amino_acid_composition()``
   * - Amphiphilic Pseudo Amino Acid Composition
     - 20+2λ
     - ``get_amphiphilic_pseudo_amino_acid_composition()``

----

Utility Methods
---------------

``get_all_descriptors()``
    Calculates every descriptor in turn and returns a single concatenated DataFrame of all features. The result is also exported to the ``descriptors_csv`` path, if one is configured.

    .. code-block:: python

        all_desc = desc.get_all_descriptors()  # shape: (N, 10572+) with default parameters

``get_descriptor_encoding(descriptor)``
    Resolves a descriptor name (with fuzzy matching) and returns its feature DataFrame. This is useful when the descriptor name is read from the config or supplied at runtime.

    .. code-block:: python

        df = desc.get_descriptor_encoding("moran")  # resolves to moran_autocorrelation

``all_descriptors_list()``
    Returns the list of all 33 descriptor names.

``validate_descriptors(descriptors)``
    Validates that every name in a list (or a single string) is a recognised descriptor name. Raises ``InvalidDescriptorError`` for any unknown names.

``get_descriptor_info(name)``
    Returns a metadata dict for ``name`` that includes ``feature_count``, ``group``, and the associated ``get_*`` method.

``reset_descriptors()``
    Clears all descriptor DataFrames back to an empty state, freeing memory without re-instantiating the class.

``get_descriptor_columns(name)``
    Returns the column names of the calculated DataFrame for descriptor ``name``.

----

Pre-calculated Descriptors
---------------------------

For any new dataset, it is recommended to calculate all descriptors once and cache them to a CSV file, which is then loaded automatically on subsequent runs:

1. Set ``all_desc: 1`` and ``descriptors_csv: "data/descriptors_.csv"`` in the ``[descriptors]`` config section.
2. Run once. All descriptor values are calculated and written to the CSV.
3. On every subsequent run, the CSV is detected and imported automatically, with no recalculation required.

Pre-calculated descriptor CSVs for the bundled example datasets are included in ``data/`` and ``example_datasets/``.

----

Config File
-----------

All descriptor parameters are set under the ``[descriptors]`` key in the pySAR config JSON:

.. code-block:: json

    {
      "descriptors": {
        "descriptors_csv": "data/descriptors_thermostability.csv",
        "all_desc": 1,
        "descriptor": "amino_acid_composition",
        "moreaubroto_autocorrelation": {
          "lag": 30,
          "properties": ["CIDH920105", "BHAR880101", "CHAM820101", "CHAM820102",
                         "CHOC760101", "BIGC670101", "CHAM810101", "DAYM780201"],
          "normalize": 0
        },
        "moran_autocorrelation": {
          "lag": 30,
          "properties": ["CIDH920105", "BHAR880101", "CHAM820101", "CHAM820102",
                         "CHOC760101", "BIGC670101", "CHAM810101", "DAYM780201"],
          "normalize": 0
        },
        "geary_autocorrelation": {
          "lag": 30,
          "properties": ["CIDH920105", "BHAR880101", "CHAM820101", "CHAM820102",
                         "CHOC760101", "BIGC670101", "CHAM810101", "DAYM780201"],
          "normalize": 0
        },
        "ctd": {
          "property": ["hydrophobicity", "volume", "polarity", "polarizability",
                       "charge", "secondaryStructure", "solventAccessibility"],
          "all": 1
        },
        "conjoint_triad": {},
        "sequence_order_coupling_number": {
          "lag": 30,
          "distance_matrix": ""
        },
        "quasi_sequence_order": {
          "lag": 30,
          "distance_matrix": ""
        },
        "pseudo_amino_acid_composition": {
          "lambda": 30,
          "weight": 0.05
        },
        "amphiphilic_pseudo_amino_acid_composition": {
          "lambda": 30,
          "weight": 0.05
        }
      }
    }

See the `CONFIG.md `_ file and the example config files for the full list of available parameters:

- `thermostability.json `_
- `absorption.json `_
- `enantioselectivity.json `_
- `localization.json `_
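
The parametric feature-count formulas on this page can be checked against a config file with a short sketch like the one below. The function ``expected_feature_counts`` and its helper ``n_matrices`` are hypothetical and not part of pySAR. They apply the formulas stated above to the ``descriptors`` section, treating an empty ``distance_matrix`` as "both Schneider-Wrede and Grantham".

.. code-block:: python

    AUTOCORRELATIONS = (
        "moreaubroto_autocorrelation",
        "moran_autocorrelation",
        "geary_autocorrelation",
    )

    def n_matrices(params: dict) -> int:
        """An empty distance_matrix means both built-in matrices are used."""
        return 1 if params.get("distance_matrix") else 2

    def expected_feature_counts(config: dict) -> dict:
        """Hypothetical helper: feature counts implied by this page's formulas."""
        d = config["descriptors"]
        # lag x len(properties) for each autocorrelation variant
        counts = {name: d[name]["lag"] * len(d[name]["properties"]) for name in AUTOCORRELATIONS}
        socn = d["sequence_order_coupling_number"]
        counts["sequence_order_coupling_number"] = n_matrices(socn) * socn["lag"]
        qso = d["quasi_sequence_order"]
        counts["quasi_sequence_order"] = n_matrices(qso) * (20 + qso["lag"])
        counts["pseudo_amino_acid_composition"] = 20 + d["pseudo_amino_acid_composition"]["lambda"]
        counts["amphiphilic_pseudo_amino_acid_composition"] = (
            20 + 2 * d["amphiphilic_pseudo_amino_acid_composition"]["lambda"]
        )
        return counts

Applied to the example config above (for instance via ``json.load``), this gives 240 for each autocorrelation, 60 for ``sequence_order_coupling_number``, 100 for ``quasi_sequence_order``, 50 for ``pseudo_amino_acid_composition``, and 80 for ``amphiphilic_pseudo_amino_acid_composition``.
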