FragmentationModel

FragmentationModel module.

All ugropy models (joback, unifac, psrk, etc.) are instances of the FragmentationModel class.

class FragmentationModel(subgroups: ~pandas.core.frame.DataFrame, allow_overlapping: bool = False, allow_free_atoms: bool = False, fragmentation_result: ~ugropy.core.frag_classes.base.fragmentation_result.FragmentationResult = <class 'ugropy.core.frag_classes.base.fragmentation_result.FragmentationResult'>)[source]

Bases: object

FragmentationModel class.

All models supported by ugropy are instances of this class. To create a new type of FragmentationModel, subclass this class.

Parameters:

subgroups (pd.DataFrame) – Model’s subgroups. Index: ‘group’ (subgroup names). Mandatory columns: ‘smarts’ (SMARTS representation used to detect the group’s presence in the molecule).
allow_overlapping (bool, optional) – Whether to allow overlapping fragments, by default False
fragmentation_result (FragmentationResult, optional) – Fragmentation result class, by default FragmentationResult

subgroups

Model’s subgroups. Index: ‘group’ (subgroup names). Mandatory columns: ‘smarts’ (SMARTS representation used to detect the group’s presence in the molecule).

Type:: pd.DataFrame

detection_mols

Dictionary containing the RDKit Mol objects built from the detection_smarts column of subgroups.

Type:: dict

get_groups(identifier: str | ~rdkit.Chem.rdchem.Mol, identifier_type: str = 'name', solver: ~ugropy.core.ilp_solvers.ilp_solver.ILPSolver = <class 'ugropy.core.ilp_solvers.default_solver.DefaultSolver'>, search_multiple_solutions: bool = False, search_nonoptimal: bool = False, solver_arguments: dict = {}, **kwargs) → FragmentationResult | List[FragmentationResult][source]

Get the groups of a molecule.

Parameters:

identifier (Union[str, Chem.rdchem.Mol]) – Identifier of the molecule. You can use the molecule’s name, its SMILES, or an RDKit Mol object.
identifier_type (str, optional) – Identifier type of the molecule. Use “name” if you are providing the molecule’s name, “smiles” if you are providing the SMILES, or “mol” if you are providing an RDKit Mol object, by default “name”
solver (ILPSolver, optional) – ILP solver class, by default DefaultSolver
search_multiple_solutions (bool, optional) – Whether to search for multiple solutions, by default False. If False, the return will be a FragmentationResult object; if True, it will be a list of FragmentationResult objects.
search_nonoptimal (bool, optional) – If True, the solver will search for non-optimal solutions along with the optimal ones. This is useful when you want to find all possible combinations of fragments that cover the universe (the atoms of the molecule). By default False. This parameter is ignored if search_multiple_solutions is False.
solver_arguments (dict, optional) – Dictionary with the arguments to be passed to the solver. For ugropy’s DefaultSolver, you can change the PuLP solver by passing a dictionary like {“solver”: “PULP_CBC_CMD”}. If empty, the default solver arguments are used, by default {}.
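The idea of fragment combinations that “cover the universe” can be illustrated with a brute-force sketch. This is not ugropy’s ILP formulation: it assumes a detect_fragments-style dictionary (fragment name to atom indices), treats the universe as the set of atom indices, and keeps only combinations in which no atom is shared (as with allow_overlapping=False). Treating the combinations with the fewest fragments as “optimal” is an assumption made for illustration only, and the helper exact_covers and the “CH3CH2” fragment are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def exact_covers(fragments: Dict[str, Tuple[int, ...]], n_atoms: int) -> List[List[str]]:
    """Hypothetical helper: every fragment combination covering each atom exactly once."""
    universe = set(range(n_atoms))
    names = list(fragments)
    covers = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            atoms = [a for name in combo for a in fragments[name]]
            # No atom may appear twice (no overlapping) and every atom must be covered.
            if len(atoms) == len(set(atoms)) and set(atoms) == universe:
                covers.append(list(combo))
    return covers


# Hypothetical detections for propane (CCC): CH3 at both ends, CH2 in the middle,
# plus a made-up "CH3CH2" fragment spanning atoms 0 and 1.
fragments = {
    "CH3_0": (0,),
    "CH3_1": (2,),
    "CH2_0": (1,),
    "CH3CH2_0": (0, 1),
}

all_covers = exact_covers(fragments, n_atoms=3)
# Assumed objective for this sketch: fewer fragments is better.
fewest = min(len(c) for c in all_covers)
optimal = [c for c in all_covers if len(c) == fewest]
nonoptimal = [c for c in all_covers if len(c) > fewest]
print(optimal)     # [['CH3_1', 'CH3CH2_0']]
print(nonoptimal)  # [['CH3_0', 'CH3_1', 'CH2_0']]
```

Brute force grows exponentially with the number of detected fragments, which is one reason an ILP solver is the practical choice for real molecules.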

Returns:

Fragmentation result. If search_multiple_solutions is False the return will be a FragmentationResult object, if True the return will be a list of FragmentationResult objects.

Return type:

Union[FragmentationResult, List[FragmentationResult]]
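Because the return type of get_groups depends on search_multiple_solutions, code that handles both cases may find it convenient to normalize the result to a list. The helper as_result_list below is a hypothetical sketch, not part of ugropy, and FakeResult is only a stand-in for FragmentationResult; it assumes a FragmentationResult is never itself a list.

```python
from typing import List, TypeVar, Union

T = TypeVar("T")


def as_result_list(result: Union[T, List[T]]) -> List[T]:
    """Hypothetical helper: wrap a single result in a list, pass lists through unchanged."""
    return result if isinstance(result, list) else [result]


class FakeResult:
    """Hypothetical stand-in for FragmentationResult, used only for this example."""

    def __init__(self, label: str) -> None:
        self.label = label


single = FakeResult("only solution")                         # search_multiple_solutions=False
many = [FakeResult("solution 1"), FakeResult("solution 2")]  # search_multiple_solutions=True

for result in as_result_list(single) + as_result_list(many):
    print(result.label)
```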

mol_preprocess(mol: Mol) → Mol[source]

Preprocess the molecule.

This method is called before fragment detection. It allows you to use RDKit functions to preprocess the molecule and make your SMARTS detection easier. The default implementation does nothing and returns the mol object unchanged. To preprocess molecules, subclass FragmentationModel and override this method.

Parameters:: mol (Chem.rdchem.Mol) – Molecule to preprocess.
Returns:: Preprocessed molecule.
Return type:: Chem.rdchem.Mol

set_fragmentation_result(molecule: Mol, solutions_fragments: List[dict], search_multiple_solutions: bool = False, **kwargs) → FragmentationResult | List[FragmentationResult][source]

Process the solutions and return the FragmentationResult objects.

Parameters:

molecule (Chem.rdchem.Mol) – RDKit Mol object.
solutions_fragments (List[dict]) – Fragments detected in the molecule.
search_multiple_solutions (bool, optional) – Whether to search for multiple solutions, by default False

Returns:

A FragmentationResult object if search_multiple_solutions is False, otherwise a list of FragmentationResult objects.

Return type:

Union[FragmentationResult, List[FragmentationResult]]

detect_fragments(molecule: Mol) → dict[source]

Detect all the fragments in the molecule.

Returns a dictionary with the detected fragments as keys and tuples of the fragment’s atom indices as values. For example, for n-hexane the UNIFAC model returns:

{
    'CH3_0': (0,),
    'CH3_1': (5,),
    'CH2_0': (1,),
    'CH2_1': (2,),
    'CH2_2': (3,),
    'CH2_3': (4,)
}

Note that multiple occurrences of a fragment are indexed. The convention is always <fragment_name>_i, where i is the index of the occurrence.
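The <fragment_name>_i convention can be reproduced, and undone, in a few lines of plain Python. This sketch assumes raw matches arrive as (fragment_name, atom_indices) pairs in detection order; index_fragments and count_fragments are hypothetical helpers, not part of ugropy. Splitting on the last underscore keeps fragment names that themselves contain underscores intact.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def index_fragments(matches: Iterable[Tuple[str, Tuple[int, ...]]]) -> Dict[str, Tuple[int, ...]]:
    """Hypothetical helper: key each match as <fragment_name>_i, counting from 0."""
    seen: Counter = Counter()
    indexed = {}
    for name, atoms in matches:
        indexed[f"{name}_{seen[name]}"] = atoms
        seen[name] += 1
    return indexed


def count_fragments(indexed: Dict[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Hypothetical helper: strip the trailing _i suffix and count each fragment."""
    return dict(Counter(key.rsplit("_", 1)[0] for key in indexed))


# n-hexane (CCCCCC) matches, listed in the same order as the UNIFAC example above.
matches = [
    ("CH3", (0,)), ("CH3", (5,)),
    ("CH2", (1,)), ("CH2", (2,)), ("CH2", (3,)), ("CH2", (4,)),
]
hexane = index_fragments(matches)
print(hexane)
print(count_fragments(hexane))  # {'CH3': 2, 'CH2': 4}
```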

Parameters:: molecule (Chem.rdchem.Mol) – Molecule in which to detect the fragments.
Returns:: Detected fragments in the molecule.
Return type:: dict