FragmentationModel

FragmentationModel module.

All ugropy models (joback, unifac, psrk, etc.) are instances of the FragmentationModel class.

class FragmentationModel(subgroups: ~pandas.core.frame.DataFrame, allow_overlapping: bool = False, allow_free_atoms: bool = False, fragmentation_result: ~ugropy.core.frag_classes.base.fragmentation_result.FragmentationResult = <class 'ugropy.core.frag_classes.base.fragmentation_result.FragmentationResult'>)[source]

Bases: object

FragmentationModel class.

Every model supported by ugropy is an instance of this class. To create a new type of FragmentationModel, inherit from this class.

Parameters:
  • subgroups (pd.DataFrame) – Model’s subgroups. Index: ‘group’ (subgroup names). Mandatory columns: ‘smarts’ (SMARTS representation of each group, used to detect its presence in the molecule).

  • allow_overlapping (bool, optional) – Whether to allow overlapping or not, by default False

  • fragmentation_result (FragmentationResult, optional) – Fragmentation result class, by default FragmentationResult

subgroups

Model’s subgroups. Index: ‘group’ (subgroup names). Mandatory columns: ‘smarts’ (SMARTS representation of each group, used to detect its presence in the molecule).

Type:

pd.DataFrame

detection_mols

Dictionary containing all the RDKit Mol objects built from the detection_smarts subgroups column.

Type:

dict

get_groups(identifier: str | ~rdkit.Chem.rdchem.Mol, identifier_type: str = 'name', solver: ~ugropy.core.ilp_solvers.ilp_solver.ILPSolver = <class 'ugropy.core.ilp_solvers.default_solver.DefaultSolver'>, search_multiple_solutions: bool = False, **kwargs) FragmentationResult | List[FragmentationResult][source]

Get the groups of a molecule.

Parameters:
  • identifier (Union[str, Chem.rdchem.Mol]) – Identifier of the molecule. You can use the name of the molecule, its SMILES, or an RDKit Mol object.

  • identifier_type (str, optional) – Identifier type of the molecule. Use “name” if you are providing the molecule’s name, “smiles” if you are providing the SMILES or “mol” if you are providing an RDKit Mol object, by default “name”

  • solver (ILPSolver, optional) – ILP solver class, by default DefaultSolver

  • search_multiple_solutions (bool, optional) – Whether to search for multiple solutions or not, by default False. If False, the return value is a single FragmentationResult object; if True, it is a list of FragmentationResult objects.

Returns:

Fragmentation result. If search_multiple_solutions is False, a single FragmentationResult object is returned; if True, a list of FragmentationResult objects is returned.

Return type:

Union[FragmentationResult, List[FragmentationResult]]
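Because the return type depends on search_multiple_solutions, calling code that handles both cases may find it convenient to normalize the result to a list. The following is a minimal sketch of that idea; as_result_list is a hypothetical helper, not part of ugropy, and plain strings stand in for FragmentationResult objects so the snippet runs on its own.

```python
from typing import Any, List


def as_result_list(result: Any) -> List[Any]:
    """Return `result` as a list, wrapping a single object if needed.

    Hypothetical helper (not part of ugropy).
    """
    if isinstance(result, list):
        return result
    return [result]


# Stand-in values; in real use these would be FragmentationResult objects
# returned by get_groups(...).
single = "result_a"                  # search_multiple_solutions=False
several = ["result_a", "result_b"]   # search_multiple_solutions=True

# The same loop now works for both return shapes.
for solution in as_result_list(single):
    print(solution)
for solution in as_result_list(several):
    print(solution)
```

With a real model instance, the value passed to the helper would be the return value of get_groups, whichever setting of search_multiple_solutions was used.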

mol_preprocess(mol: Mol) Mol[source]

Preprocess the molecule.

This method is called before the detection of the fragments. It allows you to use RDKit functions to preprocess the molecule and make your SMARTS detection easier. The default implementation does nothing and leaves the mol object as it is. You can inherit from this class and override this method to preprocess the molecule.

Parameters:

mol (Chem.rdchem.Mol) – Molecule to preprocess.

Returns:

Preprocessed molecule.

Return type:

Chem.rdchem.Mol

set_fragmentation_result(molecule: Mol, solutions_fragments: List[dict], search_multiple_solutions: bool = False, **kwargs) FragmentationResult | List[FragmentationResult][source]

Process the solutions and return the FragmentationResult objects.

Parameters:
  • molecule (Chem.rdchem.Mol) – RDKit Mol object.

  • solutions_fragments (List[dict]) – Fragments detected in the molecule.

  • search_multiple_solutions (bool, optional) – Whether to search for multiple solutions or not, by default False

Returns:

A FragmentationResult object, or a list of FragmentationResult objects when search_multiple_solutions is True.

Return type:

Union[FragmentationResult, List[FragmentationResult]]

detect_fragments(molecule: Mol) dict[source]

Detect all the fragments in the molecule.

Return a dictionary with the detected fragments as keys and a tuple with the atom indexes of each fragment as values. For example, n-hexane for the UNIFAC model will return:

{
    'CH3_0': (0,),
    'CH3_1': (5,),
    'CH2_0': (1,),
    'CH2_1': (2,),
    'CH2_2': (3,),
    'CH2_3': (4,)
}

Note that multiple occurrences of a fragment name are indexed. The convention is always: <fragment_name>_i where i is the index of the occurrence.

Parameters:

molecule (Chem.rdchem.Mol) – Molecule to detect the fragments.

Returns:

Detected fragments in the molecule.

Return type:

dict
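The <fragment_name>_i convention makes it straightforward to recover how many times each fragment was detected. The sketch below shows one way to do this in plain Python, using the n-hexane dictionary from the example above as input; count_fragments is a hypothetical helper, not part of ugropy.

```python
from collections import Counter


def count_fragments(detected: dict) -> dict:
    """Collapse '<fragment_name>_i' keys into {fragment_name: occurrences}.

    Hypothetical helper (not part of ugropy).
    """
    counts = Counter()
    for key in detected:
        # Split on the last underscore only, so a fragment name that itself
        # contains '_' is kept intact.
        name, _, _index = key.rpartition("_")
        counts[name] += 1
    return dict(counts)


# Dictionary in the format documented for detect_fragments (n-hexane, UNIFAC).
detected = {
    "CH3_0": (0,),
    "CH3_1": (5,),
    "CH2_0": (1,),
    "CH2_1": (2,),
    "CH2_2": (3,),
    "CH2_3": (4,),
}

subgroup_counts = count_fragments(detected)
print(subgroup_counts)  # {'CH3': 2, 'CH2': 4}
```

These counts describe every detected occurrence; they are not necessarily the final fragmentation of the molecule, since detected fragments may still go through the solver step used by get_groups.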