Failing

ugropy may fail to obtain the subgroups of a molecule for a certain model for two reasons: either there is a bug in the code, or the molecule cannot be represented by the subgroups of the failing model.

ugropy utilizes SMARTS for the representation of functional groups to inquire whether the molecule contains those structures. Let’s examine the functional group list for the classic liquid-vapor UNIFAC model.

[1]:
from ugropy import unifac

unifac.subgroups
[1]:
detection_smarts smarts contribute composed molecular_weight
group
CH3 [CX4H3] NaN {"CH3": 1} n 15.03500
CH2 [CX4H2] NaN {"CH2": 1} n 14.02700
CH [CX4H] NaN {"CH": 1} n 13.01900
C [CX4H0] NaN {"C": 1} n 12.01100
CH2=CH [CH2]=[CH] NaN {"CH2=CH": 1} n 27.04600
... ... ... ... ... ...
NCO [NX2H0]=[CX2H0]=[OX1H0] NaN {"NCO": 1} n 42.01700
(CH2)2SU [CH2]S(=O)(=O)[CH2] NaN {"(CH2)2SU": 1, "CH2": -1, "CH2S": -1} n 92.11620
CH2CHSU [CH2]S(=O)(=O)[CH] NaN {"CH2CHSU": 1, "CH": -1, "CH2S": -1} n 91.10840
IMIDAZOL [c]1:[c]:[n]:[c]:[n]:1 NaN {"IMIDAZOL": 1, "ACH": -3} n 68.07820
BTI C(F)(F)(F)S(=O)(=O)[N-]S(=O)(=O)C(F)(F)F NaN {"BTI": 1, "CF3": -2} n 279.91784

113 rows × 5 columns

For example, let’s check the SMARTS representation of the alcohol group ACOH:

[2]:
unifac.subgroups.loc["ACOH", "detection_smarts"]
[2]:
'[cH0][OH]'

The SMARTS representation it’s telling us that the OH group it’s, of course, a hydroxyl group bounded by a single bound to an aromatic carbon atom.

An example of a molecule that cannot be represented by UNIFAC groups:

[3]:
from ugropy import get_groups
from rdkit.Chem import Draw

mol = get_groups(unifac, "C1(=CC=CC=C1)OC(C)(C)C", "smiles")

Draw.MolToImage(mol.mol_object)
[3]:
../_images/tutorial_ugropy_failing_5_0.png
[4]:
print(mol.subgroups)
{}

The library “fails” to obtain any functional groups to accurately represent the molecule. This failure is represented by an empty dictionary. In this case, the “fail” is correct, but it could fail due to errors in the groups SMARTS representations or the algorithm, resulting in an empty dictionary as well. Currently, the supported models are tested against the following numbers of molecules:

  • Classic liquid-vapor UNIFAC: 408

  • Predictive Soave-Redlich-Kwong (PSRK): 442

  • Joback: 285

If you encounter a failing representation, you can examine the structure of the molecule and the list of functional groups of the failing model. If you determine that the molecule can indeed be modeled, you may have discovered a bug. Feel free to report the issue on the repository along with the failing molecule’s SMILES/name, the failing model and the ugropy version.

More than one solution

Models like UNIFAC or PSRK can have multiple solutions to represent a molecule, and ugropy tries its best to find them all. In such cases, you will receive a list of dictionaries, each containing one of the solutions found. Let’s take a look.

[5]:
from ugropy import Groups
from rdkit.Chem import Draw


mol = Groups("CCCC1=CC=C(CC(=O)OC)C=C1", "smiles")

Draw.MolToImage(mol.mol_object, highlightAtoms=[7])
[5]:
../_images/tutorial_ugropy_failing_8_0.png

This molecule can be modeled in two ways depending on how we treat the CH2 attached to the ring and the ester carbon (highlighted in red). We can either form an ACCH2 group and model the ester group with COO, or we can use an AC group and model the ester group with CH2COO.

[6]:
print("UNIFAC:")
print(mol.unifac.subgroups)
print("PSRK:")
print(mol.psrk.subgroups)
UNIFAC:
[{'CH3': 2, 'ACH': 4, 'ACCH2': 1, 'CH2COO': 1, 'CH2': 1, 'AC': 1}
 {'CH3': 2, 'ACH': 4, 'ACCH2': 2, 'CH2': 1, 'COO': 1}]
PSRK:
[{'CH3': 2, 'ACH': 4, 'ACCH2': 1, 'CH2COO': 1, 'CH2': 1, 'AC': 1}
 {'CH3': 2, 'ACH': 4, 'ACCH2': 2, 'CH2': 1, 'COO': 1}]
[7]:
svg1, svg2 = mol.unifac.draw(width=800)
[8]:
from IPython.display import SVG

SVG(svg1)
[8]:
../_images/tutorial_ugropy_failing_12_0.svg
[9]:
SVG(svg2)
[9]:
../_images/tutorial_ugropy_failing_13_0.svg

This could be useful in cases where some groups have more interaction parameters than others in the mixture that you want to model with UNIFAC. Alternatively, you can try both approaches and compare if there are any differences.