Failing¶
ugropy
may fail to obtain the subgroups of a molecule for a certain model for two reasons: either there is a bug in the code, or the molecule cannot be represented by the subgroups of the failing model.
ugropy
utilizes SMARTS for the representation of functional groups to inquire whether the molecule contains those structures. Let’s examine the functional group list for the classic liquid-vapor UNIFAC model.
[1]:
from ugropy import unifac
unifac.subgroups
[1]:
detection_smarts | smarts | contribute | composed | molecular_weight | |
---|---|---|---|---|---|
group | |||||
CH3 | [CX4H3] | NaN | {"CH3": 1} | n | 15.03500 |
CH2 | [CX4H2] | NaN | {"CH2": 1} | n | 14.02700 |
CH | [CX4H] | NaN | {"CH": 1} | n | 13.01900 |
C | [CX4H0] | NaN | {"C": 1} | n | 12.01100 |
CH2=CH | [CH2]=[CH] | NaN | {"CH2=CH": 1} | n | 27.04600 |
... | ... | ... | ... | ... | ... |
NCO | [NX2H0]=[CX2H0]=[OX1H0] | NaN | {"NCO": 1} | n | 42.01700 |
(CH2)2SU | [CH2]S(=O)(=O)[CH2] | NaN | {"(CH2)2SU": 1, "CH2": -1, "CH2S": -1} | n | 92.11620 |
CH2CHSU | [CH2]S(=O)(=O)[CH] | NaN | {"CH2CHSU": 1, "CH": -1, "CH2S": -1} | n | 91.10840 |
IMIDAZOL | [c]1:[c]:[n]:[c]:[n]:1 | NaN | {"IMIDAZOL": 1, "ACH": -3} | n | 68.07820 |
BTI | C(F)(F)(F)S(=O)(=O)[N-]S(=O)(=O)C(F)(F)F | NaN | {"BTI": 1, "CF3": -2} | n | 279.91784 |
113 rows × 5 columns
For example, let’s check the SMARTS representation of the alcohol group ACOH:
[2]:
unifac.subgroups.loc["ACOH", "detection_smarts"]
[2]:
'[cH0][OH]'
The SMARTS representation it’s telling us that the OH group it’s, of course, a hydroxyl group bounded by a single bound to an aromatic carbon atom.
An example of a molecule that cannot be represented by UNIFAC groups:
[3]:
from ugropy import get_groups
from rdkit.Chem import Draw
mol = get_groups(unifac, "C1(=CC=CC=C1)OC(C)(C)C", "smiles")
Draw.MolToImage(mol.mol_object)
[3]:
[4]:
print(mol.subgroups)
{}
The library “fails” to obtain any functional groups to accurately represent the molecule. This failure is represented by an empty dictionary. In this case, the “fail” is correct, but it could fail due to errors in the groups SMARTS representations or the algorithm, resulting in an empty dictionary as well. Currently, the supported models are tested against the following numbers of molecules:
Classic liquid-vapor UNIFAC: 408
Predictive Soave-Redlich-Kwong (PSRK): 442
Joback: 285
If you encounter a failing representation, you can examine the structure of the molecule and the list of functional groups of the failing model. If you determine that the molecule can indeed be modeled, you may have discovered a bug. Feel free to report the issue on the repository along with the failing molecule’s SMILES/name, the failing model and the ugropy
version.
More than one solution¶
Models like UNIFAC or PSRK can have multiple solutions to represent a molecule, and ugropy tries its best to find them all. In such cases, you will receive a list of dictionaries, each containing one of the solutions found. Let’s take a look.
[5]:
from ugropy import Groups
from rdkit.Chem import Draw
mol = Groups("CCCC1=CC=C(CC(=O)OC)C=C1", "smiles")
Draw.MolToImage(mol.mol_object, highlightAtoms=[7])
[5]:
This molecule can be modeled in two ways depending on how we treat the CH2 attached to the ring and the ester carbon (highlighted in red). We can either form an ACCH2 group and model the ester group with COO, or we can use an AC group and model the ester group with CH2COO.
[6]:
print("UNIFAC:")
print(mol.unifac.subgroups)
print("PSRK:")
print(mol.psrk.subgroups)
UNIFAC:
[{'CH3': 2, 'ACH': 4, 'ACCH2': 1, 'CH2COO': 1, 'CH2': 1, 'AC': 1}
{'CH3': 2, 'ACH': 4, 'ACCH2': 2, 'CH2': 1, 'COO': 1}]
PSRK:
[{'CH3': 2, 'ACH': 4, 'ACCH2': 1, 'CH2COO': 1, 'CH2': 1, 'AC': 1}
{'CH3': 2, 'ACH': 4, 'ACCH2': 2, 'CH2': 1, 'COO': 1}]
[7]:
svg1, svg2 = mol.unifac.draw(width=800)
[8]:
from IPython.display import SVG
SVG(svg1)
[8]:
[9]:
SVG(svg2)
[9]:
This could be useful in cases where some groups have more interaction parameters than others in the mixture that you want to model with UNIFAC. Alternatively, you can try both approaches and compare if there are any differences.