Working with ugropy
Failing
ugropy may fail to obtain the subgroups of a molecule for a given model for two reasons: either there is a bug in the code, or the molecule cannot be represented by the subgroups of the failing model.
ugropy uses SMARTS patterns to represent functional groups and to query whether a molecule contains those structures. Let’s examine the functional group list for the classic liquid-vapor UNIFAC model.
[1]:
try:
    import google.colab
    %pip install -q ugropy
except ImportError:
    pass
[2]:
from ugropy import unifac
unifac.subgroups
[2]:
group | smarts | molecular_weight |
---|---|---|
CH3 | [CX4H3] | 15.03500 |
CH2 | [CX4H2] | 14.02700 |
CH | [CX4H] | 13.01900 |
C | [CX4H0] | 12.01100 |
CH2=CH | [CH2]=[CH] | 27.04600 |
... | ... | ... |
NCO | [NX2H0]=[CX2H0]=[OX1H0] | 42.01700 |
(CH2)2SU | [CH2]S(=O)(=O)[CH2] | 92.11620 |
CH2CHSU | [CH2]S(=O)(=O)[CH] | 91.10840 |
IMIDAZOL | [c]1:[c]:[n]:[c]:[n]:1 | 68.07820 |
BTI | C(F)(F)(F)S(=O)(=O)[N-]S(=O)(=O)C(F)(F)F | 279.91784 |
113 rows × 2 columns
For example, let’s check the SMARTS representation of the aromatic hydroxyl group ACOH:
[3]:
unifac.subgroups.loc["ACOH", "smarts"]
[3]:
'[cH0][OH]'
The SMARTS representation tells us that the ACOH group is a hydroxyl group joined by a single bond to an aromatic carbon atom that carries no hydrogens.
An example of a molecule that cannot be represented by UNIFAC groups:
[4]:
from rdkit.Chem import Draw
mol = unifac.get_groups("C1(=CC=CC=C1)OC(C)(C)C", "smiles")
Draw.MolToImage(mol.molecule)
[4]:

[5]:
print(mol.subgroups)
{}
The library “fails” to obtain any functional groups to accurately represent the molecule. This failure is represented by an empty dictionary. In this case, the “fail” is correct, but the library could also fail because of errors in the groups’ SMARTS representations or in the algorithm, which would produce an empty dictionary as well. Currently, the supported models are tested against 444 different molecules.
If you encounter a failing representation, you can examine the structure of the molecule and the list of functional groups of the failing model. If you determine that the molecule can indeed be modeled, you may have discovered a bug. Feel free to report the issue on the repository along with the failing molecule’s SMILES/name, the failing model and the ugropy version.
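The check and the report described above can be sketched in a few lines. This is only an illustration: `is_failed` and `bug_report` are hypothetical helpers written for this example and are not part of ugropy. They work on the plain dictionary printed by `mol.subgroups`, and the installed version is read with the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def is_failed(subgroups: dict) -> bool:
    """Return True when the fragmentation produced no groups (empty dict)."""
    return len(subgroups) == 0


def bug_report(identifier: str, model: str, subgroups: dict) -> str:
    """Collect the details worth including in an issue (hypothetical helper)."""
    try:
        ugropy_version = version("ugropy")
    except PackageNotFoundError:
        # ugropy is not installed in this environment
        ugropy_version = "not installed"
    return (
        f"molecule: {identifier}\n"
        f"model: {model}\n"
        f"ugropy version: {ugropy_version}\n"
        f"subgroups: {subgroups}"
    )


# The empty result printed above for the molecule that UNIFAC cannot represent
failed = {}
if is_failed(failed):
    print(bug_report("C1(=CC=CC=C1)OC(C)(C)C", "unifac", failed))
```

Before filing a report, it is still worth checking the model's group list by hand, since an empty dictionary is also the correct answer for molecules the model cannot describe.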
More than one solution
Models like UNIFAC or PSRK can have multiple solutions to represent a molecule, and ugropy tries its best to find them all. In such cases, you will receive a list of results, each holding one of the solutions found in its subgroups dictionary. Let’s take a look.
[6]:
from ugropy import unifac
from rdkit.Chem import Draw
mol = unifac.get_groups("CCCC1=CC=C(CC(=O)OC)C=C1", "smiles", search_multiple_solutions=True)
Draw.MolToImage(mol[0].molecule, highlightAtoms=[7])
[6]:

This molecule can be modeled in two ways depending on how we treat the CH2 attached to the ring and the ester carbon (highlighted in red). We can either form an ACCH2 group and model the ester group with COO, or we can use an AC group and model the ester group with CH2COO.
[7]:
print("Solution 1:")
print(mol[0].subgroups)
print("solution 2:")
print(mol[1].subgroups)
Solution 1:
{'CH3': 2, 'CH2': 1, 'ACH': 4, 'ACCH2': 2, 'COO': 1}
solution 2:
{'CH3': 2, 'CH2': 1, 'ACH': 4, 'AC': 1, 'ACCH2': 1, 'CH2COO': 1}
[8]:
mol[0].draw(width=800)
[8]:
[9]:
mol[1].draw(width=800)
[9]:
This can be useful when some groups have more interaction parameters available than others for the mixture that you want to model with UNIFAC. Alternatively, you can try both approaches and check whether the results differ.
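To see at a glance which groups change between the two solutions, the dictionaries can be compared directly. The sketch below copies the two dictionaries printed above. `group_differences` is a hypothetical helper written for this example, not a ugropy function.

```python
# The two solutions printed above for CCCC1=CC=C(CC(=O)OC)C=C1
solution_1 = {"CH3": 2, "CH2": 1, "ACH": 4, "ACCH2": 2, "COO": 1}
solution_2 = {"CH3": 2, "CH2": 1, "ACH": 4, "AC": 1, "ACCH2": 1, "CH2COO": 1}


def group_differences(a: dict, b: dict) -> dict:
    """Map each group whose count differs to (count in a, count in b)."""
    return {
        group: (a.get(group, 0), b.get(group, 0))
        for group in sorted(set(a) | set(b))
        if a.get(group, 0) != b.get(group, 0)
    }


print(group_differences(solution_1, solution_2))
# {'AC': (0, 1), 'ACCH2': (2, 1), 'CH2COO': (0, 1), 'COO': (1, 0)}
```

Only the groups listed in the output need their interaction parameters checked when deciding between the two representations.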
More solutions (non-optimal)
ugropy allows you to search for multiple solutions for the models, but by default all of these solutions are optimal (they use the same number of groups to represent the molecule). It is also possible to search for non-optimal solutions along with the optimal ones. This is useful when you want to find all possible combinations of fragments that cover the molecule. This feature is intended for research purposes and is not recommended for actually modeling molecules.
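The "same number of groups" criterion can be checked on the two UNIFAC solutions shown earlier for `CCCC1=CC=C(CC(=O)OC)C=C1`. The sketch below assumes that "optimal" means the smallest total group count, which the text implies but does not state outright. `total_groups` and `keep_optimal` are hypothetical helpers written for this example, not part of ugropy.

```python
# Subgroup dictionaries copied from the multiple-solutions example
solutions = [
    {"CH3": 2, "CH2": 1, "ACH": 4, "ACCH2": 2, "COO": 1},
    {"CH3": 2, "CH2": 1, "ACH": 4, "AC": 1, "ACCH2": 1, "CH2COO": 1},
]


def total_groups(subgroups: dict) -> int:
    """Total number of group occurrences used to cover the molecule."""
    return sum(subgroups.values())


def keep_optimal(candidates: list) -> list:
    """Keep only the candidates with the smallest total group count.

    Assumption: this mirrors what 'optimal' means in the text above.
    """
    best = min(total_groups(c) for c in candidates)
    return [c for c in candidates if total_groups(c) == best]


totals = [total_groups(s) for s in solutions]
print(totals)  # [10, 10]
print(len(keep_optimal(solutions)))  # 2
```

Both solutions use ten groups, so neither would be discarded under this criterion. A non-optimal solution would show a larger total.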
Non-optimal solutions are requested with the search_nonoptimal parameter:
[10]:
from ugropy import unifac
solutions = unifac.get_groups(
    "9,10-dihydroanthracene",
    search_multiple_solutions=True,  # If this is False, search_nonoptimal is ignored
    search_nonoptimal=True,
)
solutions
[10]:
[<ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f95d221cd60>,
<ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f95d221cc30>,
<ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f9591f50950>]
We can check all the obtained solutions:
[11]:
solutions[0].draw(width=600)
[11]:
[12]:
solutions[1].draw(width=600)
[12]:
[13]:
solutions[2].draw(width=600)
[13]:
If you are an experienced user of UNIFAC-like models, you may notice that the only “correct” solution is the first one. All ugropy models can also search for non-optimal solutions.