The easy way
[1]:
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False
if IN_COLAB:
    %pip install ugropy
The Groups class
ugropy is relatively straightforward to use, but let’s explore what it has to offer. Now, let’s start with the easy methods…
We’ll use the Groups class to retrieve the subgroups of all the models supported by ugropy.
[2]:
from ugropy import Groups
carvone = Groups("carvone")
carvone.unifac.subgroups
[2]:
{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
Well, that was easy… ugropy uses PubChemPy to access PubChem and retrieve the SMILES representation of the molecule. ugropy then uses that SMILES representation along with the rdkit library to identify the functional groups of the molecule.
The complete signature of the Groups class is as follows:
[3]:
from ugropy import DefaultSolver
carvone = Groups(
    identifier="carvone",
    identifier_type="name",
    solver=DefaultSolver,
    search_multiple_solutions=False,
    normal_boiling_temperature=None,
)
The identifier_type argument (default: “name”) can be set to “name”, “smiles” or “mol”.
When “name” is set, ugropy uses the identifier argument to search PubChem for the canonical SMILES of the molecule.
When “smiles” is set, ugropy uses the identifier directly as the SMILES string, so the library avoids the overhead of a PubChem search. Try it yourself:
[4]:
carvone = Groups(
    identifier="CC1=CCC(CC1=O)C(=C)C",
    identifier_type="smiles",
)
carvone.unifac.subgroups
[4]:
{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
If you are familiar with the rdkit library, you’ll know that there are numerous ways to define a molecule (e.g., SMILES, SMARTS, PDB file, InChI, etc.). ugropy also accepts a Mol object from the rdkit library.
[5]:
from rdkit import Chem
mol_obj = Chem.MolFromInchi("InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9H,1,5-6H2,2-3H3")
carvone = Groups(
    identifier=mol_obj,
    identifier_type="mol",
    normal_boiling_temperature=None,
)
carvone.unifac.subgroups
[5]:
{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
The currently supported models are the classic liquid-vapor UNIFAC, Predictive Soave-Redlich-Kwong (PSRK), Joback and Abdulelah-Gani. You can access the functional groups of each model this way:
[6]:
carvone = Groups("carvone")
print(carvone.unifac.subgroups)
print(carvone.psrk.subgroups)
print(carvone.joback.subgroups)
print(carvone.agani.primary.subgroups)
{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
{'-CH3': 2, '=CH2': 1, '=C<': 1, 'ring-CH2-': 2, 'ring>CH-': 1, 'ring=CH-': 1, 'ring=C<': 1, '>C=O (ring)': 1}
{'CH3': 2, 'CH2=C': 1, 'CH2 (cyclic)': 2, 'CH (cyclic)': 1, 'CH=C (cyclic)': 1, 'CO (cyclic)': 1}
You can obtain more information about the molecule from each model. For example, UNIFAC and PSRK are excess Gibbs energy models, so you can obtain estimates of the molecule’s R and Q values (its reduced van der Waals volume and surface area):
[7]:
print("UNIFAC R: ", carvone.unifac.r)
print("UNIFAC Q: ", carvone.unifac.q)
print("PSRK R: ", carvone.psrk.r)
print("PSRK Q: ", carvone.psrk.q)
UNIFAC R: 6.3751
UNIFAC Q: 5.308
PSRK R: 6.3751
PSRK Q: 5.308
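The R and Q values above are group-contribution sums: each subgroup count is multiplied by that subgroup's volume (R_k) or surface-area (Q_k) parameter and the products are added up. The following is a minimal sketch of that arithmetic, not ugropy's implementation. The table `UNIFAC_RQ` and the helper `molecular_r_q` are hypothetical names introduced here, and the parameter values are the commonly published UNIFAC ones for these six subgroups.

```python
# Commonly published UNIFAC group parameters (R_k, Q_k) for the subgroups
# found in carvone. This table is an illustration, not ugropy's own data.
UNIFAC_RQ = {
    "CH3": (0.9011, 0.848),
    "CH2": (0.6744, 0.540),
    "CH": (0.4469, 0.228),
    "CH2=C": (1.1173, 0.988),
    "CH=C": (0.8886, 0.676),
    "CH2CO": (1.4457, 1.180),
}


def molecular_r_q(subgroups, table=UNIFAC_RQ):
    """Sum the group parameters weighted by each subgroup's occurrence count."""
    r = sum(count * table[name][0] for name, count in subgroups.items())
    q = sum(count * table[name][1] for name, count in subgroups.items())
    return r, q


# Subgroup counts copied from the carvone output shown earlier.
carvone_subgroups = {"CH3": 2, "CH2": 1, "CH": 1, "CH2=C": 1, "CH=C": 1, "CH2CO": 1}
r, q = molecular_r_q(carvone_subgroups)
print(f"R = {r:.4f}, Q = {q:.3f}")  # R = 6.3751, Q = 5.308
```

The sums reproduce the values printed by the cell above, which suggests the library applies the same weighted sum over the detected subgroups.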
With the Joback model, you can estimate several properties. We will discuss the property estimators later.
[8]:
print(carvone.joback.acentric_factor)
print(carvone.joback.normal_boiling_point)
print(carvone.joback.critical_temperature)
print(carvone.joback.critical_pressure)
print(carvone.joback.critical_volume)
print(carvone.joback.vapor_pressure(430))
0.42452945182153057 dimensionless
516.47 kelvin
742.5207962108279 kelvin
28.596757127741714 bar
503.5 centimeter ** 3 / mole
0.09232883692318564 bar
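Two of the Joback estimates above can be checked by hand, because the method defines them as a constant plus a sum of group contributions: Tb = 198.2 + Σ n_k·Tb_k (in K) and Vc = 17.5 + Σ n_k·Vc_k (in cm³/mol). The sketch below assumes the contribution values from the published Joback table for the eight groups found in carvone. The names `JOBACK` and `joback_tb_vc` are hypothetical, and this is not ugropy's code.

```python
# Joback group contributions as (Tb contribution in K, Vc contribution in cm^3/mol).
# Values are taken from the published Joback table; treat them as an illustration.
JOBACK = {
    "-CH3": (23.58, 65),
    "=CH2": (18.18, 56),
    "=C<": (24.14, 38),
    "ring-CH2-": (27.15, 48),
    "ring>CH-": (21.78, 38),
    "ring=CH-": (26.73, 41),
    "ring=C<": (31.01, 32),
    ">C=O (ring)": (94.97, 55),
}


def joback_tb_vc(subgroups, table=JOBACK):
    """Return (normal boiling point in K, critical volume in cm^3/mol)."""
    tb = 198.2 + sum(n * table[g][0] for g, n in subgroups.items())
    vc = 17.5 + sum(n * table[g][1] for g, n in subgroups.items())
    return tb, vc


# Joback subgroup counts copied from the carvone output shown earlier.
carvone_joback = {
    "-CH3": 2, "=CH2": 1, "=C<": 1, "ring-CH2-": 2,
    "ring>CH-": 1, "ring=CH-": 1, "ring=C<": 1, ">C=O (ring)": 1,
}
tb, vc = joback_tb_vc(carvone_joback)
print(f"Tb = {tb:.2f} K, Vc = {vc:.1f} cm^3/mol")  # Tb = 516.47 K, Vc = 503.5 cm^3/mol
```

Both numbers agree with the normal boiling point and critical volume printed above. The other properties (critical temperature, critical pressure, acentric factor, vapor pressure) use additional correlations that are not reproduced here.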
If the normal_boiling_temperature parameter is provided, it is used in the Joback property calculations instead of the Joback-estimated normal boiling temperature (refer to the Joback tutorial).
Finally, the search_multiple_solutions parameter determines whether the solver returns multiple solutions. If set to True, the solver returns every solution it finds. If set to False (the default), the solver returns only one solution.
[9]:
# Example of multiple solutions
molecule = Groups("CCCC1=CC=C(CC(=O)OC)C=C1", "smiles", search_multiple_solutions=True)
molecule.unifac
[9]:
[<ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7fbd7cb70f50>,
<ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7fbd7cb71130>]
As you can see, we obtained a list of GibbsFragmentationResult objects. When the search_multiple_solutions parameter is set to True, the result is always a list, regardless of the number of solutions found.
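Because the attribute holds a single result in one mode and a list in the other, downstream code that must handle both can normalize the value first. The sketch below is one way to do that. `FakeResult` is a stand-in class invented for this example (it is not a ugropy class) and `as_solution_list` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Stand-in for a fragmentation result; NOT a ugropy class."""
    subgroups: dict = field(default_factory=dict)


def as_solution_list(result):
    """Return a list of solutions whether given one result or a list of them."""
    return result if isinstance(result, list) else [result]


# One result, as with search_multiple_solutions=False.
single = FakeResult({"CH3": 2})
# A list of results, as with search_multiple_solutions=True.
many = [FakeResult({"CH3": 2, "COO": 1}), FakeResult({"CH3": 2, "CH2COO": 1})]

# The same loop now works for both shapes.
for res in as_solution_list(single) + as_solution_list(many):
    print(res.subgroups)
```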
We can check both solutions:
[10]:
print(molecule.unifac[0].subgroups)
print(molecule.unifac[1].subgroups)
{'CH3': 2, 'CH2': 1, 'ACH': 4, 'ACCH2': 2, 'COO': 1}
{'CH3': 2, 'CH2': 1, 'ACH': 4, 'AC': 1, 'ACCH2': 1, 'CH2COO': 1}
Multiple solution searching is intended to return all the solutions for a given model, so you can try different representations of the molecule when estimating properties. For example, different UNIFAC representations could lead to different liquid-liquid or liquid-vapor equilibrium predictions.
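A quick sanity check on alternative fragmentations is that they must all describe the same molecule, so their atoms must add up to the same molecular formula. The sketch below does this for the two UNIFAC solutions printed above. `GROUP_ATOMS` and `formula` are hypothetical names written for this example, and the per-group atom counts are read directly from the group names (AC and ACH are aromatic carbons without and with a hydrogen).

```python
from collections import Counter

# Atoms contained in each subgroup, read from the group names.
GROUP_ATOMS = {
    "CH3": {"C": 1, "H": 3},
    "CH2": {"C": 1, "H": 2},
    "ACH": {"C": 1, "H": 1},
    "AC": {"C": 1},
    "ACCH2": {"C": 2, "H": 2},
    "COO": {"C": 1, "O": 2},
    "CH2COO": {"C": 2, "H": 2, "O": 2},
}


def formula(subgroups):
    """Add up the atoms of every subgroup, weighted by its occurrence count."""
    total = Counter()
    for group, count in subgroups.items():
        for element, atoms in GROUP_ATOMS[group].items():
            total[element] += count * atoms
    return dict(total)


# The two solutions copied from the output shown earlier.
solution_0 = {"CH3": 2, "CH2": 1, "ACH": 4, "ACCH2": 2, "COO": 1}
solution_1 = {"CH3": 2, "CH2": 1, "ACH": 4, "AC": 1, "ACCH2": 1, "CH2COO": 1}

print(formula(solution_0))
print(formula(solution_1))
```

Both solutions add up to C12H16O2, the formula of the input SMILES. They differ only in how the ester and its neighboring CH2 are grouped, which is what changes the model parameters used downstream.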
The full documentation of the Groups class is available in the API documentation. Or you can do…
[11]:
?Groups
You can also visualize your fragmentation results. Let’s see the multiple solutions obtained before:
[12]:
molecule.unifac[0].draw(width=800)
[13]:
molecule.unifac[1].draw(width=800)
Let’s also draw the carvone solutions obtained before:
[14]:
carvone.unifac.draw(width=600)
[15]:
carvone.psrk.draw(width=600)
[16]:
carvone.joback.draw(width=600)
[17]:
carvone.agani.primary.draw(width=600)
You can save the figure by doing:
[18]:
with open("figure.svg", "w") as f:
    f.write(carvone.unifac.get_solution_svg(width=600))
Check the full documentation of the draw function:
[19]:
?carvone.unifac.draw
Finally, let’s draw the ugropy logo:
[20]:
mol = Groups("CCCC1=C(COC(C)(C)COC(=O)OCC)C=C(CC2=CC=CC=C2)C=C1", "smiles")
mol.unifac.draw(
    title="ugropy",
    width=900,
    height=450,
    title_font_size=50,
    legend_font_size=14,
)
WARNING
For the UNIFAC and PSRK groups, the aldehyde group name is changed to HCO, following the discussion: https://github.com/ClapeyronThermo/Clapeyron.jl/issues/225
This naming is more consistent with the ether and formate groups.