The easy way

[1]:

try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False


if IN_COLAB:
  %pip install ugropy

The Groups class

ugropy is relatively straightforward to use. Let’s explore what it has to offer, starting with the easy methods…

We’ll use the Groups class to retrieve the subgroups of all the models supported by ugropy.

[2]:

from ugropy import Groups

carvone = Groups("carvone")

carvone.unifac.subgroups

[2]:

{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}

Well, that was easy… ugropy uses PubChemPy to query PubChem and retrieve the SMILES representation of the molecule. It then passes that SMILES string to the rdkit library to identify the functional groups of the molecule.

The complete signature of the Groups class is as follows:

[3]:

from ugropy import DefaultSolver

carvone = Groups(
    identifier="carvone",
    identifier_type="name",
    solver=DefaultSolver,
    search_multiple_solutions=False,
    normal_boiling_temperature=None
)

The identifier_type argument (default: “name”) can be set to “name”, “smiles” or “mol”.

When “name” is set, ugropy uses the identifier argument to search PubChem for the canonical SMILES of the molecule.

When “smiles” is set, ugropy uses the identifier directly, so the library avoids the overhead of a PubChem search. Try it yourself:

[4]:

carvone = Groups(
    identifier="CC1=CCC(CC1=O)C(=C)C",
    identifier_type="smiles",
)

carvone.unifac.subgroups

[4]:

{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}

If you are familiar with the rdkit library, you’ll know that there are numerous ways to define a molecule (e.g., SMILES, SMARTS, PDB file, InChI, etc.). ugropy also accepts an rdkit Mol object directly, so any of those routes can be used:

[5]:

from rdkit import Chem

mol_obj = Chem.MolFromInchi("InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9H,1,5-6H2,2-3H3")

carvone = Groups(
    identifier=mol_obj,
    identifier_type="mol",
    normal_boiling_temperature=None
)

carvone.unifac.subgroups

[5]:

{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
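All three inputs give the same fragmentation. A quick sanity check on any fragmentation is that its subgroups account for every atom in the molecule; the InChI above encodes the formula C10H14O. The sketch below does that bookkeeping in plain Python. The `GROUP_ATOMS` table and the `atom_totals` helper are written by hand for this illustration and are not part of ugropy.

```python
from collections import Counter

# Atoms contained in each UNIFAC subgroup.
# Assumption: hand-written for this example, not read from ugropy.
GROUP_ATOMS = {
    "CH3": {"C": 1, "H": 3},
    "CH2": {"C": 1, "H": 2},
    "CH": {"C": 1, "H": 1},
    "CH2=C": {"C": 2, "H": 2},
    "CH=C": {"C": 2, "H": 1},
    "CH2CO": {"C": 2, "H": 2, "O": 1},
}

# Subgroup counts copied from the output above.
subgroups = {"CH3": 2, "CH2": 1, "CH": 1, "CH2=C": 1, "CH=C": 1, "CH2CO": 1}


def atom_totals(subgroups, group_atoms):
    """Sum the atoms contributed by every occurrence of every subgroup."""
    totals = Counter()
    for group, count in subgroups.items():
        for element, n_atoms in group_atoms[group].items():
            totals[element] += n_atoms * count
    return dict(totals)


formula = atom_totals(subgroups, GROUP_ATOMS)
print(formula)  # {'C': 10, 'H': 14, 'O': 1}
```

The totals match C10H14O, so no atom is missing or counted twice.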

The currently supported models are the classic liquid-vapor UNIFAC, Predictive Soave-Redlich-Kwong (PSRK), Joback and Abdulelah-Gani. You can access the functional groups of each model this way:

[6]:

carvone = Groups("carvone")

print(carvone.unifac.subgroups)

print(carvone.psrk.subgroups)

print(carvone.joback.subgroups)

print(carvone.agani.primary.subgroups)

{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}
{'-CH3': 2, '=CH2': 1, '=C<': 1, 'ring-CH2-': 2, 'ring>CH-': 1, 'ring=CH-': 1, 'ring=C<': 1, '>C=O (ring)': 1}
{'CH3': 2, 'CH2=C': 1, 'CH2 (cyclic)': 2, 'CH (cyclic)': 1, 'CH=C (cyclic)': 1, 'CO (cyclic)': 1}

You can obtain more information about the molecule from each model. For example, UNIFAC and PSRK are excess Gibbs energy models, so you can obtain estimates of the molecule’s R and Q values (its reduced van der Waals volume and surface area):

[7]:

print("UNIFAC R: ", carvone.unifac.r)
print("UNIFAC Q: ", carvone.unifac.q)

print("PSRK R: ", carvone.psrk.r)
print("PSRK Q: ", carvone.psrk.q)

UNIFAC R:  6.3751
UNIFAC Q:  5.308
PSRK R:  6.3751
PSRK Q:  5.308
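These values are sums of group contributions: R is the sum of each subgroup’s count times its volume parameter, and Q is the same sum with the area parameters. The sketch below reproduces the numbers above by hand. The `UNIFAC_RQ` table holds the published UNIFAC subgroup parameters typed in for this example; it is not read from ugropy, and ugropy’s internal implementation may differ.

```python
# (R_k, Q_k) parameters of the UNIFAC subgroups found in carvone.
# Assumption: typed in by hand from the published UNIFAC tables.
UNIFAC_RQ = {
    "CH3": (0.9011, 0.848),
    "CH2": (0.6744, 0.540),
    "CH": (0.4469, 0.228),
    "CH2=C": (1.1173, 0.988),
    "CH=C": (0.8886, 0.676),
    "CH2CO": (1.4457, 1.180),
}

# Subgroup counts copied from the carvone fragmentation shown earlier.
subgroups = {"CH3": 2, "CH2": 1, "CH": 1, "CH2=C": 1, "CH=C": 1, "CH2CO": 1}

# Molecular R and Q: count-weighted sums of the subgroup parameters.
r = sum(count * UNIFAC_RQ[group][0] for group, count in subgroups.items())
q = sum(count * UNIFAC_RQ[group][1] for group, count in subgroups.items())

print("R:", round(r, 4))  # R: 6.3751
print("Q:", round(q, 3))  # Q: 5.308
```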

With the Joback model, you can obtain estimates of several properties. We will discuss the property estimators later.

[8]:

print(carvone.joback.acentric_factor)
print(carvone.joback.normal_boiling_point)
print(carvone.joback.critical_temperature)
print(carvone.joback.critical_pressure)
print(carvone.joback.critical_volume)
print(carvone.joback.vapor_pressure(430))

0.42452945182153057 dimensionless
516.47 kelvin
742.5207962108279 kelvin
28.596757127741714 bar
503.5 centimeter ** 3 / mole
0.09232883692318564 bar
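To see where such numbers come from, the sketch below recomputes the normal boiling point and the critical volume from the Joback subgroups using the standard Joback correlations, Tb = 198.2 + Σ(n·ΔTb) and Vc = 17.5 + Σ(n·ΔVc). The contribution table is typed in by hand from the published Joback tables for this illustration; it is not taken from ugropy, whose internals may be organized differently.

```python
# Joback contributions (delta_Tb in K, delta_Vc in cm^3/mol) per subgroup.
# Assumption: typed in by hand from the published Joback tables.
JOBACK = {
    "-CH3": (23.58, 65.0),
    "=CH2": (18.18, 56.0),
    "=C<": (24.14, 38.0),
    "ring-CH2-": (27.15, 48.0),
    "ring>CH-": (21.78, 38.0),
    "ring=CH-": (26.73, 41.0),
    "ring=C<": (31.01, 32.0),
    ">C=O (ring)": (94.97, 55.0),
}

# Subgroup counts copied from the carvone Joback fragmentation above.
subgroups = {
    "-CH3": 2, "=CH2": 1, "=C<": 1, "ring-CH2-": 2,
    "ring>CH-": 1, "ring=CH-": 1, "ring=C<": 1, ">C=O (ring)": 1,
}

# Count-weighted sums of the contributions plus the Joback constants.
tb = 198.2 + sum(n * JOBACK[g][0] for g, n in subgroups.items())
vc = 17.5 + sum(n * JOBACK[g][1] for g, n in subgroups.items())

print(f"Tb = {tb:.2f} K")         # Tb = 516.47 K
print(f"Vc = {vc:.1f} cm^3/mol")  # Vc = 503.5 cm^3/mol
```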

If the normal_boiling_temperature parameter is provided, it is used in the Joback property calculations instead of the Joback-estimated normal boiling temperature (refer to the Joback tutorial).
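The reason this matters is that several Joback correlations take the boiling point as an input. The critical temperature, for instance, is Tc = Tb / (0.584 + 0.965·Σ − Σ²), where Σ is the sum of the group contributions to Tc. The sketch below is a plain-Python illustration of that dependence, not ugropy’s implementation. The contribution values are typed in from the published Joback tables, and the “experimental” boiling point of 504.15 K is an illustrative value assumed for this example only.

```python
# Joback critical-temperature contributions for the carvone subgroups.
# Assumption: typed in by hand from the published Joback tables.
TC_CONTRIB = {
    "-CH3": 0.0141, "=CH2": 0.0113, "=C<": 0.0117, "ring-CH2-": 0.0100,
    "ring>CH-": 0.0122, "ring=CH-": 0.0082, "ring=C<": 0.0143,
    ">C=O (ring)": 0.0284,
}
subgroups = {
    "-CH3": 2, "=CH2": 1, "=C<": 1, "ring-CH2-": 2,
    "ring>CH-": 1, "ring=CH-": 1, "ring=C<": 1, ">C=O (ring)": 1,
}


def joback_tc(tb, subgroups):
    """Joback critical temperature (K) for a given normal boiling point (K)."""
    s = sum(n * TC_CONTRIB[g] for g, n in subgroups.items())
    return tb / (0.584 + 0.965 * s - s**2)


tc_estimated_tb = joback_tc(516.47, subgroups)  # Joback-estimated Tb
tc_supplied_tb = joback_tc(504.15, subgroups)   # illustrative user-supplied Tb

print(f"Tc with Joback Tb:   {tc_estimated_tb:.2f} K")  # 742.52 K
print(f"Tc with supplied Tb: {tc_supplied_tb:.2f} K")
```

The first value matches the critical temperature printed above; a different boiling point shifts the estimate proportionally.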

Finally, the search_multiple_solutions parameter determines whether the solver returns multiple solutions. If set to True, the solver returns every solution it finds. If set to False (the default), the solver returns only one solution.

[9]:

# Example of multiple solutions

molecule = Groups("CCCC1=CC=C(CC(=O)OC)C=C1", "smiles", search_multiple_solutions=True)

molecule.unifac

[9]:

[<ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f966310e990>,
 <ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f966310ea50>]

As you can see, we obtained a list of GibbsFragmentationResult objects. The result will always be a list when the search_multiple_solutions parameter is set to True, regardless of the number of solutions found.
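Because the result is a single object in one mode and a list in the other, code that must handle both can normalize the value first. The helper below is a hypothetical convenience written for this tutorial, not part of ugropy; the `FakeResult` class is only a stand-in for a fragmentation result so the example runs without the library.

```python
def as_solution_list(result):
    """Return `result` as a list, wrapping a single solution if needed."""
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]


class FakeResult:
    """Stand-in for a fragmentation result (hypothetical, for illustration)."""

    def __init__(self, subgroups):
        self.subgroups = subgroups


single = FakeResult({"CH3": 2})
several = [FakeResult({"CH3": 2}), FakeResult({"CH2": 1})]

# Both shapes can now be iterated the same way.
for solution in as_solution_list(single) + as_solution_list(several):
    print(solution.subgroups)
```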

We can check both solutions:

[10]:

print(molecule.unifac[0].subgroups)
print(molecule.unifac[1].subgroups)

{'CH3': 2, 'CH2': 1, 'ACH': 4, 'ACCH2': 2, 'COO': 1}
{'CH3': 2, 'CH2': 1, 'ACH': 4, 'AC': 1, 'ACCH2': 1, 'CH2COO': 1}
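To see exactly where two solutions differ, one option is to subtract the subgroup counts from each other. The sketch below uses `collections.Counter` on the two dictionaries printed above, copied in by hand so that it runs without ugropy.

```python
from collections import Counter

# The two UNIFAC solutions printed above.
first = {"CH3": 2, "CH2": 1, "ACH": 4, "ACCH2": 2, "COO": 1}
second = {"CH3": 2, "CH2": 1, "ACH": 4, "AC": 1, "ACCH2": 1, "CH2COO": 1}

# Counter subtraction keeps only the positive differences.
only_in_first = Counter(first) - Counter(second)
only_in_second = Counter(second) - Counter(first)

print(dict(only_in_first))   # {'ACCH2': 1, 'COO': 1}
print(dict(only_in_second))  # {'AC': 1, 'CH2COO': 1}
```

The two solutions share most groups; they differ only in whether the CH2 next to the ester is assigned to the aromatic carbon (ACCH2 + COO) or to the ester (AC + CH2COO).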

Multiple-solution searching is intended to return all the solutions for a given model, so you can try different representations of the molecule when estimating properties. For example, different UNIFAC representations could lead to different liquid-liquid or liquid-vapor equilibrium predictions.

The full documentation of the Groups class is available in the API documentation. Alternatively, you can run:

[11]:

?Groups

You can also visualize your fragmentation results. Let’s look at the multiple solutions obtained before:

[12]:

molecule.unifac[0].draw(width=800)

[12]:

[13]:

molecule.unifac[1].draw(width=800)

[13]:

Let’s also draw the carvone solutions obtained before:

[14]:

carvone.unifac.draw(width=600)

[14]:

[15]:

carvone.psrk.draw(width=600)

[15]:

[16]:

carvone.joback.draw(width=600)

[16]:

[17]:

carvone.agani.primary.draw(width=600)

[17]:

You can save the figure by doing:

[18]:

with open("figure.svg", "w") as f:
    f.write(carvone.unifac.get_solution_svg(width=600))

Check the full documentation of the draw function:

[19]:

?carvone.unifac.draw

Finally, let’s draw the ugropy logo:

[20]:

mol = Groups("CCCC1=C(COC(C)(C)COC(=O)OCC)C=C(CC2=CC=CC=C2)C=C1", "smiles")

mol.unifac.draw(
    title="ugropy",
    width=900,
    height=450,
    title_font_size=50,
    legend_font_size=14
)

[20]:

WARNING

For the UNIFAC and PSRK groups, the aldehyde group name is changed to HCO, following the discussion at: https://github.com/ClapeyronThermo/Clapeyron.jl/issues/225

This is more consistent with the names of the ether and formate groups.