Property estimators


Some group contribution models can be used to predict properties of pure substances.

Joback model

The Joback model is a well-known model for estimating properties of pure substances. It can be used independently of the Groups class:

[1]:
try:
  # Only importable inside Google Colab
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False


if IN_COLAB:
  %pip install ugropy
[2]:
from ugropy import joback

toluene = joback.get_groups("toluene")
[3]:
toluene.draw(width=400)
[3]:
[Figure: toluene fragmented into Joback groups]
[4]:
print(toluene.critical_pressure)
print(toluene.critical_volume)
print(toluene.critical_temperature)
print(toluene.acentric_factor)
print(toluene.fusion_temperature)
41.144119209325225 bar
319.5 centimeter ** 3 / mole
598.0611700010388 kelvin
0.25208296412216535 dimensionless
195.07 kelvin
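The critical pressure and volume above can be reproduced by hand from the Joback correlations, Pc = (0.113 + 0.0032*nA - sum(dPc))**-2 in bar and Vc = 17.5 + sum(dVc) in cm3/mol, where nA is the total number of atoms in the molecule. The sketch below assumes toluene splits into one CH3, five ring =CH- and one ring =C< group, and uses contribution values typed in from Joback's published table rather than read from ugropy; the dictionary keys are illustrative names, not ugropy identifiers.

```python
# Joback contributions for critical pressure (bar) and critical volume (cm3/mol),
# typed in by hand from the published Joback table. Keys are illustrative names.
DPC = {"CH3": -0.0012, "=CH- (ring)": 0.0011, "=C< (ring)": 0.0008}
DVC = {"CH3": 65.0, "=CH- (ring)": 41.0, "=C< (ring)": 32.0}


def joback_pc(groups: dict[str, int], n_atoms: int) -> float:
    """Critical pressure in bar: (0.113 + 0.0032*nA - sum(dPc))**-2."""
    s = sum(n * DPC[g] for g, n in groups.items())
    return (0.113 + 0.0032 * n_atoms - s) ** -2


def joback_vc(groups: dict[str, int]) -> float:
    """Critical volume in cm3/mol: 17.5 + sum(dVc)."""
    return 17.5 + sum(n * DVC[g] for g, n in groups.items())


# Toluene (C7H8) has 15 atoms when hydrogens are counted.
toluene_groups = {"CH3": 1, "=CH- (ring)": 5, "=C< (ring)": 1}
print(joback_pc(toluene_groups, n_atoms=15))  # about 41.144 bar
print(joback_vc(toluene_groups))              # 319.5 cm3/mol
```

Both values match the ugropy output above, which shows how directly the group counts drive these estimates.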

You may notice that the attributes of the Joback class are quantities with units. This is handled by the pint library, which makes working with units simple. For example, the value of the critical_pressure attribute can be converted to pascal with:

[5]:
toluene.critical_pressure.to("Pa")
[5]:
4114411.9209325225 pascal

If you want to obtain the value of the attribute without units, you can use the magnitude attribute:

[6]:
toluene.critical_pressure.magnitude
[6]:
np.float64(41.144119209325225)

Or combine both operations, converting to another unit and then taking the magnitude:

[7]:
toluene.critical_pressure.to("mmHg").magnitude
[7]:
np.float64(30860.62289092802)
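These conversions are plain arithmetic underneath: 1 bar is exactly 100 000 Pa, and 1 mmHg is about 133.322 Pa. The check below writes the conversion factors out by hand (no pint involved) and reproduces the two numbers above.

```python
BAR_TO_PA = 1e5             # 1 bar = 100 000 Pa (exact)
MMHG_TO_PA = 133.322387415  # 1 mmHg = 13595.1 kg/m3 * 0.001 m * 9.80665 m/s2

pc_bar = 41.144119209325225  # toluene critical pressure printed above, in bar
pc_pa = pc_bar * BAR_TO_PA
pc_mmhg = pc_pa / MMHG_TO_PA

print(pc_pa)    # about 4114411.92 Pa
print(pc_mmhg)  # about 30860.62 mmHg
```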

For more information about the pint library, see its documentation.

The Joback class can also estimate temperature-dependent properties of the molecule, again returned as pint quantities. The API documentation lists the available methods and explains how each property is computed. In the examples below, the temperature is passed as a plain number in kelvin:

[8]:
print(toluene.vapor_pressure(110 + 273.15))
0.923433500943906 bar
[9]:
print(toluene.viscosity_liquid(25 + 273.15))
0.0004848511681835698 pascal * second
[10]:
print(toluene.heat_capacity_liquid(50 + 273.15))
174.140191226778 joule / kelvin / mole
[11]:
toluene.heat_capacity_ideal_gas(150 + 273.15)
[11]:
150.13004821343338 joule/(kelvin mole)
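The ideal-gas heat capacity comes from the Joback cubic polynomial Cp = (sum(A) - 37.93) + (sum(B) + 0.210)*T + (sum(C) - 3.91e-4)*T**2 + (sum(D) + 2.06e-7)*T**3, in J/(mol K). The minimal sketch below assumes toluene's Joback groups are one CH3, five ring =CH- and one ring =C<, with the A to D values typed in from the published Joback table (the keys are illustrative, not ugropy names). It reproduces the value shown above.

```python
# Joback ideal-gas heat-capacity contributions (A, B, C, D) per group,
# typed in by hand from the published Joback table.
CP_GROUPS = {
    "CH3": (19.5, -8.08e-3, 1.53e-4, -9.67e-8),
    "=CH- (ring)": (-2.14, 5.74e-2, -1.64e-6, -1.59e-8),
    "=C< (ring)": (-8.25, 1.01e-1, -1.42e-4, 6.78e-8),
}


def joback_cp_ig(groups: dict[str, int], t: float) -> float:
    """Ideal-gas heat capacity in J/(mol K) at temperature t in kelvin."""
    # Sum each of the four coefficients over all groups, weighted by count.
    a, b, c, d = (
        sum(n * CP_GROUPS[g][i] for g, n in groups.items()) for i in range(4)
    )
    return (
        (a - 37.93)
        + (b + 0.210) * t
        + (c - 3.91e-4) * t**2
        + (d + 2.06e-7) * t**3
    )


toluene_groups = {"CH3": 1, "=CH- (ring)": 5, "=C< (ring)": 1}
print(joback_cp_ig(toluene_groups, 150 + 273.15))  # about 150.13 J/(mol K)
```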

You can read the full documentation, including every property that can be estimated, with:

[12]:
?toluene

The Joback model lets the user provide the experimental normal boiling temperature of the substance to improve the accuracy of some property estimates. This value is then used instead of the Joback-estimated normal boiling temperature when computing properties that depend on it, such as the critical temperature.

[13]:
toluene = joback.get_groups("toluene", normal_boiling_point=(110.6 + 273.15))
[14]:
print(toluene.critical_temperature)
593.8980798775972 kelvin
[15]:
print(toluene.vapor_pressure(110.6 + 273.15))
1.0132500000000007 bar
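The critical temperature changes because the Joback correlation makes Tc proportional to the normal boiling point: Tc = Tb / (0.584 + 0.965*sum(dTc) - sum(dTc)**2). The minimal sketch below assumes toluene's Joback groups are one CH3, five ring =CH- and one ring =C<, and uses dTc values typed in from the published Joback table (the keys are illustrative). Feeding it the experimental boiling point reproduces the 593.90 K shown above.

```python
# Joback critical-temperature contributions, typed in by hand from the published table.
DTC = {"CH3": 0.0141, "=CH- (ring)": 0.0082, "=C< (ring)": 0.0143}


def joback_tc(tb: float, groups: dict[str, int]) -> float:
    """Critical temperature in K from a normal boiling point tb in K."""
    s = sum(n * DTC[g] for g, n in groups.items())
    return tb / (0.584 + 0.965 * s - s**2)


toluene_groups = {"CH3": 1, "=CH- (ring)": 5, "=C< (ring)": 1}
tb_exp = 110.6 + 273.15  # experimental normal boiling point, K
print(joback_tc(tb_exp, toluene_groups))  # about 593.898 K
```

The denominator depends only on the groups, so Tc/Tb is fixed for a given fragmentation. Replacing the higher Joback estimate of Tb with the experimental 383.75 K lowers Tc by the same factor.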

Abdulelah-Gani model

https://github.com/PEESEgroup/Pure-Component-Property-Estimation

The recently developed Abdulelah-Gani model is also available in ugropy. Its implementation is at an early stage of development and is not yet as complete as the original publication: some tertiary structures are not implemented, so the properties that depend on them are not available. Let’s see how to use it:

[16]:
from ugropy import abdulelah_gani
[17]:
adrenaline = abdulelah_gani.get_groups("adrenaline")
[18]:
adrenaline.primary.subgroups
[18]:
{'CH3': 1, 'aCH': 3, 'aC-CH': 1, 'OH': 1, 'aC-OH': 2, 'CH2NH': 1}
[19]:
adrenaline.secondary.subgroups
[19]:
{'CHm(OH)CHn(NHp) (m,n,p in 0..2)': 1,
 'aC-CHn-OH (n in 1..2)': 1,
 'AROMRINGs1s2s4': 1}
[20]:
adrenaline.tertiary.subgroups
[20]:
{}

As you can see, the Abdulelah-Gani model includes different kinds of subgroups. The primary structures work like the groups of the models discussed before. The secondary and tertiary structures are additional structures that increase the accuracy of the estimations and also allow the model to differentiate isomers. For example:

[21]:
hexa23 = abdulelah_gani.get_groups("2,3-dimethylhexane")
hexa24 = abdulelah_gani.get_groups("2,4-dimethylhexane")
[22]:
hexa23.critical_temperature
[22]:
564.8418659474339 kelvin
[23]:
hexa24.critical_temperature
[23]:
549.3932758799738 kelvin
[24]:
print(hexa23.primary.subgroups)
print(hexa24.primary.subgroups)
{'CH3': 4, 'CH2': 2, 'CH': 2}
{'CH3': 4, 'CH2': 2, 'CH': 2}
[25]:
print(hexa23.secondary.subgroups)
print(hexa24.secondary.subgroups)
{'(CH3)2CH': 1, 'CH(CH3)CH(CH3)': 1}
{'(CH3)2CH': 1}

As you can see, both molecules have the same primary structures but different secondary structures, which is why their properties differ. A property estimator that only uses primary structures, such as the Joback model, gives the same properties for both molecules:

[26]:
print(joback.get_groups("2,3-dimethylhexane").critical_temperature)
print(joback.get_groups("2,4-dimethylhexane").critical_temperature)
552.9339856842008 kelvin
552.9339856842008 kelvin
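To pinpoint which groups set two isomers apart, you can diff their subgroup dictionaries. The helper below (`subgroup_diff` is a hypothetical name, not part of ugropy) is fed with the dictionaries printed above. With ugropy loaded, you could pass `hexa23.secondary.subgroups` and `hexa24.secondary.subgroups` directly.

```python
# Subgroup counts copied from the outputs above.
hexa23_primary = {"CH3": 4, "CH2": 2, "CH": 2}
hexa24_primary = {"CH3": 4, "CH2": 2, "CH": 2}
hexa23_secondary = {"(CH3)2CH": 1, "CH(CH3)CH(CH3)": 1}
hexa24_secondary = {"(CH3)2CH": 1}


def subgroup_diff(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Map each group whose count differs to (count in a) - (count in b)."""
    diff = {}
    for group in a.keys() | b.keys():
        delta = a.get(group, 0) - b.get(group, 0)
        if delta:
            diff[group] = delta
    return diff


print(subgroup_diff(hexa23_primary, hexa24_primary))      # {}
print(subgroup_diff(hexa23_secondary, hexa24_secondary))  # {'CH(CH3)CH(CH3)': 1}
```

An empty primary diff paired with a non-empty secondary diff is exactly the situation in which Joback returns identical values while Abdulelah-Gani does not.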

All the properties that can be estimated with the Abdulelah-Gani model are:

[27]:
print(hexa23.critical_temperature)
print(hexa23.critical_pressure)
print(hexa23.critical_volume)
print(hexa23.acentric_factor)
print(hexa23.ig_formation_enthalpy)
print(hexa23.ig_formation_gibbs)
564.8418659474339 kelvin
26.35038783694938 bar
465.335 centimeter ** 3 / mole
0.35672642267081217 dimensionless
-214.82596509091175 kilojoule / mole
15.59002996305599 kilojoule / mole

If you refer to the original publication, you will notice that the Abdulelah-Gani model can estimate properties with two methods:

  • GC-SIMPLE: Algebraic correlations

  • ML: trained neural networks

ugropy evaluates the properties using the GC-SIMPLE method; the ML method is not provided in this library. To learn how to evaluate the properties with the ML method, please refer to the supplementary material of the original publication:

https://github.com/PEESEgroup/Pure-Component-Property-Estimation

However, ugropy helps a little with evaluating the trained neural networks: the fragmentation result provides, ready to use, the NumPy array that the ML models take as input:

[28]:
hexa23.ml_vector
[28]:
array([[4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0]])
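The vector is a single row of group counts that is mostly zeros, so listing only the populated positions makes it easier to match against the subgroup dictionaries. In the printout above, the row holds 424 entries (19 lines of 22 plus 6). The non-zero values are 4, 2, 2 at the start, which match the primary counts, and two 1s on the 11th line, which presumably correspond to the two secondary groups. The plain-Python sketch below works on a hand-made copy of that row. With NumPy and ugropy at hand, `np.flatnonzero(hexa23.ml_vector)` should return the same indices directly.

```python
# Hand-made copy of the printed hexa23.ml_vector row (assumption: transcribed
# from the output above, which prints 22 entries per line).
row = [0] * 424
row[0:3] = [4, 2, 2]  # first printed line; matches the primary counts CH3, CH2, CH
row[220] = 1          # 1st entry of the 11th printed line
row[222] = 1          # 3rd entry of the 11th printed line

# Keep only the populated positions: index -> count.
nonzero = {i: v for i, v in enumerate(row) if v}
print(len(row), nonzero)
# 424 {0: 4, 1: 2, 2: 2, 220: 1, 222: 1}
```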

To visualize the fragmentation results you can use the draw method for each structure type:

[29]:
hexa24.primary.draw(width=700)
[29]:
[Figure: primary-structure fragmentation of 2,4-dimethylhexane]
[30]:
hexa24.secondary.draw(width=700)
[30]:
[Figure: secondary-structure fragmentation of 2,4-dimethylhexane]

The example molecules above don’t have tertiary structures, so let’s fragment a molecule that does:

[31]:
fluorene = abdulelah_gani.get_groups("9H-Fluorene")
[32]:
fluorene.primary.draw(width=800)
[32]:
[Figure: primary-structure fragmentation of 9H-fluorene]
[33]:
fluorene.tertiary.draw(width=800)
[33]:
[Figure: tertiary-structure fragmentation of 9H-fluorene]

Finally, let’s look at multiple solutions. Secondary and tertiary structures are allowed to overlap, so they don’t generate multiple solutions. Primary structures, however, can. Let’s see an example:

[34]:
mol = abdulelah_gani.get_groups("COc1ccccc1N(=O)=O", "smiles", search_multiple_solutions=True)
[35]:
mol[0].primary.draw(width=600)
[35]:
[Figure: primary-structure fragmentation, first solution]
[36]:
mol[1].primary.draw(width=600)
[36]:
[Figure: primary-structure fragmentation, second solution]

Both solutions have the same secondary and tertiary structures but different primary structures, which is why their properties differ.

[37]:
mol[0].critical_temperature
[37]:
750.949961574131 kelvin
[38]:
mol[1].critical_temperature
[38]:
784.2891401277112 kelvin

What a difference! In this case, the second solution gives a much better estimate of the critical temperature, whose reference value is 782 K. You can check it yourself in the original dataset of the publication.
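When `search_multiple_solutions=True` returns several fragmentations and a reference value is available, a relative-error comparison shows which solution to trust. The plain-Python sketch below uses the two critical temperatures printed above and the 782 K reference value quoted in the text. With ugropy loaded, the list could instead be built as `[s.critical_temperature.magnitude for s in mol]`, assuming `mol` is a list as the indexing above suggests.

```python
# Critical temperatures of the two solutions (values printed above), in K.
tc_solutions = [750.949961574131, 784.2891401277112]
tc_reference = 782.0  # reference value quoted in the text, K

# Relative error of each solution against the reference value.
errors = [abs(tc - tc_reference) / tc_reference for tc in tc_solutions]
best = min(range(len(errors)), key=errors.__getitem__)

for i, err in enumerate(errors):
    print(f"solution {i}: {100 * err:.2f} % error")
print("closest solution:", best)
```

The first solution is off by about 4 %, while the second is within 0.3 % of the reference.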