Working with ugropy

Failing

Open In Colab

ugropy may fail to obtain the subgroups of a molecule for a certain model for two reasons: either there is a bug in the code, or the molecule cannot be represented by the subgroups of the failing model.

ugropy utilizes SMARTS for the representation of functional groups to inquire whether the molecule contains those structures. Let’s examine the functional group list for the classic liquid-vapor UNIFAC model.

[1]:
try:
  import google.colab
  %pip install -q ugropy
except ImportError:
  pass
[2]:
from ugropy import unifac

unifac.subgroups
[2]:
smarts molecular_weight
group
CH3 [CX4H3] 15.03500
CH2 [CX4H2] 14.02700
CH [CX4H] 13.01900
C [CX4H0] 12.01100
CH2=CH [CH2]=[CH] 27.04600
... ... ...
NCO [NX2H0]=[CX2H0]=[OX1H0] 42.01700
(CH2)2SU [CH2]S(=O)(=O)[CH2] 92.11620
CH2CHSU [CH2]S(=O)(=O)[CH] 91.10840
IMIDAZOL [c]1:[c]:[n]:[c]:[n]:1 68.07820
BTI C(F)(F)(F)S(=O)(=O)[N-]S(=O)(=O)C(F)(F)F 279.91784

113 rows × 2 columns

For example, let’s check the SMARTS representation of the alcohol group ACOH:

[3]:
unifac.subgroups.loc["ACOH", "smarts"]
[3]:
'[cH0][OH]'

The SMARTS representation it’s telling us that the OH group it’s, of course, a hydroxyl group bounded by a single bound to an aromatic carbon atom.

An example of a molecule that cannot be represented by UNIFAC groups:

[4]:
from rdkit.Chem import Draw


mol = unifac.get_groups("C1(=CC=CC=C1)OC(C)(C)C", "smiles")

Draw.MolToImage(mol.molecule)
[4]:
../_images/tutorial_working_with_ugropy_6_0.png
[5]:
print(mol.subgroups)
{}

The library “fails” to obtain any functional groups to accurately represent the molecule. This failure is represented by an empty dictionary. In this case, the “fail” is correct, but it could fail due to errors in the groups SMARTS representations or the algorithm, resulting in an empty dictionary as well. Currently, the supported models are tested against 444 different molecules.

If you encounter a failing representation, you can examine the structure of the molecule and the list of functional groups of the failing model. If you determine that the molecule can indeed be modeled, you may have discovered a bug. Feel free to report the issue on the repository along with the failing molecule’s SMILES/name, the failing model and the ugropy version.

More than one solution

Models like UNIFAC or PSRK can have multiple solutions to represent a molecule, and ugropy tries its best to find them all. In such cases, you will receive a list of dictionaries, each containing one of the solutions found. Let’s take a look.

[6]:
from ugropy import unifac
from rdkit.Chem import Draw


mol = unifac.get_groups("CCCC1=CC=C(CC(=O)OC)C=C1", "smiles", search_multiple_solutions=True)

Draw.MolToImage(mol[0].molecule, highlightAtoms=[7])
[6]:
../_images/tutorial_working_with_ugropy_9_0.png

This molecule can be modeled in two ways depending on how we treat the CH2 attached to the ring and the ester carbon (highlighted in red). We can either form an ACCH2 group and model the ester group with COO, or we can use an AC group and model the ester group with CH2COO.

[7]:
print("Solution 1:")
print(mol[0].subgroups)
print("solution 2:")
print(mol[1].subgroups)
Solution 1:
{'CH3': 2, 'CH2': 1, 'ACH': 4, 'ACCH2': 2, 'COO': 1}
solution 2:
{'CH3': 2, 'CH2': 1, 'ACH': 4, 'AC': 1, 'ACCH2': 1, 'CH2COO': 1}
[8]:
mol[0].draw(width=800)
[8]:
../_images/tutorial_working_with_ugropy_12_0.svg
[9]:
mol[1].draw(width=800)
[9]:
../_images/tutorial_working_with_ugropy_13_0.svg

This could be useful in cases where some groups have more interaction parameters than others in the mixture that you want to model with UNIFAC. Alternatively, you can try both approaches and compare if there are any differences.

More solutions (non-optimal)

ugropy allows you to search multiple solutions for the models, but all these solutions are optimal (uses the same number of groups to represent the molecule). It’s possible to search for non-optimal solutions along with the optimal ones. This is useful when the user wants to find all possible combinations of fragments that cover the molecule. This feature is intended for research purposes and is not recommended to actually model the molecules.

We can do this with the search_nonoptimal parameter:

[10]:
from ugropy import unifac

solutions = unifac.get_groups(
    "9,10-dihydroanthracene",
    search_multiple_solutions=True, # If this is False, search_nonoptimal is ignored
    search_nonoptimal=True,
)

solutions
[10]:
[<ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f06f3964e90>,
 <ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f06f3964d60>,
 <ugropy.core.frag_classes.gibbs_model.gibbs_result.GibbsFragmentationResult at 0x7f06f395be30>]

We can check all the obtained solutions:

[11]:
solutions[0].draw(width=600)
[11]:
../_images/tutorial_working_with_ugropy_18_0.svg
[12]:
solutions[1].draw(width=600)
[12]:
../_images/tutorial_working_with_ugropy_19_0.svg
[13]:
solutions[2].draw(width=600)
[13]:
../_images/tutorial_working_with_ugropy_20_0.svg

If you are an experienced user of UNIFAC-like models, you may notice that the only “correct” solution is the first one. All ugropy models have the capacity to also search for non-optimal solutions.

Models information

You can check information about the models’ fragments. For example, the UNIFAC, PSRK, and Dortmund models have the information about the R and Q values, the subgroup number, and main group number. you can access to this information with model.subgroups_info:

[14]:
from ugropy import unifac, psrk, dortmund


unifac.subgroups_info
[14]:
subgroup_number main_group R Q
group
CH3 1 1 0.9011 0.848
CH2 2 1 0.6744 0.540
CH 3 1 0.4469 0.228
C 4 1 0.2195 0.000
CH2=CH 5 2 1.3454 1.176
... ... ... ... ...
NCO 109 51 1.0567 0.732
(CH2)2SU 118 55 2.6869 2.120
CH2CHSU 119 55 2.4595 1.808
IMIDAZOL 178 84 2.0260 0.868
BTI 179 85 5.7740 4.932

113 rows × 4 columns

Moreover, you can consult the fragments properties contributions of models like Joback:

[15]:
from ugropy import joback, abdulelah_gani

joback.properties_contributions
[15]:
Tc Pc Vc Tb Tf Hform Gform a b c d Hfusion Hvap na nb num_of_atoms
group
-CH3 0.0141 -0.0012 65.0 23.58 -5.10 -76.45 -43.96 19.500 -0.00808 0.000153 -9.670000e-08 0.908 2.373 548.29 -1.719 4
-CH2- 0.0189 0.0000 56.0 22.88 11.27 -20.64 8.42 -0.909 0.09500 -0.000054 1.190000e-08 2.590 2.226 94.16 -0.199 3
>CH- 0.0164 0.0020 41.0 21.74 12.64 29.89 58.36 -23.000 0.20400 -0.000265 1.200000e-07 0.749 1.691 -322.15 1.187 2
>C< 0.0067 0.0043 27.0 18.25 46.43 82.23 116.02 -66.200 0.42700 -0.000641 3.010000e-07 -1.460 0.636 -573.56 2.307 1
=CH2 0.0113 -0.0028 56.0 18.18 -4.32 -9.63 3.77 23.600 -0.03810 0.000172 -1.030000e-07 -0.473 1.724 495.01 -1.539 3
=CH- 0.0129 -0.0006 46.0 24.96 8.73 37.97 48.53 -8.000 0.10500 -0.000096 3.560000e-08 2.691 2.205 82.28 -0.242 2
=C< 0.0117 0.0011 38.0 24.14 11.14 83.99 92.36 -28.100 0.20800 -0.000306 1.460000e-07 3.063 2.138 NaN NaN 1
=C= 0.0026 0.0028 36.0 26.15 17.78 142.14 136.70 27.400 -0.05570 0.000101 -5.020000e-08 4.720 2.661 NaN NaN 1
CH 0.0027 -0.0008 46.0 9.20 -11.18 79.30 77.71 24.500 -0.02710 0.000111 -6.780000e-08 2.322 1.155 NaN NaN 2
C 0.0020 0.0016 37.0 27.38 64.32 115.51 109.82 7.870 0.02010 -0.000008 1.390000e-09 4.151 3.302 NaN NaN 1
ring-CH2- 0.0100 0.0025 48.0 27.15 7.75 -26.80 -3.68 -6.030 0.08540 -0.000008 -1.800000e-08 0.490 2.398 307.53 -0.798 3
ring>CH- 0.0122 0.0004 38.0 21.78 19.88 8.67 40.99 -20.500 0.16200 -0.000160 6.240000e-08 3.243 1.942 -394.29 1.251 2
ring>C< 0.0042 0.0061 27.0 21.32 60.15 79.72 87.88 -90.900 0.55700 -0.000900 4.690000e-07 -1.373 0.644 NaN NaN 1
ring=CH- 0.0082 0.0011 41.0 26.73 8.13 2.09 11.30 -2.140 0.05740 -0.000002 -1.590000e-08 1.101 2.544 259.65 -0.702 2
ring=C< 0.0143 0.0008 32.0 31.01 37.02 46.43 54.05 -8.250 0.10100 -0.000142 6.780000e-08 2.394 3.059 -245.74 0.912 1
-F 0.0111 -0.0057 27.0 -0.03 -15.78 -251.92 -247.19 26.500 -0.09130 0.000191 -1.030000e-07 1.398 -0.670 NaN NaN 1
-Cl 0.0105 -0.0049 58.0 38.13 13.55 -71.55 -64.31 33.300 -0.09630 0.000187 -9.960000e-08 2.515 4.532 625.45 -1.814 1
-Br 0.0133 0.0057 71.0 66.86 43.43 -29.48 -38.06 28.600 -0.06490 0.000136 -7.450000e-08 3.603 6.582 738.91 -2.038 1
-I 0.0068 -0.0034 97.0 93.84 41.69 21.06 5.74 32.100 -0.06410 0.000126 -6.870000e-08 2.724 9.520 809.55 -2.224 1
-OH (alcohol) 0.0741 0.0112 28.0 92.88 44.45 -208.04 -189.20 25.700 -0.06910 0.000177 -9.880000e-08 2.406 16.826 2173.72 -5.057 2
-OH (phenol) 0.0240 0.0184 -25.0 76.34 82.83 -221.65 -197.37 -2.810 0.11100 -0.000116 4.940000e-08 4.490 12.499 3018.17 -7.314 2
-O- (non-ring) 0.0168 0.0015 18.0 22.42 22.23 -132.22 -105.00 25.500 -0.06320 0.000111 -5.480000e-08 1.188 2.410 122.09 -0.386 1
-O- (ring) 0.0098 0.0048 13.0 31.22 23.05 -138.16 -98.22 12.200 -0.01260 0.000060 -3.860000e-08 5.879 4.682 440.24 -0.953 1
>C=O (non-ring) 0.0380 0.0031 62.0 76.75 61.20 -133.22 -120.50 6.450 0.06700 -0.000036 2.860000e-09 4.189 8.972 340.35 -0.350 2
>C=O (ring) 0.0284 0.0028 55.0 94.97 75.97 -164.50 -126.27 30.400 -0.08290 0.000236 -1.310000e-07 NaN 6.645 NaN NaN 2
O=CH- (aldehyde) 0.0379 0.0030 82.0 72.24 36.90 -162.03 -143.48 30.900 -0.03360 0.000160 -9.880000e-08 3.197 9.093 740.92 -1.713 3
-COOH (acid) 0.0791 0.0077 89.0 169.09 155.50 -426.72 -387.87 24.100 0.04270 0.000080 -6.870000e-08 11.051 19.537 1317.23 -2.578 4
-COO- (ester) 0.0481 0.0005 82.0 81.10 53.60 -337.92 -301.95 24.500 0.04020 0.000040 -4.520000e-08 6.959 9.633 483.88 -0.966 3
=O (other than above) 0.0143 0.0101 36.0 -10.50 2.08 -247.61 -250.83 6.820 0.01960 0.000013 -1.780000e-08 3.624 5.909 675.24 -1.340 1
-NH2 0.0243 0.0109 38.0 73.23 66.89 -22.02 14.07 26.900 -0.04120 0.000164 -9.760000e-08 3.515 10.788 NaN NaN 3
>NH (non-ring) 0.0295 0.0077 35.0 50.17 52.66 53.47 89.39 -1.210 0.07620 -0.000049 1.050000e-08 5.009 6.436 NaN NaN 2
>NH (ring) 0.0130 0.0114 29.0 52.82 101.51 31.65 75.61 11.800 -0.02300 0.000107 -6.280000e-08 7.490 6.930 NaN NaN 2
>N- (non-ring) 0.0169 0.0074 9.0 11.74 48.84 123.34 163.16 -31.100 0.22700 -0.000320 1.460000e-07 4.703 1.896 NaN NaN 1
-N= (non-ring) 0.0255 -0.0099 NaN 74.60 NaN 23.61 NaN NaN NaN NaN NaN NaN 3.335 NaN NaN 1
-N= (ring) 0.0085 0.0076 34.0 57.55 68.40 55.52 79.93 8.830 -0.00384 0.000044 -2.600000e-08 3.649 6.528 NaN NaN 1
=NH NaN NaN NaN 83.08 68.91 93.70 119.66 5.690 -0.00412 0.000128 -8.880000e-08 NaN 12.169 NaN NaN 2
-CN 0.0496 -0.0101 91.0 125.66 59.89 88.43 89.22 36.500 -0.07330 0.000184 -1.030000e-07 2.414 12.851 NaN NaN 2
-NO2 0.0437 0.0064 91.0 152.54 127.24 -66.57 -16.83 25.900 -0.00374 0.000129 -8.880000e-08 9.679 16.738 NaN NaN 3
-SH 0.0031 0.0084 63.0 63.56 20.09 -17.33 -22.99 35.300 -0.07580 0.000185 -1.030000e-07 2.360 6.884 NaN NaN 2
-S- (non-ring) 0.0119 0.0049 54.0 68.78 34.40 41.87 33.12 19.600 -0.00561 0.000040 -2.760000e-08 4.130 6.817 NaN NaN 1
-S- (ring) 0.0019 0.0051 38.0 52.10 79.93 39.10 27.76 16.700 0.00481 0.000028 -2.110000e-08 1.557 5.984 NaN NaN 1

From all fragmentation models you have all the RDKit mol objects of each fragment already instantiated on a dictionary. This could be helpful to for calculations, multiple solution filtering, or visualization:

[16]:
unifac.detection_mols["CH3"]
[16]:
../_images/tutorial_working_with_ugropy_27_0.png
[17]:
unifac.detection_mols["CH2O"].GetNumAtoms()
[17]:
2

Changing the default ILP solver

By default, ugropy uses the default CBC solver from the PulP library. But PulP supports a lot of different solvers. An example could be de XPRESS_PY solver.

First, you must install XPRESS with:

pip install xpress

Then you can change the default solver with a simple argument:

[18]:
from ugropy import unifac


unifac.get_groups("toluene", solver_arguments={"solver": "XPRESS_PY"}).draw(width=600)
/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/pulp/apis/xpress_api.py:651: LicenseWarning: Using the Community license in this session. If you have a full Xpress license, pass the full path to your license file to xpress.init(). If you want to use the FICO Community license and no longer want to see this message, use the following code before using the xpress module:
  xpress.init('/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/xpress/license/community-xpauth.xpr')
  model = xpress.problem()
/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/pulp/apis/xpress_api.py:697: DeprecationWarning: Deprecated in Xpress 9.5: function called with old argument names: check the documentation for the new argument names
  c = xpress.constraint(body=lhs, sense=xpress.eq, rhs=rhs)
/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/pulp/apis/xpress_api.py:434: DeprecationWarning: Deprecated in Xpress 9.5: use problem.getSolution and related functions instead
  model.getmipsol(x, slacks)
[18]:
../_images/tutorial_working_with_ugropy_30_1.svg

Changing the ILP solver has impact on the execution time of the library.