{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "58c8dfd7",
   "metadata": {},
   "source": [
    "## User-defined fragmentation models\n",
    "\n",
    "You can use the `ugropy` API to define your own fragmentation models. The\n",
    "base class of the models API is `FragmentationModel`. Its instances provide\n",
    "all the methods and attributes needed to define a fragmentation model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "74aa71ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ugropy import FragmentationModel"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "944ec03b",
   "metadata": {},
   "source": [
    "Groups are defined in a `pandas` DataFrame because it is convenient to store\n",
    "the data in tabular form and simply read it from a file. You can find an\n",
    "example in the repository: https://github.com/ipqa-research/ugropy/blob/main/ugropy/groupscsv/unifac/unifac_subgroups.csv\n",
    "\n",
    "We can define a simplified UNIFAC fragmentation model as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a6223244",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# We define a simple fragmentation model with some common groups.\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"group\": [\"CH3\", \"CH2\", \"CH\", \"C\", \"AC\", \"ACH\", \"ACCH3\", \"ACCH2\"],\n",
    "        \"smarts\": [\n",
    "            \"[CX4H3]\",\n",
    "            \"[CX4H2]\",\n",
    "            \"[CX4H]\",\n",
    "            \"[CX4H0]\",\n",
    "            \"[cH0]\",\n",
    "            \"[cH]\",\n",
    "            \"[cH0][CX4H3]\",\n",
    "            \"[cH0][CX4H2]\",\n",
    "        ]\n",
    "    }\n",
    ")\n",
    "\n",
    "# Set the group column as the index\n",
    "df.set_index(\"group\", inplace=True)\n",
    "\n",
    "\n",
    "# Define a fragmentation model using the groups above. UNIFAC-like models\n",
    "# don't allow overlapping groups or atoms that don't belong to any group.\n",
    "mymodel = FragmentationModel(\n",
    "    subgroups=df,\n",
    "    allow_overlapping=False,\n",
    "    allow_free_atoms=False,\n",
    ")"
   ]
  },
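  {
   "cell_type": "markdown",
   "id": "b7e41c92",
   "metadata": {},
   "source": [
    "Because the groups live in a table, the same kind of DataFrame could also be\n",
    "loaded from a CSV file instead of being typed inline. The cell below is a\n",
    "minimal sketch of that idea using only `pandas`. The CSV content, its comma\n",
    "separator and the `groups_from_csv` name are assumptions made for illustration,\n",
    "not the layout of the files shipped with `ugropy`. SMARTS strings can contain\n",
    "commas, so a real file may need a different separator (the `sep` argument of\n",
    "`pandas.read_csv`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d90f3a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical CSV content; in practice this would be a file on disk.\n",
    "csv_text = \"\"\"group,smarts\n",
    "CH3,[CX4H3]\n",
    "CH2,[CX4H2]\n",
    "ACH,[cH]\n",
    "\"\"\"\n",
    "\n",
    "# index_col=\"group\" makes the group names the index, which is what\n",
    "# df.set_index(\"group\") did for the inline table.\n",
    "groups_from_csv = pd.read_csv(io.StringIO(csv_text), index_col=\"group\")\n",
    "\n",
    "groups_from_csv"
   ]
  },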
  {
   "cell_type": "markdown",
   "id": "631e0088",
   "metadata": {},
   "source": [
    "With this instance we can detect fragments and solve groups as with any other\n",
    "`ugropy` model. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9becb507",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'ACH': 5, 'ACCH3': 1}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sol = mymodel.get_groups(\"toluene\")\n",
    "\n",
    "sol.subgroups"
   ]
  },
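  {
   "cell_type": "markdown",
   "id": "e08a5d3c",
   "metadata": {},
   "source": [
    "The raw SMARTS matches help explain this result. The cell below is a small\n",
    "sketch that counts matches with RDKit directly, assuming the patterns are\n",
    "interpreted as RDKit SMARTS; `smarts_by_group` and `raw_matches` are names\n",
    "introduced here for illustration. In toluene the methyl carbon matches both\n",
    "`CH3` and `ACCH3`, and the substituted ring carbon matches both `AC` and\n",
    "`ACCH3`. With `allow_overlapping=False` each atom must end up in exactly one\n",
    "group, and the result above reports `ACCH3` rather than separate `CH3` and\n",
    "`AC` groups."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91c6f7b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from rdkit import Chem\n",
    "\n",
    "# A subset of the patterns defined earlier in this notebook.\n",
    "smarts_by_group = {\n",
    "    \"CH3\": \"[CX4H3]\",\n",
    "    \"AC\": \"[cH0]\",\n",
    "    \"ACH\": \"[cH]\",\n",
    "    \"ACCH3\": \"[cH0][CX4H3]\",\n",
    "}\n",
    "\n",
    "toluene = Chem.MolFromSmiles(\"Cc1ccccc1\")\n",
    "\n",
    "# Count how many times each pattern matches, ignoring any overlap.\n",
    "raw_matches = {\n",
    "    name: len(toluene.GetSubstructMatches(Chem.MolFromSmarts(smarts)))\n",
    "    for name, smarts in smarts_by_group.items()\n",
    "}\n",
    "\n",
    "raw_matches"
   ]
  },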
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "01c0c4e2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" version=\"1.1\" baseProfile=\"full\" xml:space=\"preserve\" width=\"800px\" height=\"200px\" viewBox=\"0 0 800 200\">\n",
       "<!-- END OF HEADER -->\n",
       "<rect style=\"opacity:1.0;fill:#FFFFFF;stroke:none\" width=\"800.0\" height=\"200.0\" x=\"0.0\" y=\"0.0\"> </rect>\n",
       "<ellipse cx=\"145.0\" cy=\"115.7\" rx=\"15.6\" ry=\"15.6\" class=\"atom-0\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"170.3\" cy=\"42.0\" rx=\"15.6\" ry=\"15.6\" class=\"atom-1\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"246.7\" cy=\"27.0\" rx=\"15.6\" ry=\"15.6\" class=\"atom-2\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"297.9\" cy=\"85.7\" rx=\"15.6\" ry=\"15.6\" class=\"atom-3\" style=\"fill:#FF7F0EA5;fill-rule:evenodd;stroke:#FF7F0EA5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"272.7\" cy=\"159.4\" rx=\"15.6\" ry=\"15.6\" class=\"atom-4\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"196.3\" cy=\"174.4\" rx=\"15.6\" ry=\"15.6\" class=\"atom-5\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"374.4\" cy=\"70.7\" rx=\"15.6\" ry=\"15.6\" class=\"atom-6\" style=\"fill:#FF7F0EA5;fill-rule:evenodd;stroke:#FF7F0EA5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"425.6\" cy=\"129.3\" rx=\"15.6\" ry=\"15.6\" class=\"atom-7\" style=\"fill:#FF7F0EA5;fill-rule:evenodd;stroke:#FF7F0EA5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"502.1\" cy=\"114.3\" rx=\"15.6\" ry=\"15.6\" class=\"atom-8\" style=\"fill:#FF7F0EA5;fill-rule:evenodd;stroke:#FF7F0EA5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"553.3\" cy=\"173.0\" rx=\"15.6\" ry=\"15.6\" class=\"atom-9\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"629.7\" cy=\"158.0\" rx=\"15.6\" ry=\"15.6\" class=\"atom-10\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"655.0\" cy=\"84.3\" rx=\"15.6\" ry=\"15.6\" class=\"atom-11\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"603.7\" cy=\"25.6\" rx=\"15.6\" ry=\"15.6\" class=\"atom-12\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<ellipse cx=\"527.3\" cy=\"40.6\" rx=\"15.6\" ry=\"15.6\" class=\"atom-13\" style=\"fill:#1F77B4A5;fill-rule:evenodd;stroke:#1F77B4A5;stroke-width:1.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 145.0,115.7 L 170.3,42.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-0 atom-0 atom-1\" d=\"M 158.3,113.1 L 179.1,52.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-1 atom-1 atom-2\" d=\"M 170.3,42.0 L 246.7,27.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 246.7,27.0 L 297.9,85.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-2 atom-2 atom-3\" d=\"M 242.3,39.8 L 284.7,88.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-3 atom-3 atom-4\" d=\"M 297.9,85.7 L 272.7,159.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 272.7,159.4 L 196.3,174.4\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-4 atom-4 atom-5\" d=\"M 263.9,149.2 L 200.6,161.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-5 atom-3 atom-6\" d=\"M 297.9,85.7 L 374.4,70.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-6 atom-6 atom-7\" d=\"M 374.4,70.7 L 425.6,129.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-7 atom-7 atom-8\" d=\"M 425.6,129.3 L 502.1,114.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 502.1,114.3 L 553.3,173.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-8 atom-8 atom-9\" d=\"M 515.3,111.7 L 557.7,160.2\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-9 atom-9 atom-10\" d=\"M 553.3,173.0 L 629.7,158.0\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-10 atom-11\" d=\"M 629.7,158.0 L 655.0,84.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-10 atom-10 atom-11\" d=\"M 620.9,147.8 L 641.7,86.9\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-11 atom-11 atom-12\" d=\"M 655.0,84.3 L 603.7,25.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 603.7,25.6 L 527.3,40.6\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-12 atom-12 atom-13\" d=\"M 599.4,38.3 L 536.1,50.8\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-13 atom-5 atom-0\" d=\"M 196.3,174.4 L 145.0,115.7\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path class=\"bond-14 atom-13 atom-8\" d=\"M 527.3,40.6 L 502.1,114.3\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 146.3,112.0 L 145.0,115.7 L 147.6,118.7\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 169.0,45.7 L 170.3,42.0 L 174.1,41.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 242.9,27.7 L 246.7,27.0 L 249.3,29.9\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 274.0,155.7 L 272.7,159.4 L 268.9,160.1\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 200.1,173.7 L 196.3,174.4 L 193.7,171.5\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 370.6,71.4 L 374.4,70.7 L 376.9,73.6\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 423.1,126.4 L 425.6,129.3 L 429.4,128.6\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 550.7,170.1 L 553.3,173.0 L 557.1,172.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 625.9,158.7 L 629.7,158.0 L 631.0,154.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 653.7,88.0 L 655.0,84.3 L 652.4,81.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 606.3,28.5 L 603.7,25.6 L 599.9,26.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<path d=\"M 531.1,39.9 L 527.3,40.6 L 526.0,44.3\" style=\"fill:none;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;\"/>\n",
       "<rect x=\"1\" y=\"5\" width=\"18.0\" height=\"18.0\" fill=\"rgb(31, 119, 179)\"/><text x=\"19.200000000000003\" y=\"20\" font-family=\"Helvetica\" font-size=\"12\" fill=\"black\">ACH: 10</text><rect x=\"1\" y=\"30\" width=\"18.0\" height=\"18.0\" fill=\"rgb(255, 127, 13)\"/><text x=\"19.200000000000003\" y=\"45\" font-family=\"Helvetica\" font-size=\"12\" fill=\"black\">ACCH2: 2</text><text x=\"400.0\" y=\"40\" font-family=\"Helvetica\" font-size=\"12\" font-weight=\"bold\" fill=\"black\" text-anchor=\"middle\"/></svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sol = mymodel.get_groups(\"bibenzyl\")\n",
    "\n",
    "sol.draw(width=800)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac292ee8",
   "metadata": {},
   "source": [
    "The `FragmentationModel.get_groups` method returns a `FragmentationResult`\n",
    "object. To extend the behavior of the base `FragmentationModel` class, you can\n",
    "subclass it and subclass `FragmentationResult` as well.\n",
    "\n",
    "One example is the pair of `GibbsModel` and `GibbsFragmentationResult`\n",
    "classes. They differ from the base classes in that they receive an extra\n",
    "argument (the groups' information DataFrame). These models work exactly the\n",
    "same way, but they also calculate the R and Q of the molecule once the\n",
    "fragments are detected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a9e94dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ugropy import GibbsFragmentationResult, GibbsModel"
   ]
  },
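  {
   "cell_type": "markdown",
   "id": "6a3f0d47",
   "metadata": {},
   "source": [
    "As a rough illustration of what calculating R and Q involves, the cell below\n",
    "sums the contribution of each group weighted by its number of occurrences. It\n",
    "is a sketch with plain `pandas`, not the implementation used by `GibbsModel`.\n",
    "The `molecule_r_q` function and the `info` table are hypothetical; the R and Q\n",
    "numbers are the commonly tabulated UNIFAC values for `ACH` and `ACCH3`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4b82e15",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def molecule_r_q(subgroups, info):\n",
    "    \"\"\"Sum the R and Q contributions of each group, weighted by its count.\"\"\"\n",
    "    counts = pd.Series(subgroups)\n",
    "    r = (counts * info.loc[counts.index, \"R\"]).sum()\n",
    "    q = (counts * info.loc[counts.index, \"Q\"]).sum()\n",
    "    return float(r), float(q)\n",
    "\n",
    "\n",
    "# Hypothetical group-information table with commonly tabulated UNIFAC values.\n",
    "info = pd.DataFrame(\n",
    "    {\"R\": [0.5313, 1.2663], \"Q\": [0.400, 0.968]},\n",
    "    index=pd.Index([\"ACH\", \"ACCH3\"], name=\"group\"),\n",
    ")\n",
    "\n",
    "# Subgroup counts obtained for toluene earlier in this notebook.\n",
    "molecule_r_q({\"ACH\": 5, \"ACCH3\": 1}, info)"
   ]
  },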
  {
   "cell_type": "markdown",
   "id": "818c6b74",
   "metadata": {},
   "source": [
    "You can check the source code of these classes in the API documentation. It is\n",
    "simple to follow and shows how to extend the behavior of the base\n",
    "`FragmentationModel` class."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ugropy",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}