{"id":1504,"date":"2024-01-08T00:00:00","date_gmt":"2024-01-08T05:00:00","guid":{"rendered":"https:\/\/molecularsciences.org\/content\/?p=1504"},"modified":"2024-01-25T16:08:47","modified_gmt":"2024-01-25T21:08:47","slug":"python-script-to-generate-chemical-formula-from-smiles-notation","status":"publish","type":"post","link":"https:\/\/molecularsciences.org\/content\/python-script-to-generate-chemical-formula-from-smiles-notation\/","title":{"rendered":"Python script to generate chemical formula from SMILES notation"},"content":{"rendered":"\n<p>Chemical formulas are the fundamental language of chemistry, encapsulating the essential composition of molecules. SMILES (Simplified Molecular Input Line Entry System) notation, on the other hand, provides a concise and human-readable representation of molecular structures. In this technical blog post, we will explore the process of harnessing Python, coupled with the RDKit library, to seamlessly generate chemical formulas from SMILES notations.<\/p>\n\n\n\n<p>SMILES notation has become a standard in cheminformatics for representing molecular structures. Its simplicity and versatility make it an ideal choice for encoding complex information in a single line. 
This post walks through a short Python script that uses RDKit to translate a SMILES string into the element counts of its chemical formula.<\/p>\n\n\n\n<p><strong>The Script: Python and RDKit<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from rdkit import Chem\nfrom collections import Counter\n\ndef generate_chemical_formula(smiles):\n    # Generate a molecular object from the SMILES notation\n    mol = Chem.MolFromSmiles(smiles)\n\n    # Check if the SMILES notation is valid\n    if mol is None:\n        raise ValueError(\"Invalid SMILES notation\")\n\n    # Count each element; attached hydrogens are tallied under \"H\"\n    formula_dict = Counter()\n    for atom in mol.GetAtoms():\n        formula_dict&#91;atom.GetSymbol()] += 1  # The atom itself\n        num_hs = atom.GetTotalNumHs()  # Implicit and explicit Hs on this atom\n        if num_hs:\n            formula_dict&#91;\"H\"] += num_hs\n\n    return formula_dict\n\nif __name__ == \"__main__\":\n    # Example usage\n    smiles_notation = \"CCO\"\n    formula = generate_chemical_formula(smiles_notation)\n    print(f\"Chemical Formula for {smiles_notation}: {formula}\")<\/code><\/pre>\n\n\n\n<p><strong>Explanation of the code<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Importing Libraries:<\/strong> The script starts by importing RDKit for molecular manipulation and the Counter class to count atom occurrences.<\/li>\n\n\n\n<li><strong>Generating a Molecular Object:<\/strong> The <code>generate_chemical_formula<\/code> function takes a SMILES string, creates an RDKit molecular object, and raises a <code>ValueError<\/code> if RDKit cannot parse the SMILES notation.<\/li>\n\n\n\n<li><strong>Calculating the Formula:<\/strong> The function then iterates through the atoms and adds one to the count for each atom&#8217;s element symbol. Because RDKit does not store attached hydrogens as separate atoms by default, the hydrogens reported by <code>GetTotalNumHs()<\/code> are added to the <code>H<\/code> count rather than to the count of the atom that carries them. The result is a Counter mapping each element to its number of atoms; for &#8220;CCO&#8221; (ethanol) this is 2 C, 6 H, and 1 O.<\/li>\n\n\n\n<li><strong>Example Usage:<\/strong> The script concludes with an example where a SMILES string (&#8220;CCO&#8221;) is converted into a chemical formula and printed to 
the console.<\/li>\n<\/ol>\n\n\n\n<p>In conclusion, this Python script serves as a gateway between the world of SMILES notations and chemical formulas. By leveraging RDKit, a powerful cheminformatics toolkit, we seamlessly bridge the gap, allowing for the effortless extraction of molecular composition from concise SMILES representations. As computational chemistry continues to evolve, Python and libraries like RDKit empower researchers and chemists to explore the intricacies of molecular structures with efficiency and ease.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Chemical formulas are the fundamental language of chemistry, encapsulating the essential composition of molecules. SMILES (Simplified Molecular Input Line Entry System) notation, on the other hand, provides a concise and human-readable representation of molecular structures. In this technical blog post, we will explore the process of harnessing Python, coupled with the RDKit library, to seamlessly [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1514,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[203],"tags":[137,485],"class_list":["post-1504","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","tag-python","tag-smiles-notation"],"_links":{"self":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/1504","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/comments?post=1504"}],"version-history":[{"count":1,"href":"https:\/\/molecularsciences.org\/content\/
wp-json\/wp\/v2\/posts\/1504\/revisions"}],"predecessor-version":[{"id":1505,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/1504\/revisions\/1505"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/media\/1514"}],"wp:attachment":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/media?parent=1504"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/categories?post=1504"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/tags?post=1504"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}