syndirella.slipper.slipper_synthesizer.SlipperSynthesizer
=========================================================

.. py:module:: syndirella.slipper.slipper_synthesizer.SlipperSynthesizer

.. autoapi-nested-parse::

   slipper_synthesizer/SlipperSynthesizer.py

   This module contains the SlipperSynthesizer class.


Classes
-------

.. autoapisummary::

   syndirella.slipper.slipper_synthesizer.SlipperSynthesizer.SlipperSynthesizer


Module Contents
---------------

.. py:class:: SlipperSynthesizer(library: syndirella.route.Library.Library, output_dir: str, atom_ids_expansion: dict = None, additional_info: dict = None)

   This class carries out the whole process of finding the products of the reactant analogues.
   In this analogy the final elaborated products are the 'slippers', so the SlipperSynthesizer
   is where the slippers are made.

   It is intended to be instantiated once for each step in the route.


   .. py:attribute:: route_uuid
      :type:  str


   .. py:attribute:: library


   .. py:attribute:: output_dir


   .. py:attribute:: analogues_dataframes_to_react
      :type:  Dict[str, pandas.DataFrame]


   .. py:attribute:: analogue_columns
      :type:  List[str]
      :value: None


   .. py:attribute:: products
      :type:  pandas.DataFrame
      :value: None


   .. py:attribute:: reactant_combinations
      :type:  pandas.DataFrame
      :value: None


   .. py:attribute:: final_products_pkl_path
      :type:  str
      :value: None


   .. py:attribute:: final_products_csv_path
      :type:  str
      :value: None


   .. py:attribute:: atom_ids_expansion
      :type:  dict
      :value: None


   .. py:attribute:: additional_info
      :value: None


   .. py:attribute:: current_step
      :type:  int


   .. py:attribute:: num_steps
      :type:  int


   .. py:attribute:: logger


   .. py:attribute:: atom_diff_min
      :type:  int


   .. py:attribute:: atom_diff_max
      :type:  int


   .. py:attribute:: num_unique_products
      :type:  int
      :value: 0


   .. py:attribute:: num_products_enumstereo
      :type:  int
      :value: 0


   .. py:method:: get_products() -> pandas.DataFrame

      This function finds the products of the reactant analogues. It is the main entry point of the
      class.


   .. py:method:: check_product_pkl_exists()

      This function checks whether the products .pkl file already exists and, if so, loads it.
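
      A minimal sketch of the check-then-load pattern, written as a standalone helper. The function
      name load_if_exists and its pkl_path argument are hypothetical and not part of the class; the
      real method works on the class's own paths and may use pandas.read_pickle rather than the
      pickle module.

      .. code-block:: python

         import os
         import pickle
         from typing import Any, Optional


         def load_if_exists(pkl_path: str) -> Optional[Any]:
             """Return the unpickled object at pkl_path, or None if the file is absent."""
             # Hypothetical helper; not the class's actual implementation.
             if not os.path.exists(pkl_path):
                 return None
             with open(pkl_path, "rb") as fh:
                 return pickle.load(fh)

      Returning None for a missing file lets the caller fall through to the full product search,
      while an existing file short-circuits the expensive work.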


   .. py:method:: load_products()

      This function loads the scaffold .pkl file.


   .. py:method:: filter_analogues()

      This function goes through the analogue dataframes, passing each one to filter_analogues_on_smarts
      and ordering it by metrics (see order_analogues).

      Finally, it limits the number of analogues so that the number of reactant combinations, and
      therefore of products, does not become unmanageably large.


   .. py:method:: order_analogues(df: pandas.DataFrame, reactant_prefix: str) -> pandas.DataFrame

      This function orders an analogue dataframe by the difference in atom count between each analogue
      and the corresponding reactant of the scaffold compound, then by the number of reactant matches
      found, and then by lead time.
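
      The three-key ordering can be sketched with a plain tuple sort key. The row keys
      (num_atom_diff, num_matches, lead_time) are hypothetical column names, and sorting all three
      in ascending order is an assumption; the actual method may sort some keys in descending order.

      .. code-block:: python

         from typing import Any, Dict, List

         Row = Dict[str, Any]


         def order_rows(rows: List[Row]) -> List[Row]:
             """Sort by atom difference, then reactant matches, then lead time (all ascending).

             Column names and sort directions are assumptions for illustration.
             """
             # Python compares tuples element by element, so later keys only break ties.
             return sorted(
                 rows,
                 key=lambda r: (r["num_atom_diff"], r["num_matches"], r["lead_time"]),
             )

      On a dataframe with these (hypothetical) columns, the equivalent pandas call would be
      ``df.sort_values(["num_atom_diff", "num_matches", "lead_time"])``.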


   .. py:method:: filter_analogues_on_smarts(df: pandas.DataFrame, analogue_columns: Tuple[str, str], reactant_prefix: str) -> pandas.DataFrame

      This function filters the reactant-analogue dataframes so that each analogue contains the SMARTS
      pattern of the original reactant. Analogues that also contain the SMARTS pattern of the other
      reactant are flagged.


   .. py:method:: filter_analogues_by_size()

      This function filters the analogue dataframes by length so that the final number of reactant
      combinations does not exceed 10,000.

      If it would exceed 10,000, each dataframe is cut to its first 100 rows (the square root of 10,000),
      so a two-reactant step yields at most 100 × 100 = 10,000 combinations.
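
      The size cap can be sketched as follows, assuming each analogue dataframe is represented by a
      plain list and that the cap applies to a two-reactant step. The helper name cap_analogue_lists
      is hypothetical.

      .. code-block:: python

         import math
         from typing import List, Sequence


         def cap_analogue_lists(lists: List[Sequence], max_combinations: int = 10_000) -> List[Sequence]:
             """Cut every list to its head if the Cartesian product would exceed max_combinations."""
             total = math.prod(len(lst) for lst in lists)
             if total <= max_combinations:
                 return list(lists)
             # The square-root cap assumes two reactant lists: 100 * 100 = 10,000.
             max_length_each = math.isqrt(max_combinations)
             # Taking the head keeps the best-ranked analogues, since the lists are pre-ordered.
             return [lst[:max_length_each] for lst in lists]

      A list that is already shorter than the cap is left intact by the slice, so the resulting
      combination count can be well below 10,000.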


   .. py:method:: cut_analogues(df: pandas.DataFrame, max_length_each: int, analogue_prefix: int) -> pandas.DataFrame

      This function cuts an analogue dataframe to max_length_each rows by taking its head.


   .. py:method:: cluster_analogues(df: pandas.DataFrame, max_length_each: int, analogue_prefix: int) -> pandas.DataFrame

      This function uses k-means clustering to reduce an analogue dataframe to max_length_each
      analogues, with the number of clusters set to max_length_each. This number of clusters may be
      too large in practice.
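
      For illustration, a minimal pure-Python k-means that returns one representative analogue per
      cluster. It assumes each analogue has already been turned into a numeric feature vector (how
      the actual method featurizes molecules is not stated here), initialises centroids from the
      first k points for determinism, and picks the member closest to each centroid. The name
      kmeans_representatives is hypothetical, and a production version would more likely use a
      library implementation.

      .. code-block:: python

         from typing import List, Sequence

         Vector = Sequence[float]


         def _dist2(a: Vector, b: Vector) -> float:
             # Squared Euclidean distance between two feature vectors.
             return sum((x - y) ** 2 for x, y in zip(a, b))


         def kmeans_representatives(points: List[Vector], k: int, iters: int = 20) -> List[int]:
             """Return sorted indices of one representative point per non-empty k-means cluster."""
             if k <= 0:
                 return []
             if k >= len(points):
                 # Nothing to reduce: every point is its own representative.
                 return list(range(len(points)))
             # Deterministic initialisation: the first k points are the starting centroids.
             centroids = [list(p) for p in points[:k]]
             labels = [0] * len(points)
             for _ in range(iters):
                 # Assignment step: label each point with its nearest centroid.
                 labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
                 # Update step: move each centroid to the mean of its members.
                 for c in range(k):
                     members = [points[i] for i, lab in enumerate(labels) if lab == c]
                     if members:
                         centroids[c] = [sum(col) / len(members) for col in zip(*members)]
             reps = []
             for c in range(k):
                 members = [i for i, lab in enumerate(labels) if lab == c]
                 if members:
                     # Representative: the member closest to the final centroid.
                     reps.append(min(members, key=lambda i: _dist2(points[i], centroids[c])))
             return sorted(reps)

      Each k-means iteration costs O(n·k) distance evaluations, so setting k to max_length_each
      (for example 100) makes the clustering noticeably more expensive than simply taking the head.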


   .. py:method:: combine_flags(row) -> Tuple[str] | None


   .. py:method:: combine_analogues()

      This function combines the reactant analogues into a single dataframe from which the products are found.
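
      Assuming the combination is a Cartesian product of the two analogue lists (consistent with
      the 10,000-combination cap in filter_analogues_by_size), it can be sketched as follows. The
      function name combine_reactant_analogues and the r1_smiles and r2_smiles keys are
      hypothetical.

      .. code-block:: python

         from itertools import product
         from typing import Dict, List


         def combine_reactant_analogues(r1_analogues: List[str], r2_analogues: List[str]) -> List[Dict[str, str]]:
             """Pair every analogue of reactant 1 with every analogue of reactant 2."""
             # itertools.product iterates the first list in the outer loop.
             return [
                 {"r1_smiles": r1, "r2_smiles": r2}
                 for r1, r2 in product(r1_analogues, r2_analogues)
             ]

      For dataframes, pandas 1.2 and later provides the same cross join as
      ``df1.merge(df2, how="cross")``.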


   .. py:method:: find_products_from_reactants() -> pandas.DataFrame

      This function is used to find the products of the reactant combinations.


   .. py:method:: get_products_from_single_reactant() -> pandas.DataFrame

      This function gets the products from a single reactant (e.g. for deprotections).


   .. py:method:: apply_reaction_single(row) -> pandas.Series

      For unimolecular reactions:
      This function applies the original reaction to each row of the reactant combinations dataframe.
      It can return multiple products.


   .. py:method:: apply_reaction(row) -> pandas.Series

      For bimolecular reactions:
      This function applies the original reaction to each row of the reactant combinations dataframe
      and returns only products that can be sanitized.


   .. py:method:: can_be_sanitized(mol: syndirella.error.Chem.Mol) -> bool


   .. py:method:: calc_num_atom_diff_mcs(base: syndirella.error.Chem.Mol, product: syndirella.error.Chem.Mol) -> int

      This function calculates the number of atoms added to the scaffold by finding the maximum common
      substructure (MCS) of the base and the product, then taking the difference in atom count.


   .. py:method:: calc_num_atom_diff_absolute(base: syndirella.error.Chem.Mol, product: syndirella.error.Chem.Mol) -> int

      This function calculates the absolute difference in atom count between the scaffold (base) and the product.


   .. py:method:: filter_products(products: pandas.DataFrame) -> pandas.DataFrame

      This function filters the products dataframe, removing any rows with None values as well as any
      duplicates.
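
      A plain-Python sketch of the same filtering on a list of row dictionaries (filter_product_rows
      is a hypothetical name). It assumes that duplicates means rows identical in every column; the
      actual method may deduplicate on a subset of columns, such as the product SMILES.

      .. code-block:: python

         from typing import Dict, List, Optional

         Row = Dict[str, Optional[str]]


         def filter_product_rows(rows: List[Row]) -> List[Row]:
             """Drop rows containing any None value, then drop duplicate rows (keeping the first)."""
             seen = set()
             kept = []
             for row in rows:
                 if any(v is None for v in row.values()):
                     continue
                 # Sorting the items makes the key independent of dict insertion order.
                 key = tuple(sorted(row.items()))
                 if key in seen:
                     continue
                 seen.add(key)
                 kept.append(row)
             return kept

      On a dataframe, this roughly corresponds to ``products.dropna().drop_duplicates()``.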


   .. py:method:: _print_diff(orig_df: pandas.DataFrame, input_df: pandas.DataFrame, verb: str = None)

      This function is used to print the difference between the original number of analogues and the number of
      valid analogues.


   .. py:method:: calculate_fingerprints(products)

      Calculate Morgan fingerprints for each molecule.


   .. py:method:: find_similarity_groups(products: pandas.DataFrame) -> (pandas.DataFrame, int)

      This is a computationally intensive function that finds all the similarity groups of the products;
      it could be optimized.
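
      One way to form similarity groups is single-linkage grouping on Tanimoto similarity, sketched
      below with fingerprints represented as sets of on-bit indices. The threshold (default 1.0,
      meaning identical fingerprints) and the helper names are assumptions, not the method's actual
      parameters. The pairwise loop is quadratic in the number of products, which illustrates why
      this step is expensive.

      .. code-block:: python

         from typing import Dict, FrozenSet, List

         Fingerprint = FrozenSet[int]  # set of "on" bit indices


         def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
             # |A and B| / |A or B|; two empty fingerprints are treated as identical.
             union = len(a | b)
             return len(a & b) / union if union else 1.0


         def similarity_groups(fps: List[Fingerprint], threshold: float = 1.0) -> List[int]:
             """Assign a group ID to each fingerprint; pairs at or above threshold share a group."""
             parent = list(range(len(fps)))

             def find(i: int) -> int:
                 # Union-find root lookup with path halving.
                 while parent[i] != i:
                     parent[i] = parent[parent[i]]
                     i = parent[i]
                 return i

             for i in range(len(fps)):
                 for j in range(i + 1, len(fps)):
                     if tanimoto(fps[i], fps[j]) >= threshold:
                         parent[find(j)] = find(i)

             # Relabel roots as consecutive group IDs in order of first appearance.
             ids: Dict[int, int] = {}
             return [ids.setdefault(find(i), len(ids)) for i in range(len(fps))]

      With RDKit bit-vector fingerprints, DataStructs.TanimotoSimilarity computes the same
      similarity directly on the fingerprint objects.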


   .. py:method:: assign_names_based_on_groups(products: pandas.DataFrame, library_id: str, base_group_id: int) -> pandas.DataFrame

      Assign names to products based on their group ID, ensuring duplicates have the same name.
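
      A sketch of group-based naming, assuming integer group IDs like those produced by a
      similarity-grouping step. The name format (library ID, a hyphen, then the offset group
      number, e.g. lib-10) and the helper name assign_group_names are hypothetical; the actual
      naming scheme is not documented here. Offsetting by base_group_id presumably keeps names
      unique across calls.

      .. code-block:: python

         from typing import List


         def assign_group_names(group_ids: List[int], library_id: str, base_group_id: int) -> List[str]:
             """Name each product from its group ID; products in the same group share a name."""
             # Hypothetical format: "<library_id>-<base_group_id + group_id>".
             return [f"{library_id}-{base_group_id + gid}" for gid in group_ids]

      Because the name depends only on the group ID, duplicates that fall into the same group
      automatically receive the same name.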


   .. py:method:: add_metadata(products: pandas.DataFrame) -> pandas.DataFrame


   .. py:method:: enumerate_stereoisomers(products: pandas.DataFrame) -> pandas.DataFrame

      This function is used to enumerate the stereoisomers of the products.


   .. py:method:: find_stereoisomers(smiles: str) -> List[syndirella.error.Chem.Mol]


   .. py:method:: save_products()

      This function is used to save the products dataframe as a .pkl file.


   .. py:method:: label_products()

      This function creates a new instance of the Labeler class and calls its label_products method.