syndirella.slipper.slipper_synthesizer.SlipperSynthesizer
=========================================================

.. py:module:: syndirella.slipper.slipper_synthesizer.SlipperSynthesizer

.. autoapi-nested-parse::

   slipper_synthesizer/CobblersWorkshop.py

   This module contains the SlipperSynthesizer class.


Classes
-------

.. autoapisummary::

   syndirella.slipper.slipper_synthesizer.SlipperSynthesizer.SlipperSynthesizer


Module Contents
---------------

.. py:class:: SlipperSynthesizer(library: syndirella.route.Library.Library, output_dir: str, atom_ids_expansion: dict = None, additional_info: dict = None)

   This class is used to perform the whole process of finding products of the analogues of reactants.
   Since the final elaborated products are 'slippers' in this analogy, the SlipperSynthesizer is where these
   slippers are made.

   This is supposed to be instantiated for each step in the route.


   .. py:attribute:: route_uuid
      :type: str


   .. py:attribute:: library


   .. py:attribute:: output_dir


   .. py:attribute:: analogues_dataframes_to_react
      :type: Dict[str, pandas.DataFrame]


   .. py:attribute:: analogue_columns
      :type: List[str]
      :value: None


   .. py:attribute:: products
      :type: pandas.DataFrame
      :value: None


   .. py:attribute:: reactant_combinations
      :type: pandas.DataFrame
      :value: None


   .. py:attribute:: final_products_pkl_path
      :type: str
      :value: None


   .. py:attribute:: final_products_csv_path
      :type: str
      :value: None


   .. py:attribute:: atom_ids_expansion
      :type: dict
      :value: None


   .. py:attribute:: additional_info
      :value: None


   .. py:attribute:: current_step
      :type: int


   .. py:attribute:: num_steps
      :type: int


   .. py:attribute:: logger


   .. py:attribute:: atom_diff_min
      :type: int


   .. py:attribute:: atom_diff_max
      :type: int


   .. py:attribute:: num_unique_products
      :type: int
      :value: 0


   .. py:attribute:: num_products_enumstereo
      :type: int
      :value: 0


   .. py:method:: get_products() -> pandas.DataFrame

      This function is used to find the products of the analogues of reactants. It is the main function that is
      called.


   .. py:method:: check_product_pkl_exists()

      This function checks whether the products .pkl file already exists and, if so, loads it.


   .. py:method:: load_products()

      This function loads the scaffold .pkl file.


   .. py:method:: filter_analogues()

      This function goes through the analogue dataframes, passing each to filter_analogues_on_smarts and
      ordering it by metrics. Finally, it limits the number of analogues so that the number of products stays
      manageable.


   .. py:method:: order_analogues(df: pandas.DataFrame, reactant_prefix: str) -> pandas.DataFrame

      This function orders the analogue dataframes by the number of atoms differing from the corresponding
      reactant of the scaffold compound, the number of reactant matches found, and lead time.


   .. py:method:: filter_analogues_on_smarts(df: pandas.DataFrame, analogue_columns: Tuple[str, str], reactant_prefix: str) -> pandas.DataFrame

      This function filters the analogue dataframes so that each analogue contains the SMARTS pattern of the
      original reactant. If the SMARTS pattern of the other reactant is also found, the analogue is flagged.


   .. py:method:: filter_analogues_by_size()

      This function filters the analogue dataframes by length so that the final number of combinations is
      less than 10,000. If there would be more than 10,000 combinations, only the head of each dataframe is
      kept, with a length equal to the square root of 10,000 (100).


   .. py:method:: cut_analogues(df: pandas.DataFrame, max_length_each: int, analogue_prefix: int) -> pandas.DataFrame

      This function cuts the analogue dataframe to max_length_each by taking the head.


   .. py:method:: cluster_analogues(df: pandas.DataFrame, max_length_each: int, analogue_prefix: int) -> pandas.DataFrame

      This function reduces the analogue dataframe to max_length_each by k-means clustering. The number of
      clusters equals max_length_each, which might be too many.


   .. py:method:: combine_flags(row) -> Tuple[str] | None


   .. py:method:: combine_analogues()

      This function combines the analogues of the reactants into one dataframe from which the products are
      found.


   .. py:method:: find_products_from_reactants() -> pandas.DataFrame

      This function finds the products of the reactant combinations.


   .. py:method:: get_products_from_single_reactant() -> pandas.DataFrame

      This function gets the products from a single reactant (for example, deprotections).


   .. py:method:: apply_reaction_single(row) -> pandas.Series

      For monomolecular reactions: this function applies the original reaction to each row of the reactant
      combinations dataframe. Can return multiple products.


   .. py:method:: apply_reaction(row) -> pandas.Series

      For bimolecular reactions: this function applies the original reaction to each row of the reactant
      combinations dataframe. Only products that can be sanitized are returned.


   .. py:method:: can_be_sanitized(mol: syndirella.error.Chem.Mol) -> bool


   .. py:method:: calc_num_atom_diff_mcs(base: syndirella.error.Chem.Mol, product: syndirella.error.Chem.Mol) -> int

      This function calculates the number of atoms added to the scaffold by finding the maximum common
      substructure (MCS) and then taking the difference in the number of atoms.


   .. py:method:: calc_num_atom_diff_absolute(base: syndirella.error.Chem.Mol, product: syndirella.error.Chem.Mol) -> int

      This function calculates the absolute difference in the number of atoms between the scaffold and the
      product.


   .. py:method:: filter_products(products: pandas.DataFrame) -> pandas.DataFrame

      This function filters the products dataframe to remove any rows with None values. It also removes
      duplicates.


   .. py:method:: _print_diff(orig_df: pandas.DataFrame, input_df: pandas.DataFrame, verb: str = None)

      This function prints the difference between the original number of analogues and the number of valid
      analogues.


   .. py:method:: calculate_fingerprints(products)

      Calculate Morgan fingerprints for each molecule.


   .. py:method:: find_similarity_groups(products: pandas.DataFrame) -> (pandas.DataFrame, int)

      This is an intensive function that finds all the similarity groups of the products. Could definitely be
      optimized.


   .. py:method:: assign_names_based_on_groups(products: pandas.DataFrame, library_id: str, base_group_id: int) -> pandas.DataFrame

      Assign names to products based on their group ID, ensuring duplicates have the same name.


   .. py:method:: add_metadata(products: pandas.DataFrame) -> pandas.DataFrame


   .. py:method:: enumerate_stereoisomers(products: pandas.DataFrame) -> pandas.DataFrame

      This function enumerates the stereoisomers of the products.


   .. py:method:: find_stereoisomers(smiles: str) -> List[syndirella.error.Chem.Mol]


   .. py:method:: save_products()

      This function saves the products dataframe as a .pkl file.


   .. py:method:: label_products()

      This function creates a new instance of the Labeler class and calls its label_products function.
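
The size cap described for filter_analogues_by_size can be illustrated with a small, self-contained sketch. This is not the library's implementation: the helper name ``cap_analogue_lists``, the use of plain lists in place of dataframes, and the restriction to exactly two reactants are assumptions made for illustration. Only the 10,000 limit and the square-root head length (100) come from the description above.

```python
import math

# Cap taken from the filter_analogues_by_size description.
MAX_COMBINATIONS = 10_000


def cap_analogue_lists(first, second, max_combinations=MAX_COMBINATIONS):
    """Hypothetical helper: trim two analogue lists when pairing them is too large.

    If every analogue in `first` paired with every analogue in `second`
    gives more than `max_combinations` pairs, keep only the head of each
    list, with length equal to the integer square root of the cap.
    """
    if len(first) * len(second) <= max_combinations:
        # Small enough already: leave both lists untouched.
        return first, second
    max_length_each = math.isqrt(max_combinations)  # 100 when the cap is 10,000
    return first[:max_length_each], second[:max_length_each]


if __name__ == "__main__":
    # 500 x 50 = 25,000 pairings, which exceeds the cap.
    r1, r2 = cap_analogue_lists(list(range(500)), list(range(50)))
    print(len(r1), len(r2))  # prints "100 50"
```

Taking the head only makes sense if the lists are already ordered by preference, which is why the ordering step would need to run before the cut. A list that is already shorter than 100 is left at its own length.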