/modules/clustering.py - Diff - DockOnSurf - Forge du Centre Blaise Pascal

Revision 4e82c425 modules/clustering.py

     """Functions to cluster structures.
     functions:
     get_rmsd: Computes the rmsd matrix of the conformers in a list of rdkit mol
         objects.
     get_labels_affty: Clusters data in affinity matrix form by assigning labels to
-    data points.
+        data points.
     get_labels_vector: Clusters data in vectorial form by assigning labels to
-    data points.
+        data points.
     get_clusters: Groups data-points belonging to the same cluster into arrays of
-    indices.
+        indices.
     get_exemplars_affty: Computes the exemplars for every cluster and returns a list
-    of indices.
+        of indices.
     plot_clusters: Plots the clustered data casting a color to every cluster.
     clustering: Directs the clustering process by calling the relevant functions.
     """
...
     def get_rmsd(mol_list: list, remove_Hs="c"):
-        """Computes the rmsd matrix of the conformers in a rdkit mol object.
+        """Computes the rmsd matrix of the conformers in a list of rdkit mol objects
         @param mol_list: list of rdkit mol objects containing the conformers.
         @param remove_Hs: bool or str,
...
         @return: list of cluster labels. Every data point is assigned a number
         corresponding to the cluster it belongs to.
         """
         # TODO Implement it.
         return []
...
         distances between points, RMSD Matrix, etc.) shape: [n_points, n_points].
         @param clusters: tuple of arrays. Every array contains the indices (relative
         to the affinity matrix) of the data points belonging to the same cluster.
-        @return: list of indices (relative to the affinity matrix) exemplars for
-        every cluster.
+        @return: list of indices (relative to the affinity matrix) of the exemplars
+        for every cluster.
         This function finds the exemplars of already clusterized data. It does
         that by (i) building a rmsd matrix for each existing cluster with the values
         of the total RMSD matrix (ii) carrying out an actual clustering for each
         cluster-specific matrix using a set of parameters (large negative value of
         preference) such that it always finds only one cluster and (iii) it then
         calculates the exemplar for the matrix.
         """
         from sklearn.cluster import AffinityPropagation
         # Splits Total RMSD matrix into cluster-specific RMSD matrices.
         clust_affty_mtcs = tuple(affty_mtx[np.ix_(clust, clust)]
                                  for clust in clusters)
         exemplars = []
         # Carries out the forced-to-converge-to-1 clustering for each already
         # existing cluster rmsd matrix and calculates the exemplar.
         for i, mtx in enumerate(clust_affty_mtcs):
             pref = -1e6 * np.max(np.abs(mtx))
             af = AffinityPropagation(affinity='precomputed', preference=pref,
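
The hunk above is cut off in the middle of the `AffinityPropagation` call, so the rest of the author's loop is not visible here. As a rough illustration of the idea the docstring describes (split the full matrix into one sub-matrix per cluster, then pick one representative per cluster), the sketch below reuses the `np.ix_` splitting step from the diff but swaps in a simpler selection rule: the medoid, i.e. the member with the smallest summed distance to the other members of its cluster. This is not the affinity-propagation approach used in the source. It assumes the matrix holds distances (such as RMSD values), and the helper names `split_by_cluster` and `medoid_exemplars`, as well as the toy data, are hypothetical.

```python
import numpy as np


def split_by_cluster(affty_mtx, clusters):
    """Return one square sub-matrix per cluster (same np.ix_ step as the diff)."""
    return tuple(affty_mtx[np.ix_(clust, clust)] for clust in clusters)


def medoid_exemplars(dist_mtx, clusters):
    """Hypothetical helper: for each cluster, return the index (relative to
    the full matrix) of the member with the smallest summed distance to the
    other members of that cluster. Assumes dist_mtx holds distances."""
    exemplars = []
    for clust, sub in zip(clusters, split_by_cluster(dist_mtx, clusters)):
        # Position of the medoid inside the cluster-specific sub-matrix.
        local_idx = int(np.argmin(sub.sum(axis=1)))
        # Map it back to an index of the full matrix.
        exemplars.append(int(clust[local_idx]))
    return exemplars


# Toy symmetric distance matrix for five points (made-up values).
dist = np.array([
    [0.0, 1.0, 2.0, 9.0, 9.0],
    [1.0, 0.0, 1.0, 9.0, 9.0],
    [2.0, 1.0, 0.0, 9.0, 9.0],
    [9.0, 9.0, 9.0, 0.0, 0.5],
    [9.0, 9.0, 9.0, 0.5, 0.0],
])
# Two clusters given as arrays of indices into the full matrix.
clusters = (np.array([0, 1, 2]), np.array([3, 4]))
exemplars = medoid_exemplars(dist, clusters)
```

The medoid rule is deterministic and needs no tuning, whereas the approach in the diff relies on a large negative `preference` to force a single cluster per sub-matrix; the two need not select the same point in every case.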
