Revision 4e82c425 modules/clustering.py

--- a/modules/clustering.py
+++ b/modules/clustering.py
@@ -1,14 +1,16 @@
 """Functions to cluster structures.
 
 functions:
+get_rmsd: Computes the rmsd matrix of the conformers in a list of rdkit mol
+    objects.
 get_labels_affty: Clusters data in affinity matrix form by assigning labels to
-data points.
+    data points.
 get_labels_vector: Clusters data in vectorial form by assigning labels to
-data points.
+    data points.
 get_clusters: Groups data-points belonging to the same cluster into arrays of
-indices.
+    indices.
 get_exemplars_affty: Computes the exemplars for every cluster and returns a list
-of indices.
+    of indices.
 plot_clusters: Plots the clustered data casting a color to every cluster.
 clustering: Directs the clustering process by calling the relevant functions.
 """
@@ -21,7 +23,7 @@
 
 
 def get_rmsd(mol_list: list, remove_Hs="c"):
-    """Computes the rmsd matrix of the conformers in a rdkit mol object.
+    """Computes the rmsd matrix of the conformers in a list of rdkit mol objects
 
     @param mol_list: list of rdkit mol objects containing the conformers.
     @param remove_Hs: bool or str,
@@ -92,6 +94,7 @@
     @return: list of cluster labels. Every data point is assigned a number
     corresponding to the cluster it belongs to.
     """
+    # TODO Implement it.
     return []
 
 
@@ -115,13 +118,23 @@
     distances between points, RMSD Matrix, etc.) shape: [n_points, n_points].
     @param clusters: tuple of arrays. Every array contains the indices (relative
     to the affinity matrix) of the data points belonging to the same cluster.
-    @return: list of indices (relative to the affinity matrix) exemplars for
-    every cluster.
+    @return: list of indices (relative to the affinity matrix) of the exemplars
+    for every cluster.
+
+    This function finds the exemplars of already clusterized data. It does
+    that by (i) building a rmsd matrix for each existing cluster with the values
+    of the total RMSD matrix (ii) carrying out an actual clustering for each
+    cluster-specific matrix using a set of parameters (large negative value of
+    preference) such that it always finds only one cluster and (iii) it then
+    calculates the exemplar for the matrix.
     """
     from sklearn.cluster import AffinityPropagation
+    # Splits Total RMSD matrix into cluster-specific RMSD matrices.
     clust_affty_mtcs = tuple(affty_mtx[np.ix_(clust, clust)]
                              for clust in clusters)
     exemplars = []
+    # Carries out the forced-to-converge-to-1 clustering for each already
+    # existing cluster rmsd matrix and calculates the exemplar.
     for i, mtx in enumerate(clust_affty_mtcs):
         pref = -1e6 * np.max(np.abs(mtx))
         af = AffinityPropagation(affinity='precomputed', preference=pref,
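
The docstring above describes splitting the full RMSD matrix into one square sub-matrix per cluster and then picking a single representative for each. The sketch below illustrates that idea in dependency-free Python. It does not reproduce the commit's affinity-propagation step: as a stand-in it picks the medoid, the member with the smallest summed distance to the rest of its cluster. The names `split_matrix` and `medoid_exemplars` and the sample matrix are hypothetical and are not part of `modules/clustering.py`.

```python
from typing import List, Sequence


def split_matrix(matrix: Sequence[Sequence[float]],
                 clusters: Sequence[Sequence[int]]) -> List[List[List[float]]]:
    """Return one square sub-matrix per cluster.

    Pure-Python analogue of matrix[np.ix_(clust, clust)] (hypothetical helper).
    """
    return [[[matrix[i][j] for j in clust] for i in clust]
            for clust in clusters]


def medoid_exemplars(matrix: Sequence[Sequence[float]],
                     clusters: Sequence[Sequence[int]]) -> List[int]:
    """Pick the medoid of every cluster as its exemplar.

    Assumption: this replaces the single-cluster affinity propagation used
    in the commit with a simpler medoid rule, for illustration only.
    """
    exemplars = []
    for clust, sub in zip(clusters, split_matrix(matrix, clusters)):
        row_sums = [sum(row) for row in sub]
        local = row_sums.index(min(row_sums))  # position inside the cluster
        exemplars.append(clust[local])         # index in the full matrix
    return exemplars


# Made-up symmetric distance matrix for five points in two clusters.
rmsd = [
    [0.0, 1.0, 2.0, 9.0, 9.5],
    [1.0, 0.0, 1.0, 9.0, 9.0],
    [2.0, 1.0, 0.0, 9.0, 9.0],
    [9.0, 9.0, 9.0, 0.0, 0.5],
    [9.5, 9.0, 9.0, 0.5, 0.0],
]
clusters = ([0, 1, 2], [3, 4])
exemplars = medoid_exemplars(rmsd, clusters)
print(exemplars)
```

The returned indices refer to the full matrix, matching the `@return` description in the docstring. When members tie, the first one in the cluster wins.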
