Revision 4e82c425 modules/clustering.py
| old | new | b/modules/clustering.py |
|---|---|---|
| 1 | 1 | """Functions to cluster structures. |
| 2 | 2 | |
| 3 | 3 | functions: |
| | 4 | get_rmsd: Computes the rmsd matrix of the conformers in a list of rdkit mol |
| | 5 | objects. |
| 4 | 6 | get_labels_affty: Clusters data in affinity matrix form by assigning labels to |
| 5 | | data points. |
| | 7 | data points. |
| 6 | 8 | get_labels_vector: Clusters data in vectorial form by assigning labels to |
| 7 | | data points. |
| | 9 | data points. |
| 8 | 10 | get_clusters: Groups data-points belonging to the same cluster into arrays of |
| 9 | | indices. |
| | 11 | indices. |
| 10 | 12 | get_exemplars_affty: Computes the exemplars for every cluster and returns a list |
| 11 | | of indices. |
| | 13 | of indices. |
| 12 | 14 | plot_clusters: Plots the clustered data casting a color to every cluster. |
| 13 | 15 | clustering: Directs the clustering process by calling the relevant functions. |
| 14 | 16 | """ |
| ... | ... | |
| 21 | 23 | |
| 22 | 24 | |
| 23 | 25 | def get_rmsd(mol_list: list, remove_Hs="c"): |
| 24 | | """Computes the rmsd matrix of the conformers in a rdkit mol object. |
| | 26 | """Computes the rmsd matrix of the conformers in a list of rdkit mol objects |
| 25 | 27 | |
| 26 | 28 | @param mol_list: list of rdkit mol objects containing the conformers. |
| 27 | 29 | @param remove_Hs: bool or str, |
| ... | ... | |
| 92 | 94 | @return: list of cluster labels. Every data point is assigned a number |
| 93 | 95 | corresponding to the cluster it belongs to. |
| 94 | 96 | """ |
| | 97 | # TODO Implement it. |
| 95 | 98 | return [] |
| 96 | 99 | |
| 97 | 100 | |
| ... | ... | |
| 115 | 118 | distances between points, RMSD Matrix, etc.) shape: [n_points, n_points]. |
| 116 | 119 | @param clusters: tuple of arrays. Every array contains the indices (relative |
| 117 | 120 | to the affinity matrix) of the data points belonging to the same cluster. |
| 118 | | @return: list of indices (relative to the affinity matrix) exemplars for |
| 119 | | every cluster. |
| | 121 | @return: list of indices (relative to the affinity matrix) of the exemplars |
| | 122 | for every cluster. |
| | 123 | |
| | 124 | This function finds the exemplars of already clusterized data. It does |
| | 125 | that by (i) building a rmsd matrix for each existing cluster with the values |
| | 126 | of the total RMSD matrix (ii) carrying out an actual clustering for each |
| | 127 | cluster-specific matrix using a set of parameters (large negative value of |
| | 128 | preference) such that it always finds only one cluster and (iii) it then |
| | 129 | calculates the exemplar for the matrix. |
| 120 | 130 | """ |
| 121 | 131 | from sklearn.cluster import AffinityPropagation |
| | 132 | # Splits Total RMSD matrix into cluster-specific RMSD matrices. |
| 122 | 133 | clust_affty_mtcs = tuple(affty_mtx[np.ix_(clust, clust)] |
| 123 | 134 | for clust in clusters) |
| 124 | 135 | exemplars = [] |
| | 136 | # Carries out the forced-to-converge-to-1 clustering for each already |
| | 137 | # existing cluster rmsd matrix and calculates the exemplar. |
| 125 | 138 | for i, mtx in enumerate(clust_affty_mtcs): |
| 126 | 139 | pref = -1e6 * np.max(np.abs(mtx)) |
| 127 | 140 | af = AffinityPropagation(affinity='precomputed', preference=pref, |
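
The docstring added to `get_exemplars_affty` describes splitting the total matrix into one sub-matrix per cluster and then locating a single exemplar in each. The sketch below illustrates step (i) with `np.ix_`, as in the diff. For steps (ii) and (iii) it swaps in a simpler stand-in and does not use `AffinityPropagation`: it picks the member with the highest summed affinity to the rest of its cluster. The helper names `split_affinity` and `medoid_exemplars` and the toy data are hypothetical, and the sketch assumes that a larger matrix value means "more similar" (here, negated distances), which the diff does not confirm.

```python
import numpy as np


def split_affinity(affty_mtx, clusters):
    """Return one square sub-matrix per cluster.

    np.ix_ builds an open mesh, so both rows and columns are restricted
    to the members of each cluster.
    """
    return tuple(affty_mtx[np.ix_(clust, clust)] for clust in clusters)


def medoid_exemplars(affty_mtx, clusters):
    """Hypothetical stand-in for the forced single-cluster clustering.

    For every cluster, pick the member whose summed affinity to the other
    members is highest. Assumes larger values mean more similar.
    """
    exemplars = []
    for clust, mtx in zip(clusters, split_affinity(affty_mtx, clusters)):
        # Drop the diagonal so self-affinity does not influence the choice.
        totals = mtx.sum(axis=1) - np.diag(mtx)
        # Map the local position back to an index of the full matrix.
        exemplars.append(int(clust[np.argmax(totals)]))
    return exemplars


# Toy data (assumption): six 1-D points, affinity = negated distance.
points = np.array([0.0, 1.0, 2.0, 10.0, 12.0, 13.0])
affty_mtx = -np.abs(points[:, None] - points[None, :])
clusters = (np.array([0, 1, 2]), np.array([3, 4, 5]))

sub_matrices = split_affinity(affty_mtx, clusters)
exemplars = medoid_exemplars(affty_mtx, clusters)
print(exemplars)
```

The returned indices are relative to the full matrix, matching the `@return` description in the diff. Whether this stand-in agrees with the `AffinityPropagation` result on real RMSD data is not something the diff establishes.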