I011: Cro66 bonding PLS
To investigate the relationship between structural descriptors and the distance between Cro66 OH and His148 HD1, we employed Partial Least Squares (PLS) regression analysis. This multivariate statistical technique was chosen for its ability to handle high-dimensional, correlated data and reveal underlying patterns in complex datasets.
Methodology
Dihedral angles were transformed using the function,
This transformation maps the circular dihedral data to a [0, 1] range, preserving the periodicity while differentiating between cis (0°) and trans (180°) conformations. All input features (\(X\)) were standardized using sklearn's StandardScaler to ensure each feature contributes equally to the model. The distance between Cro66 OH and His148 HD1 (\(y\)) was used as the response variable without scaling.
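The preprocessing described above can be sketched as follows. The transformation formula itself is not reproduced in the text, so this sketch assumes \(f(\theta) = (1 - \cos\theta)/2\), one function that is periodic, maps to [0, 1], and separates cis (0° → 0) from trans (180° → 1); it may not be the function actually used. The sample angles and variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def transform_dihedral(theta_deg):
    """Map dihedral angles in degrees to [0, 1].

    Assumed form: (1 - cos(theta)) / 2, so cis (0 deg) -> 0 and trans (180 deg) -> 1.
    """
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    return 0.5 * (1.0 - np.cos(theta))


# Hypothetical dihedral samples in degrees: 4 frames x 2 dihedral features.
dihedrals = np.array([
    [0.0, 180.0],
    [90.0, -90.0],
    [180.0, 360.0],
    [-180.0, 45.0],
])
X = transform_dihedral(dihedrals)

# Standardize each feature (column) to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
```

Because the cosine is even and 360°-periodic, -180° and 180° map to the same value, which is what removes the artificial jump at the angular boundary.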
A PLS regression model was fitted to \(X\) and \(y\) using sklearn's PLSRegression with two components for each simulation state.
The model's performance was evaluated using the R² score.
Visualization and Interpretation
Data points were projected onto the space of the first two PLS components. A 2D histogram was created in this space, with bin colors representing the average response variable value.
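The binned average can be computed with two weighted histograms, as in this sketch. The scores and response here are random placeholders, and the bin count of 20 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical PLS scores (two components) and response values.
scores = rng.normal(size=(500, 2))
y = 3.0 + scores[:, 0] + 0.5 * scores[:, 1]

bins = 20  # assumed bin count per axis

# Number of frames per bin.
counts, xedges, yedges = np.histogram2d(scores[:, 0], scores[:, 1], bins=bins)

# Sum of the response per bin, using the same bin edges.
sums, _, _ = np.histogram2d(
    scores[:, 0], scores[:, 1], bins=[xedges, yedges], weights=y
)

# Mean response per bin; empty bins become NaN so they can be left blank.
with np.errstate(invalid="ignore", divide="ignore"):
    mean_y = np.where(counts > 0, sums / counts, np.nan)
```

With matplotlib, `plt.pcolormesh(xedges, yedges, mean_y.T)` would draw the result; the transpose is needed because `np.histogram2d` puts the first coordinate along the first array axis.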
Loading vectors for each feature were plotted as arrows in the PLS component space. The magnitude and direction of these arrows indicate the importance and relationship of each feature to the PLS components. The magnitude of each feature's loading vector was calculated as the Euclidean norm of its loadings on the first two PLS components.
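As a small illustration of the norm calculation, assume the loadings are stored with one row per feature and one column per component (the layout of `x_loadings_` in scikit-learn's `PLSRegression`). The numbers below are made up.

```python
import numpy as np

# Hypothetical loadings: rows are features, columns are PLS components 1 and 2.
loadings = np.array([
    [3.0, 4.0],
    [-1.0, 0.0],
    [0.6, -0.8],
])

# Euclidean norm of each feature's loading vector over the first two components.
magnitudes = np.linalg.norm(loadings[:, :2], axis=1)
```

A norm is non-negative by construction, so the signed values in the Magnitude columns presumably follow an additional sign convention that is not described here.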
A dashed line representing the direction of maximum change in the response variable was added to the plot. This line, referred to as the derivative line, indicates the direction in the PLS component space along which the distance increases most rapidly.
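How the derivative line was computed is not stated. One simple possibility, assuming the response is approximately linear in the two PLS scores, is to take the gradient of a least-squares plane fitted to the response; the sketch below uses synthetic scores and is not necessarily the procedure used here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical PLS scores and a response that is exactly linear in them.
scores = rng.normal(size=(300, 2))
y = 1.5 + 2.0 * scores[:, 0] - 1.0 * scores[:, 1]

# Least-squares fit of y ~ b0 + b1 * t1 + b2 * t2.
design = np.column_stack([np.ones(len(scores)), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# The gradient (b1, b2) points along the direction of steepest increase in y.
gradient = coef[1:]
direction = gradient / np.linalg.norm(gradient)
```

The dashed line would then pass through the origin of the score plot along `direction`.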
To quantify how each loading vector aligns with the direction of maximum change, we calculated the sine of the angle between each loading vector and the derivative line. A sine value close to 0 indicates that the loading is closely aligned with the derivative line (either in the same or opposite direction), while a value close to 1 or -1 indicates that the loading is perpendicular to the derivative line.
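In two dimensions the signed sine follows from the cross product of the two vectors. The sketch below measures the angle from the derivative direction to the loading vector; this sign convention and the function name are assumptions.

```python
import numpy as np


def signed_sine(loading, direction):
    """Signed sine of the angle from `direction` to `loading` (2D vectors)."""
    loading = np.asarray(loading, dtype=float)
    direction = np.asarray(direction, dtype=float)
    # z-component of the 2D cross product direction x loading.
    cross = direction[0] * loading[1] - direction[1] * loading[0]
    return cross / (np.linalg.norm(loading) * np.linalg.norm(direction))


# Hypothetical derivative direction along the first PLS component.
direction = np.array([1.0, 0.0])

parallel = signed_sine([2.0, 0.0], direction)
antiparallel = signed_sine([-2.0, 0.0], direction)
perpendicular_up = signed_sine([0.0, 3.0], direction)
perpendicular_down = signed_sine([0.0, -3.0], direction)
```

Parallel and antiparallel loadings both give a sine of 0, which is why the sine alone cannot tell whether a feature increases or decreases along the derivative line.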
Ideally, we want to identify features whose loadings are large and nearly parallel to the derivative line, that is, to the direction in the PLS component space along which the distance changes most.
Reduced state

| Feature | \(\sin \left( \theta \right)\) | Magnitude |
|---|---|---|
| M: Thr203-N-CA-C Cys204-N dihedral | -0.9770 | 1.7662 | 
| R: Leu221-C Glu222-N-CA-C dihedral | -0.1096 | -1.6162 | 
| W: Cro66-OH - Thr203-HG1 distance | 0.9998 | 1.5014 | 
| S: Glu222-N-CA-C Phe223-N dihedral | 0.5440 | -1.4845 | 
| C: Tyr145-HH - Thr64-OG1 distance | -0.2227 | 1.4841 | 
| X: Cro66-OH - His148-HD1 distance | -0.7383 | 1.4357 | 
| D: Asn144-C Tyr145-N-CA-C dihedral | -0.7456 | -1.3634 | 
| L: Ser202-C Thr203-N-CA-C dihedral | -0.9847 | 1.2976 | 
| P: Cys204-C Ser205-N-CA-C dihedral | -0.6848 | -1.2198 | 
| N: Thr203-C Cys204-N-CA-C dihedral | -0.9586 | 1.1434 | 
| I: Cys147-N-CA-C His148-N dihedral | 0.9889 | -1.0921 | 
| Q: Ser205-N-CA-C Ala206-N dihedral | 0.8383 | -0.8675 | 
| T: His148-H - Thr203-O distance | -0.9833 | 0.8315 | 
| O: Cys204-N-CA-C Ser205-N dihedral | -0.9133 | -0.7882 | 
| V: Cys204-O - Phe223-H distance | -0.9102 | -0.6971 | 
| Z: His148-NE2 - Arg168-H distance | 0.7400 | 0.5942 | 
| J: Cys147-C His148-N-CA-C dihedral | -0.0668 | -0.5151 | 
| K: His148-N-CA-C Asn149-N dihedral | -0.8428 | 0.5064 | 
| E: Tyr145-N-CA-C Asn146-N dihedral | 0.7560 | -0.5023 | 
| G: Asn146-N-CA-C Cys147-N dihedral | -0.3201 | -0.3549 | 
| B: Tyr145-CE1-CZ-OH-HH dihedral | -0.8501 | 0.2929 | 
| H: Asn146-C Cys147-N-CA-C dihedral | 0.8551 | -0.2089 | 
| Y: His148-HD1 - Asn146-O distance | 0.9999 | 0.1932 | 
| A: Tyr145-CA-CB-CG-CD1 dihedral | 0.9687 | -0.1789 | 
| U: Ser205-H - Asn146-O distance | -0.9461 | -0.0962 | 
| F: Tyr145-C Asn146-N-CA-C dihedral | 0.4033 | -0.0614 | 

Oxidized state

| Feature | \(\sin \left( \theta \right)\) | Magnitude |
|---|---|---|
| C: Tyr145-HH - Thr64-OG1 distance | 0.1041 | 2.1695 | 
| Y: His148-HD1 - Asn146-O distance | -0.9898 | 1.9428 | 
| H: Asn146-C Cys147-N-CA-C dihedral | -0.9527 | 1.7753 | 
| U: Ser205-H - Asn146-O distance | 0.9762 | 1.5866 | 
| F: Tyr145-C Asn146-N-CA-C dihedral | -0.9993 | -1.2434 | 
| R: Leu221-C Glu222-N-CA-C dihedral | -0.4066 | -1.2084 | 
| X: Cro66-OH - His148-HD1 distance | 0.9237 | 1.1834 | 
| K: His148-N-CA-C Asn149-N dihedral | 0.9484 | -0.9881 | 
| V: Cys204-O - Phe223-H distance | 0.9638 | -0.9708 | 
| G: Asn146-N-CA-C Cys147-N dihedral | 0.8313 | 0.9624 | 
| D: Asn144-C Tyr145-N-CA-C dihedral | -0.8304 | -0.9555 | 
| S: Glu222-N-CA-C Phe223-N dihedral | -0.8508 | -0.9437 | 
| J: Cys147-C His148-N-CA-C dihedral | 1.0000 | -0.8975 | 
| E: Tyr145-N-CA-C Asn146-N dihedral | -0.8186 | -0.7204 | 
| I: Cys147-N-CA-C His148-N dihedral | 0.9614 | -0.6858 | 
| Z: His148-NE2 - Arg168-H distance | -0.9907 | 0.5309 | 
| T: His148-H - Thr203-O distance | 0.9830 | -0.4788 | 
| L: Ser202-C Thr203-N-CA-C dihedral | 0.6710 | -0.3990 | 
| W: Cro66-OH - Thr203-HG1 distance | -0.2302 | 0.3300 | 
| Q: Ser205-N-CA-C Ala206-N dihedral | -0.9815 | -0.3283 | 
| P: Cys204-C Ser205-N-CA-C dihedral | -0.7877 | -0.2380 | 
| N: Thr203-C Cys204-N-CA-C dihedral | 0.7285 | -0.1597 | 
| A: Tyr145-CA-CB-CG-CD1 dihedral | -0.4846 | -0.1539 | 
| M: Thr203-N-CA-C Cys204-N dihedral | 0.7229 | -0.0721 | 
| B: Tyr145-CE1-CZ-OH-HH dihedral | 0.3404 | -0.0397 | 
| O: Cys204-N-CA-C Ser205-N dihedral | -0.8355 | 0.0022 | 

Cu-bound state

| Feature | \(\sin \left( \theta \right)\) | Magnitude |
|---|---|---|
| R: Leu221-C Glu222-N-CA-C dihedral | -0.0269 | -1.8733 | 
| U: Ser205-H - Asn146-O distance | 0.9800 | -1.6472 | 
| T: His148-H - Thr203-O distance | -0.5928 | -1.6067 | 
| M: Thr203-N-CA-C Cys204-N dihedral | 0.8537 | 1.5975 | 
| B: Tyr145-CE1-CZ-OH-HH dihedral | 0.5403 | 1.4338 | 
| N: Thr203-C Cys204-N-CA-C dihedral | 0.8987 | 1.3967 | 
| H: Asn146-C Cys147-N-CA-C dihedral | 0.3630 | -1.2444 | 
| O: Cys204-N-CA-C Ser205-N dihedral | 0.7764 | 1.1729 | 
| C: Tyr145-HH - Thr64-OG1 distance | -0.4626 | 1.0371 | 
| P: Cys204-C Ser205-N-CA-C dihedral | 0.7871 | 1.0242 | 
| L: Ser202-C Thr203-N-CA-C dihedral | 0.8581 | 0.9051 | 
| Q: Ser205-N-CA-C Ala206-N dihedral | 0.1890 | -0.8363 | 
| D: Asn144-C Tyr145-N-CA-C dihedral | 0.4161 | -0.8011 | 
| A: Tyr145-CA-CB-CG-CD1 dihedral | -0.9481 | -0.6791 | 
| S: Glu222-N-CA-C Phe223-N dihedral | 0.8944 | -0.6734 | 
| X: Cro66-OH - His148-HD1 distance | -0.6039 | 0.6695 | 
| Z: His148-NE2 - Arg168-H distance | 0.1247 | -0.6175 | 
| F: Tyr145-C Asn146-N-CA-C dihedral | -0.8055 | 0.6097 | 
| V: Cys204-O - Phe223-H distance | -0.7753 | -0.5989 | 
| W: Cro66-OH - Thr203-HG1 distance | -0.9891 | 0.5725 | 
| E: Tyr145-N-CA-C Asn146-N dihedral | -0.9875 | 0.4345 | 
| K: His148-N-CA-C Asn149-N dihedral | 0.9761 | 0.3839 | 
| G: Asn146-N-CA-C Cys147-N dihedral | 0.0028 | -0.3680 | 
| I: Cys147-N-CA-C His148-N dihedral | 0.8815 | -0.2949 | 
| Y: His148-HD1 - Asn146-O distance | 0.6020 | 0.1852 | 
| J: Cys147-C His148-N-CA-C dihedral | -0.3791 | -0.1818 | 

In these tables, features are sorted by the absolute value of the loading magnitude, with the largest at the top. The \(\sin \left( \theta \right)\) column provides a direct measure of alignment with the direction of maximum change in the response variable: values close to 0 indicate strong alignment with the derivative line, while values close to 1 or -1 indicate perpendicularity. The sign of the sine tells us which side of the derivative line the loading is on.
Comparative Analysis
To compare the influence of structural descriptors across different states:
- Loading magnitudes for each feature were compiled across all states (reduced, oxidized, Cu-bound).
- Features were sorted by their mean signed loading magnitude across states, in descending order.
This analysis allows for the identification of structural descriptors that consistently influence the Cro66 OH - His148 HD1 distance across different protein states, as well as those that show state-specific importance.
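The compilation step can be sketched with three features whose values are copied from the per-state tables above. The sketch assumes that the parenthesized values in the table that follows are differences relative to the reduced state and that rows are ordered by the mean signed value across states; both assumptions are consistent with the tabulated numbers.

```python
import numpy as np

features = [
    "M: Thr203-N-CA-C Cys204-N dihedral",
    "C: Tyr145-HH - Thr64-OG1 distance",
    "R: Leu221-C Glu222-N-CA-C dihedral",
]

# Columns: reduced, oxidized, Cu-bound (values copied from the tables above).
magnitudes = np.array([
    [1.7662, -0.0721, 1.5975],
    [1.4841, 2.1695, 1.0371],
    [-1.6162, -1.2084, -1.8733],
])

# Change of the oxidized and Cu-bound values relative to the reduced state.
delta = magnitudes[:, 1:] - magnitudes[:, [0]]

# Sort rows by the mean signed magnitude across states, largest first.
order = np.argsort(-magnitudes.mean(axis=1))

for i in order:
    red, oxd, cu = magnitudes[i]
    print(f"{features[i]} | {red:.4f} | {oxd:.4f} ({delta[i, 0]:+.4f}) | "
          f"{cu:.4f} ({delta[i, 1]:+.4f})")
```

Small differences in the last digit relative to the published table can arise because the differences here are computed from values already rounded to four decimals.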
Feature (i.e., loading) analysis. Values in parentheses give the change relative to the reduced state.
| Feature | Reduced | Oxidized | Cu | 
|---|---|---|---|
| C: Tyr145-HH - Thr64-OG1 distance | 1.4841 | 2.1695 (+0.6854) | 1.0371 (-0.4470) | 
| M: Thr203-N-CA-C Cys204-N dihedral | 1.7662 | -0.0721 (-1.8383) | 1.5975 (-0.1686) | 
| X: Cro66-OH - His148-HD1 distance | 1.4357 | 1.1834 (-0.2523) | 0.6695 (-0.7661) | 
| W: Cro66-OH - Thr203-HG1 distance | 1.5014 | 0.3300 (-1.1714) | 0.5725 (-0.9289) | 
| N: Thr203-C Cys204-N-CA-C dihedral | 1.1434 | -0.1597 (-1.3031) | 1.3967 (+0.2533) | 
| Y: His148-HD1 - Asn146-O distance | 0.1932 | 1.9428 (+1.7496) | 0.1852 (-0.0080) | 
| L: Ser202-C Thr203-N-CA-C dihedral | 1.2976 | -0.3990 (-1.6966) | 0.9051 (-0.3925) | 
| B: Tyr145-CE1-CZ-OH-HH dihedral | 0.2929 | -0.0397 (-0.3326) | 1.4338 (+1.1409) | 
| Z: His148-NE2 - Arg168-H distance | 0.5942 | 0.5309 (-0.0633) | -0.6175 (-1.2117) | 
| O: Cys204-N-CA-C Ser205-N dihedral | -0.7882 | 0.0022 (+0.7904) | 1.1729 (+1.9611) | 
| H: Asn146-C Cys147-N-CA-C dihedral | -0.2089 | 1.7753 (+1.9842) | -1.2444 (-1.0356) | 
| G: Asn146-N-CA-C Cys147-N dihedral | -0.3549 | 0.9624 (+1.3173) | -0.3680 (-0.0131) | 
| K: His148-N-CA-C Asn149-N dihedral | 0.5064 | -0.9881 (-1.4945) | 0.3839 (-0.1225) | 
| U: Ser205-H - Asn146-O distance | -0.0962 | 1.5866 (+1.6828) | -1.6472 (-1.5510) | 
| P: Cys204-C Ser205-N-CA-C dihedral | -1.2198 | -0.2380 (+0.9818) | 1.0242 (+2.2440) | 
| F: Tyr145-C Asn146-N-CA-C dihedral | -0.0614 | -1.2434 (-1.1821) | 0.6097 (+0.6710) | 
| E: Tyr145-N-CA-C Asn146-N dihedral | -0.5023 | -0.7204 (-0.2181) | 0.4345 (+0.9369) | 
| A: Tyr145-CA-CB-CG-CD1 dihedral | -0.1789 | -0.1539 (+0.0250) | -0.6791 (-0.5002) | 
| T: His148-H - Thr203-O distance | 0.8315 | -0.4788 (-1.3103) | -1.6067 (-2.4382) | 
| J: Cys147-C His148-N-CA-C dihedral | -0.5151 | -0.8975 (-0.3824) | -0.1818 (+0.3333) | 
| Q: Ser205-N-CA-C Ala206-N dihedral | -0.8675 | -0.3283 (+0.5392) | -0.8363 (+0.0312) | 
| I: Cys147-N-CA-C His148-N dihedral | -1.0921 | -0.6858 (+0.4063) | -0.2949 (+0.7972) | 
| V: Cys204-O - Phe223-H distance | -0.6971 | -0.9708 (-0.2736) | -0.5989 (+0.0982) | 
| S: Glu222-N-CA-C Phe223-N dihedral | -1.4845 | -0.9437 (+0.5409) | -0.6734 (+0.8111) | 
| D: Asn144-C Tyr145-N-CA-C dihedral | -1.3634 | -0.9555 (+0.4079) | -0.8011 (+0.5623) | 
| R: Leu221-C Glu222-N-CA-C dihedral | -1.6162 | -1.2084 (+0.4078) | -1.8733 (-0.2570) | 


