L003: Cro66 bonding PLS

To investigate the relationship between structural descriptors and the distance between Cro66 OH and His148 HD1, we employed Partial Least Squares (PLS) regression analysis. This multivariate statistical technique was chosen for its ability to handle high-dimensional, correlated data and reveal underlying patterns in complex datasets.

Methodology

Dihedral angles were transformed using the function,

\[ \frac{1 - \cos \left(\theta\right)}{2}. \]

This transformation maps the circular dihedral data to a [0, 1] range, preserving the periodicity while differentiating between cis (0°) and trans (180°) conformations. All input features (\(X\)) were standardized using sklearn's StandardScaler to ensure each feature contributes equally to the model. The distance between Cro66 OH and His148 HD1 (\(y\)) was used as the response variable without scaling.
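As a minimal sketch of the dihedral transformation, assuming angles are supplied in degrees (the helper name `transform_dihedral` is illustrative and not taken from the original analysis):

```python
import numpy as np

def transform_dihedral(theta_deg):
    """Map dihedral angles in degrees onto [0, 1] via (1 - cos(theta)) / 2."""
    theta_rad = np.deg2rad(np.asarray(theta_deg, dtype=float))
    return (1.0 - np.cos(theta_rad)) / 2.0

# Illustrative angles only: cis, +/-90 degrees, and trans.
angles = np.array([0.0, 90.0, 180.0, -90.0, -180.0])
transformed = transform_dihedral(angles)
print(transformed)  # cis maps to 0, +/-90 degrees to 0.5, trans to 1
```

Because the cosine is an even function, +θ and -θ map to the same value, so the transformed feature encodes how far a dihedral is from cis but not the sense of rotation.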

A PLS regression model was fitted to \(X\) and \(y\) using sklearn's PLSRegression with two components for each simulation state. The model's performance was evaluated using the R² score.

Visualization and Interpretation

Data points were projected onto the space of the first two PLS components. A 2D histogram was created in this space, with bin colors representing the average response variable value.
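One way to build such a histogram is to divide a response-weighted 2D histogram by the plain count histogram. The sketch below uses NumPy only; the function name `binned_mean` and the tiny input arrays are illustrative assumptions.

```python
import numpy as np

def binned_mean(t1, t2, y, bins=30):
    """Average y within each 2D bin of (t1, t2); empty bins are NaN."""
    counts, xedges, yedges = np.histogram2d(t1, t2, bins=bins)
    sums, _, _ = np.histogram2d(t1, t2, bins=[xedges, yedges], weights=y)
    mean = np.full_like(sums, np.nan)
    np.divide(sums, counts, out=mean, where=counts > 0)
    return mean, xedges, yedges

# Tiny illustrative example: three points, two bins per axis.
t1 = np.array([0.1, 0.2, 0.9])   # scores on PLS component 1
t2 = np.array([0.1, 0.2, 0.9])   # scores on PLS component 2
y = np.array([1.0, 3.0, 5.0])    # response values
mean, xedges, yedges = binned_mean(t1, t2, y, bins=2)
print(mean)
```

The resulting grid can be drawn with, for example, matplotlib's `pcolormesh(xedges, yedges, mean.T)`; the transpose is needed because `np.histogram2d` puts the first variable along the first array axis.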

Loading vectors for each feature were plotted as arrows in the PLS component space. The magnitude and direction of these arrows indicate the importance and relationship of each feature to the PLS components. The magnitude of each feature's loading vector was calculated as the Euclidean norm of its first two PLS components.
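A short sketch of the magnitude calculation, using a hypothetical loading matrix (rows are features, columns are PLS components). Note that a plain Euclidean norm is never negative, so the signed values reported in the tables presumably carry an additional sign convention that is not described here.

```python
import numpy as np

# Hypothetical loadings for three features on two PLS components.
loadings = np.array([
    [3.0, 4.0],
    [-1.0, 0.0],
    [0.5, -0.5],
])

# Euclidean norm over the first two components of each feature's loading.
magnitudes = np.linalg.norm(loadings[:, :2], axis=1)

# Feature indices ordered from largest to smallest magnitude.
order = np.argsort(magnitudes)[::-1]
print(magnitudes, order)
```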

A dashed line representing the direction of maximum change in the response variable was added to the plot. This line, referred to as the derivative line, indicates the direction in the PLS component space along which the distance increases most rapidly.

To quantify how each loading vector aligns with the direction of maximum change, we calculated the sine of the angle between each loading vector and the derivative line. A sine value close to 0 indicates that the loading is closely aligned with the derivative line (either in the same or opposite direction), while a value close to 1 or -1 indicates that the loading is perpendicular to the derivative line.
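The signed sine between two 2D vectors follows from the scalar cross product divided by the product of their lengths. In the sketch below, the derivative-line direction is taken as the gradient of a linear least-squares fit of the response on the two scores; this is an assumption about how that direction could be obtained, not a description of the original code, and the data are synthetic.

```python
import numpy as np

def sine_to_line(loading, direction):
    """Signed sine of the angle from `direction` to `loading` (both 2D)."""
    loading = np.asarray(loading, dtype=float)
    direction = np.asarray(direction, dtype=float)
    cross = direction[0] * loading[1] - direction[1] * loading[0]
    return cross / (np.linalg.norm(loading) * np.linalg.norm(direction))

# Synthetic scores (assumption); the response depends only on component 1.
rng = np.random.default_rng(1)
T = rng.normal(size=(200, 2))
y = 2.0 * T[:, 0] + 1.0

# Linear fit y ~ a*t1 + b*t2 + c; (a, b) is the direction of steepest increase.
A = np.column_stack([T, np.ones(len(T))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
direction = coef[:2]

print(sine_to_line([1.0, 0.0], direction))   # parallel: close to 0
print(sine_to_line([-3.0, 0.0], direction))  # antiparallel: close to 0
print(sine_to_line([0.0, 1.0], direction))   # perpendicular: close to 1
```

The sine alone cannot distinguish parallel from antiparallel loadings; the cosine (normalized dot product) would be needed for that.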

Ideally, we want to identify loadings that lie parallel to the direction in PLS component space along which the distance varies most, i.e., those with \(\sin \left( \theta \right)\) close to 0.

Reduced state:

| Feature | \(\sin \left( \theta \right)\) | Magnitude |
| --- | --- | --- |
| J: Thr203-N-CA-C Cys204-N dihedral | 0.9310 | 1.6612 |
| A: Asn144-C Tyr145-N-CA-C dihedral | -0.4071 | -1.5302 |
| N: Ser205-N-CA-C Ala206-N dihedral | -0.9543 | -1.4592 |
| O: Leu221-C Glu222-N-CA-C dihedral | 0.8028 | -1.4138 |
| G: Cys147-C His148-N-CA-C dihedral | -0.5137 | -1.1952 |
| L: Cys204-N-CA-C Ser205-N dihedral | 0.3677 | -1.1887 |
| U: Cro66-OH - Tyr145-HH distance | 0.3363 | 1.1723 |
| K: Thr203-C Cys204-N-CA-C dihedral | 0.9698 | 1.1614 |
| P: Glu222-N-CA-C Phe223-N dihedral | 0.7373 | -1.0731 |
| R: Ser205-H - Asn146-O distance | 0.9990 | 1.0476 |
| W: His148-NE2 - Arg168-H distance | 0.2706 | 1.0172 |
| I: Ser202-C Thr203-N-CA-C dihedral | 0.8896 | 1.0124 |
| T: Cro66-OH - Thr203-HG1 distance | -0.8065 | 1.0087 |
| E: Asn146-C Cys147-N-CA-C dihedral | -0.9894 | 0.8477 |
| F: Cys147-N-CA-C His148-N dihedral | 0.6001 | -0.8343 |
| Q: His148-H - Thr203-O distance | 0.9495 | 0.7787 |
| C: Tyr145-C Asn146-N-CA-C dihedral | -0.9114 | -0.7601 |
| S: Cys204-O - Phe223-H distance | -0.1600 | -0.7219 |
| B: Tyr145-N-CA-C Asn146-N dihedral | -0.9997 | -0.5780 |
| V: His148-HD1 - Asn146-O distance | -0.9599 | 0.5246 |
| H: His148-N-CA-C Asn149-N dihedral | 0.2082 | 0.2798 |
| X: Tyr145-HH - Thr64-OG1 distance | 0.1203 | 0.2535 |
| D: Asn146-N-CA-C Cys147-N dihedral | 0.8436 | -0.0691 |
| M: Cys204-C Ser205-N-CA-C dihedral | 0.9220 | 0.0587 |

Oxidized state:

| Feature | \(\sin \left( \theta \right)\) | Magnitude |
| --- | --- | --- |
| E: Asn146-C Cys147-N-CA-C dihedral | -0.8379 | 1.5905 |
| U: Cro66-OH - Tyr145-HH distance | 0.8729 | 1.5843 |
| D: Asn146-N-CA-C Cys147-N dihedral | 1.0000 | 1.5514 |
| V: His148-HD1 - Asn146-O distance | -0.5355 | 1.4431 |
| X: Tyr145-HH - Thr64-OG1 distance | -0.7911 | 1.4213 |
| R: Ser205-H - Asn146-O distance | 0.6639 | 1.2773 |
| G: Cys147-C His148-N-CA-C dihedral | -0.9635 | -1.1463 |
| O: Leu221-C Glu222-N-CA-C dihedral | 0.7011 | -1.1329 |
| K: Thr203-C Cys204-N-CA-C dihedral | -0.2598 | -1.0137 |
| S: Cys204-O - Phe223-H distance | 0.9889 | -1.0063 |
| H: His148-N-CA-C Asn149-N dihedral | -0.1799 | -1.0016 |
| T: Cro66-OH - Thr203-HG1 distance | 0.5961 | -0.9522 |
| P: Glu222-N-CA-C Phe223-N dihedral | -0.9967 | -0.8803 |
| C: Tyr145-C Asn146-N-CA-C dihedral | -0.7290 | -0.8721 |
| L: Cys204-N-CA-C Ser205-N dihedral | -0.9997 | 0.7738 |
| J: Thr203-N-CA-C Cys204-N dihedral | -0.8546 | -0.7686 |
| W: His148-NE2 - Arg168-H distance | 0.1865 | 0.6072 |
| A: Asn144-C Tyr145-N-CA-C dihedral | -0.9818 | -0.6018 |
| I: Ser202-C Thr203-N-CA-C dihedral | -0.9560 | -0.5819 |
| M: Cys204-C Ser205-N-CA-C dihedral | -1.0000 | 0.4413 |
| N: Ser205-N-CA-C Ala206-N dihedral | -0.8603 | -0.4201 |
| B: Tyr145-N-CA-C Asn146-N dihedral | -0.1606 | -0.3229 |
| F: Cys147-N-CA-C His148-N dihedral | 0.9179 | 0.0357 |
| Q: His148-H - Thr203-O distance | -0.4509 | -0.0253 |

Cu-bound state:

| Feature | \(\sin \left( \theta \right)\) | Magnitude |
| --- | --- | --- |
| C: Tyr145-C Asn146-N-CA-C dihedral | 0.9846 | 1.8009 |
| O: Leu221-C Glu222-N-CA-C dihedral | 0.6911 | -1.6410 |
| V: His148-HD1 - Asn146-O distance | 0.2593 | 1.6406 |
| U: Cro66-OH - Tyr145-HH distance | -0.6670 | 1.4045 |
| B: Tyr145-N-CA-C Asn146-N dihedral | 0.9997 | 1.3725 |
| X: Tyr145-HH - Thr64-OG1 distance | 0.7332 | 1.3103 |
| P: Glu222-N-CA-C Phe223-N dihedral | -0.9742 | -1.2789 |
| A: Asn144-C Tyr145-N-CA-C dihedral | 0.2841 | -1.1970 |
| S: Cys204-O - Phe223-H distance | 0.7806 | -1.1213 |
| R: Ser205-H - Asn146-O distance | -0.8050 | 1.0389 |
| E: Asn146-C Cys147-N-CA-C dihedral | -0.2461 | -0.9456 |
| N: Ser205-N-CA-C Ala206-N dihedral | 0.8984 | -0.9098 |
| T: Cro66-OH - Thr203-HG1 distance | 0.9658 | 0.9064 |
| I: Ser202-C Thr203-N-CA-C dihedral | -0.9942 | -0.6602 |
| K: Thr203-C Cys204-N-CA-C dihedral | -0.9265 | -0.5171 |
| M: Cys204-C Ser205-N-CA-C dihedral | -0.8475 | -0.4923 |
| J: Thr203-N-CA-C Cys204-N dihedral | -0.9987 | -0.4732 |
| L: Cys204-N-CA-C Ser205-N dihedral | -0.8220 | -0.4423 |
| Q: His148-H - Thr203-O distance | -0.7661 | 0.3870 |
| F: Cys147-N-CA-C His148-N dihedral | -0.0241 | 0.3653 |
| H: His148-N-CA-C Asn149-N dihedral | -0.8343 | -0.3437 |
| W: His148-NE2 - Arg168-H distance | 0.0509 | -0.2798 |
| G: Cys147-C His148-N-CA-C dihedral | 0.9092 | 0.0739 |
| D: Asn146-N-CA-C Cys147-N dihedral | 0.9977 | 0.0019 |

In these tables, features are sorted by the absolute value of the Magnitude column in descending order, so the features with the largest loadings appear at the top. The \(\sin \left( \theta \right)\) column provides a direct measure of alignment with the direction of maximum change in the response variable: values close to 0 indicate strong alignment with the derivative line, while values close to 1 or -1 indicate perpendicularity. The sign of the sine tells us which side of the derivative line the loading is on.

Comparative Analysis

To compare the influence of structural descriptors across different states:

  1. Loading magnitudes for each feature were compiled across all states (reduced, oxidized, Cu-bound).
  2. Features were sorted by their mean signed loading magnitude across states, in descending order.

This analysis allows for the identification of structural descriptors that consistently influence the Cro66 OH - His148 HD1 distance across different protein states, as well as those that show state-specific importance.
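The compile-and-sort step can be sketched with the standard library alone. The four features and their values below are copied from the loading table in this section; the function names and the choice of summary statistics (signed mean and median absolute value, both shown so either ranking can be used) are illustrative assumptions.

```python
from statistics import mean, median

# (reduced, oxidized, Cu) loading magnitudes for a subset of features.
magnitudes = {
    "U": (1.1723, 1.5843, 1.4045),
    "V": (0.5246, 1.4431, 1.6406),
    "J": (1.6612, -0.7686, -0.4732),
    "O": (-1.4138, -1.1329, -1.6410),
}

def summarize(values):
    """Summary statistics and changes relative to the reduced state."""
    reduced = values[0]
    return {
        "mean": mean(values),
        "median_abs": median(abs(v) for v in values),
        "delta_vs_reduced": tuple(round(v - reduced, 4) for v in values[1:]),
    }

def rank(summary, key):
    """Feature labels in descending order of the chosen statistic."""
    return sorted(summary, key=lambda f: summary[f][key], reverse=True)

summary = {feature: summarize(vals) for feature, vals in magnitudes.items()}
by_mean = rank(summary, "mean")
by_median_abs = rank(summary, "median_abs")
print(by_mean)        # ['U', 'V', 'J', 'O']
print(by_median_abs)  # ['V', 'O', 'U', 'J']
```

The two rankings differ: a feature such as O, with large negative values in every state, ranks last by signed mean but near the top by median absolute value.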

Feature (i.e., loading) analysis
| Feature | Reduced | Oxidized (Δ vs. reduced) | Cu (Δ vs. reduced) |
| --- | --- | --- | --- |
| U: Cro66-OH - Tyr145-HH distance | 1.1723 | 1.5843 (+0.4120) | 1.4045 (+0.2321) |
| V: His148-HD1 - Asn146-O distance | 0.5246 | 1.4431 (+0.9185) | 1.6406 (+1.1160) |
| R: Ser205-H - Asn146-O distance | 1.0476 | 1.2773 (+0.2297) | 1.0389 (-0.0087) |
| X: Tyr145-HH - Thr64-OG1 distance | 0.2535 | 1.4213 (+1.1678) | 1.3103 (+1.0569) |
| E: Asn146-C Cys147-N-CA-C dihedral | 0.8477 | 1.5905 (+0.7429) | -0.9456 (-1.7933) |
| D: Asn146-N-CA-C Cys147-N dihedral | -0.0691 | 1.5514 (+1.6205) | 0.0019 (+0.0710) |
| W: His148-NE2 - Arg168-H distance | 1.0172 | 0.6072 (-0.4101) | -0.2798 (-1.2971) |
| Q: His148-H - Thr203-O distance | 0.7787 | -0.0253 (-0.8040) | 0.3870 (-0.3917) |
| T: Cro66-OH - Thr203-HG1 distance | 1.0087 | -0.9522 (-1.9610) | 0.9064 (-0.1024) |
| B: Tyr145-N-CA-C Asn146-N dihedral | -0.5780 | -0.3229 (+0.2551) | 1.3725 (+1.9505) |
| J: Thr203-N-CA-C Cys204-N dihedral | 1.6612 | -0.7686 (-2.4297) | -0.4732 (-2.1344) |
| C: Tyr145-C Asn146-N-CA-C dihedral | -0.7601 | -0.8721 (-0.1121) | 1.8009 (+2.5610) |
| M: Cys204-C Ser205-N-CA-C dihedral | 0.0587 | 0.4413 (+0.3826) | -0.4923 (-0.5509) |
| I: Ser202-C Thr203-N-CA-C dihedral | 1.0124 | -0.5819 (-1.5943) | -0.6602 (-1.6726) |
| K: Thr203-C Cys204-N-CA-C dihedral | 1.1614 | -1.0137 (-2.1751) | -0.5171 (-1.6785) |
| F: Cys147-N-CA-C His148-N dihedral | -0.8343 | 0.0357 (+0.8700) | 0.3653 (+1.1996) |
| L: Cys204-N-CA-C Ser205-N dihedral | -1.1887 | 0.7738 (+1.9625) | -0.4423 (+0.7464) |
| H: His148-N-CA-C Asn149-N dihedral | 0.2798 | -1.0016 (-1.2813) | -0.3437 (-0.6235) |
| G: Cys147-C His148-N-CA-C dihedral | -1.1952 | -1.1463 (+0.0490) | 0.0739 (+1.2691) |
| N: Ser205-N-CA-C Ala206-N dihedral | -1.4592 | -0.4201 (+1.0391) | -0.9098 (+0.5495) |
| S: Cys204-O - Phe223-H distance | -0.7219 | -1.0063 (-0.2844) | -1.1213 (-0.3994) |
| P: Glu222-N-CA-C Phe223-N dihedral | -1.0731 | -0.8803 (+0.1929) | -1.2789 (-0.2058) |
| A: Asn144-C Tyr145-N-CA-C dihedral | -1.5302 | -0.6018 (+0.9284) | -1.1970 (+0.3332) |
| O: Leu221-C Glu222-N-CA-C dihedral | -1.4138 | -1.1329 (+0.2809) | -1.6410 (-0.2272) |

Visualization