OCPBUGS-45921: Use HighlyAvailable infra policy for HyperShift serial conformance tests #75813
… conformance tests

The e2e-aws-ovn-conformance-serial test creates a HyperShift hosted cluster with 3 worker nodes but SingleReplica infrastructure topology (the default). This causes the ingress controller to run with only 1 replica, making it vulnerable to NoExecute taint eviction from serial conformance tests like kubectl taint [1]. Switching to HighlyAvailable ensures 2 router replicas so a single-node taint doesn't cause full ingress unavailability.

[1] https://github.com/kubernetes/kubernetes/blob/8911a2d/test/e2e/kubectl/kubectl.go#L1772
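The reasoning in the description can be illustrated with a minimal Go sketch. It is not the ingress operator's code: `routerReplicas` and `availableAfterTaint` are hypothetical helpers, and the sketch assumes the worst case in which router pods are spread one per node, the tainted node hosts one of them, and that pod does not tolerate the NoExecute taint.

```go
package main

import "fmt"

// Topology values as quoted in the PR description.
const (
	SingleReplica   = "SingleReplica"
	HighlyAvailable = "HighlyAvailable"
)

// routerReplicas is a hypothetical helper that models the behaviour
// described in the PR: 2 router replicas for HighlyAvailable, 1 otherwise.
func routerReplicas(topology string) int {
	if topology == HighlyAvailable {
		return 2
	}
	return 1
}

// availableAfterTaint returns how many router pods keep running after a
// single node receives a NoExecute taint. Assumption (not from the PR):
// the tainted node hosts exactly one router pod, which gets evicted.
func availableAfterTaint(replicas int) int {
	if replicas <= 0 {
		return 0
	}
	return replicas - 1
}

func main() {
	for _, topology := range []string{SingleReplica, HighlyAvailable} {
		replicas := routerReplicas(topology)
		fmt.Printf("%s: replicas=%d, still running after single-node taint=%d\n",
			topology, replicas, availableAfterTaint(replicas))
	}
}
```

Under these assumptions, SingleReplica leaves no router pod running while the taint is in place, whereas HighlyAvailable leaves one.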
|
@alebedev87: This pull request references Jira Issue OCPBUGS-45921, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@alebedev87: This pull request references Jira Issue OCPBUGS-45921, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
|
@alebedev87: This pull request references Jira Issue OCPBUGS-45921, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
|
[REHEARSALNOTIFIER]
Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals. Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse |
|
@alebedev87: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alebedev87, jparrill. The full list of commands accepted by this bot can be found here. The pull request process is described here. |
|
4.22 cluster
Tests passed.

status:
  controlPlaneTopology: External
  cpuPartitioning: None
  infrastructureName: c7ed7a3834177cec7638
  infrastructureTopology: HighlyAvailable

name: router-default
namespace: openshift-ingress
spec:
  replicas: 2

4.21 cluster
The same: tests passed, and the router deployment has 2 replicas.

4.20 and 4.19 clusters
The router deployment is HA (2 replicas); however, some tests failed, though not the ones this PR aims to fix:

/pj-rehearse periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn-conformance-serial |
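The manual check reported in this comment (infrastructure topology plus router replica count) could be automated along the following lines. This is only a sketch: `checkHA` is a hypothetical helper, and it assumes the two objects were fetched as JSON beforehand, for example with `oc get infrastructure cluster -o json` and `oc get deployment router-default -n openshift-ingress -o json`. Only the fields quoted in the comment are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infrastructure holds only the status fields quoted in the comment.
type infrastructure struct {
	Status struct {
		ControlPlaneTopology   string `json:"controlPlaneTopology"`
		InfrastructureTopology string `json:"infrastructureTopology"`
	} `json:"status"`
}

// deployment holds only spec.replicas of the router deployment.
type deployment struct {
	Spec struct {
		Replicas int `json:"replicas"`
	} `json:"spec"`
}

// checkHA (hypothetical) returns an error unless the infrastructure
// topology is HighlyAvailable and the deployment has at least 2 replicas.
func checkHA(infraJSON, deployJSON []byte) error {
	var infra infrastructure
	if err := json.Unmarshal(infraJSON, &infra); err != nil {
		return fmt.Errorf("decoding infrastructure: %w", err)
	}
	var dep deployment
	if err := json.Unmarshal(deployJSON, &dep); err != nil {
		return fmt.Errorf("decoding deployment: %w", err)
	}
	if got := infra.Status.InfrastructureTopology; got != "HighlyAvailable" {
		return fmt.Errorf("infrastructureTopology is %q, want HighlyAvailable", got)
	}
	if dep.Spec.Replicas < 2 {
		return fmt.Errorf("router has %d replica(s), want at least 2", dep.Spec.Replicas)
	}
	return nil
}

func main() {
	// Sample inputs built from the values reported for the 4.22 cluster.
	infra := []byte(`{"status":{"controlPlaneTopology":"External","infrastructureTopology":"HighlyAvailable"}}`)
	dep := []byte(`{"metadata":{"name":"router-default","namespace":"openshift-ingress"},"spec":{"replicas":2}}`)
	if err := checkHA(infra, dep); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("router deployment is highly available")
}
```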
|
@alebedev87: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@alebedev87: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command. |
|
/pj-rehearse periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance-serial |
|
@alebedev87: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn-conformance-serial periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance-serial |
|
@alebedev87: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
This test is skipped on 4.22 and 4.21; however, on 4.20 and 4.19 the status of the authentication operator is probed before the skip kicks in. Judging by the error, the authentication operator is not present. Should the skip for 4.20 and 4.19 be moved to the top of the test case (related PR)? |
It seems KSVM hits the same problem as the router: cc @sanchezl |
|
4.20 cluster
Neither of the blocking failures on 4.20 seems to be related to this change. |
According to a Claude Code analysis, this test failed due to rate limiting by the registry.ci.openshift.org image registry: |
|
Conclusion: I don't see any evident link between the failed tests and this PR. I will follow up on some items (e.g. openshift/origin#30848), but overall I acknowledge the rehearsal.
/pj-rehearse ack |
|
@alebedev87: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse ack |
|
@alebedev87: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@alebedev87: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Merged commit 9d47214 into openshift:main
|
@alebedev87: Jira Issue OCPBUGS-45921: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged: All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-45921 has not been moved to the MODIFIED state. DetailsIn response to this:
The e2e-aws-ovn-conformance-serial test creates a HyperShift hosted cluster with 3 worker nodes but SingleReplica infrastructure topology (the default). This causes the ingress controller to run with only 1 replica, making it vulnerable to NoExecute taint eviction from serial conformance tests like kubectl taint [1]. Switching to HighlyAvailable ensures 2 router replicas so a single-node taint doesn't cause full ingress unavailability.
[1] https://github.com/kubernetes/kubernetes/blob/8911a2d/test/e2e/kubectl/kubectl.go#L1772
Investigation details: link.