Airflow task failed but spark kube app is running #63243

@rcrchawla

The Airflow task failed while the Spark Kubernetes application it launched was still running. The Spark application is long-running, typically around 1-2 hours. Many tasks run concurrently at that time, and the failure usually occurs between 02:30 and 03:45 UTC.

Q) What is causing the issue?
A) The Airflow task fails while the Spark Kubernetes application is still running.

Airflow version: 3.0.4

Setup config:
2 API servers
2 workers
1 dag processor
2 schedulers

Deployment: Helm chart on Azure Kubernetes

Please see the logs below.

Worker logs:

2026-03-10 02:33:56.191330 [info ] Task execute_workload[8cbabf91-009f-44a6-86d1-bef109c70341] succeeded in 2715.019189195242s: None [celery.app.trace]
2026-03-10 02:39:57.112078 [info ] Task finished [supervisor] duration=1723.7576029417105 exit_code=0 final_state=success
2026-03-10 02:39:57.128929 [info ] Task execute_workload[9b3f27ec-09b5-424e-8d5c-412e541f51e8] succeeded in 1723.8186896019615s: None [celery.app.trace]
2026-03-10 02:40:50.688403 [info ] Task finished [supervisor] duration=744.0669570546597 exit_code=0 final_state=success
2026-03-10 02:40:50.705538 [info ] Task execute_workload[b08ac31a-2ee7-4029-b897-753157b18475] succeeded in 744.139388079755s: None [celery.app.trace]
2026-03-10 02:42:11.649891 [info ] Task finished [supervisor] duration=756.7588595808484 exit_code=0 final_state=success
2026-03-10 02:42:11.666368 [info ] Task execute_workload[0351c271-194e-4e58-87e4-a9c224351ab1] succeeded in 756.8229349320754s: None [celery.app.trace]
2026-03-10 02:43:37.239128 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:38.119304 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:38.640468 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:39.247588 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:39.425843 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:39.618220 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:40.002999 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:40.582177 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:41.186771 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:41.510710 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:42.658853 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:43.171303 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:43.826966 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:44.330891 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:44.874859 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:44.922591 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:45.866775 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:46.194974 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:46.482845 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:46.750792 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:48.198838 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:48.462121 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:49.749467 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:50.029438 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:50.834835 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:51.334847 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:51.431052 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:51.537615 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:52.567197 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:52.967177 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:53.615078 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:54.513959 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:56.442819 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:57.527549 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:57.765172 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:57.982839 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:58.099625 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:58.534632 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:59.007106 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:43:59.947380 [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
2026-03-10 02:44:02.200313 [warning ] Failed to send heartbeat. Will be retried [supervisor] failed_heartbeats=1 max_retries=3 ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')
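
In the worker excerpt above, each of the attempts 1 through 4 is logged ten times within about 25 seconds, immediately before the first failed heartbeat. A minimal sketch for tallying these retry warnings over a full worker log follows. The function name `count_retry_attempts` and the regular expression are illustrative assumptions based only on the message format shown above; they are not part of Airflow.

```python
import re
from collections import Counter

# Assumed message format, taken from the worker log lines above:
# "... this is the 2nd time calling it. ..."
ATTEMPT_RE = re.compile(r"this is the (\d+)(?:st|nd|rd|th) time calling it")


def count_retry_attempts(lines):
    """Return a Counter mapping attempt number -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the excerpt; replace with the real log file contents.
    sample = [
        "2026-03-10 02:43:37.239128 [warning ] Starting call to "
        "'airflow.sdk.api.client.Client.request', this is the 1st time calling it. "
        "[airflow.sdk.api.client]",
        "2026-03-10 02:43:42.658853 [warning ] Starting call to "
        "'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. "
        "[airflow.sdk.api.client]",
        "2026-03-10 02:44:02.200313 [warning ] Failed to send heartbeat. "
        "Will be retried [supervisor] failed_heartbeats=1 max_retries=3",
    ]
    for attempt, n in sorted(count_retry_attempts(sample).items()):
        print(f"attempt {attempt}: {n} line(s)")
```

Running this over the complete worker log for the 02:30-03:45 UTC window would show whether the retries cluster around the same timestamps as the API server error below.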

API server logs:

2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=155023 state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local current_pid=81402 state=running ti_id=019cd578-f8c1-7125-9906-ef64229dbba5
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local current_pid=86154 state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
INFO: 10.10.12.52:40870 - "GET /api/v2/version HTTP/1.1" 200 OK
INFO: 10.10.12.52:40880 - "GET /api/v2/version HTTP/1.1" 200 OK
2026-03-10 02:45:23 [debug ] Processing heartbeat hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=151395 ti_id=019cd542-0d47-7d93-a021-0cc2c9de7344
2026-03-10 02:45:23 [debug ] Refreshed token issued to Task [airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 valid_left=73
2026-03-10 02:45:23 [debug ] Refreshed token issued to Task [airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 valid_left=73
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Processing heartbeat hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=155023 ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
[2026-03-10T02:45:23.575+0000] {exceptions.py:77} ERROR - Error with id 9zBmdizJ
File "/home/airflow/.local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/home/airflow/.local/lib/python3.12/site-packages/starlette/routing.py", line 75, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line 302, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line 213, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py", line 474, in decorator
response = await self._convert_endpoint_response_to_version(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py", line 520, in _convert_endpoint_response_to_version
response_or_response_body: Union[FastapiResponse, object] = await run_in_threadpool(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/starlette/concurrency.py", line 38, in run_in_threadpool
return await anyio.to_thread.run_sync(func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2476, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 967, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/schema_generation.py", line 515, in __call__
return self._original_callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/api_fastapi/execution_api/routes/xcoms.py", line 419, in set_xcom
session.flush()
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
self._flush(objects)
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3588, in _flush
with util.safe_reraise():
^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3549, in _flush
flush_context.execute()
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
rec.execute(self)
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
util.preloaded.orm_persistence.save_obj(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
_emit_insert_statements(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1097, in _emit_insert_statements
c = connection._execute_20(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
self._handle_dbapi_exception(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
util.raise_(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 179, in execute
res = self._query(mogrified_query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 330, in _query
db.query(q)
File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/connections.py", line 280, in query
_mysql.connection.query(self, query)

2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=65618 state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=151858 state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug ] Retrieved current task state current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017

What you think should happen instead?

The Airflow task should run to completion without failing.

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.

Metadata

Labels: area:core; kind:bug (This is a clearly a bug); kind:meta (High-level information important to the community)
