The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the Python DB API 2.0 specification.
This connector uses Arrow as the data-exchange format and supports APIs (e.g. `fetchmany_arrow`) to fetch Arrow tables directly. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API for retrieving several rows at a time. PyArrow is required to enable this and use these APIs; you can install it via `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
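As a sketch of how batched Arrow fetching might be consumed, the loop below drains a cursor one Arrow table at a time. The `iter_arrow_batches` helper is illustrative, not part of the connector, and it assumes that `fetchmany_arrow(n)` returns an Arrow table exposing `num_rows` and that an exhausted cursor yields an empty table:

```python
# Illustrative sketch: drain a cursor in Arrow batches.
# `iter_arrow_batches` is a hypothetical helper, not part of the connector;
# it assumes `fetchmany_arrow(n)` returns an Arrow table with a `num_rows`
# attribute and that an exhausted cursor yields an empty table.
def iter_arrow_batches(cursor, batch_size=10_000):
    while True:
        table = cursor.fetchmany_arrow(batch_size)
        if table.num_rows == 0:
            break
        yield table

# Usage against a live cursor might look like:
# cursor.execute("SELECT * FROM large_table")
# for table in iter_arrow_batches(cursor):
#     process(table)
```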
The connector includes built-in support for HTTP/HTTPS proxy servers with multiple authentication methods including basic authentication and Kerberos/Negotiate authentication. See docs/proxy.md and examples/proxy_authentication.py for details.
You are welcome to file an issue here for general use cases. You can also contact Databricks Support here.
Python 3.9 or above is required.
For the latest documentation, see
- Install the core connector using `pip install databricks-sql-connector`
- Install with PyArrow support using `pip install databricks-sql-connector[pyarrow]`
- Install with async support using `pip install databricks-sql-connector[async]`
- Install with all optional features using `pip install databricks-sql-connector[all]`
```bash
export DATABRICKS_HOST=********.databricks.com
export DATABRICKS_HTTP_PATH=/sql/1.0/endpoints/****************
```

Example usage:
```python
import os
from databricks import sql

host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")

connection = sql.connect(
    server_hostname=host,
    http_path=http_path)

cursor = connection.cursor()
cursor.execute('SELECT :param `p`, * FROM RANGE(10)', {"param": "foo"})
result = cursor.fetchall()
for row in result:
    print(row)

cursor.close()
connection.close()
```

In the above example:
- `server_hostname` is the Databricks instance host name.
- `http_path` is the HTTP path to either a Databricks SQL endpoint (e.g. `/sql/1.0/endpoints/1234567890abcdef`) or a Databricks Runtime interactive cluster (e.g. `/sql/protocolv1/o/1234567890123456/1234-123456-slid123`).
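For illustration only, the two path shapes above can be distinguished with a small helper. `classify_http_path` is a hypothetical name, not part of the connector, and the regexes only cover the two example formats shown:

```python
import re

# Hypothetical helper (not part of the connector) that distinguishes the two
# HTTP path formats described above.
def classify_http_path(http_path: str) -> str:
    # SQL endpoint paths look like /sql/1.0/endpoints/<hex id>
    if re.fullmatch(r"/sql/1\.0/endpoints/[0-9a-f]+", http_path):
        return "sql_endpoint"
    # Interactive cluster paths look like /sql/protocolv1/o/<org id>/<cluster id>
    if re.fullmatch(r"/sql/protocolv1/o/\d+/[\w-]+", http_path):
        return "interactive_cluster"
    return "unknown"
```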
Note: This example uses Databricks OAuth U2M to authenticate the target Databricks user account and needs to open a browser for authentication, so it can only run on the user's machine.
The connector provides true async/await support for non-blocking database operations. This is useful for high-concurrency applications and async frameworks like FastAPI, aiohttp, etc.
Install with `pip install databricks-sql-connector[async]`:

```python
import asyncio
import os

from databricks import sql

async def main():
    # async_connect returns an open connection
    async with await sql.async_connect(
        server_hostname=os.getenv("DATABRICKS_HOST"),
        http_path=os.getenv("DATABRICKS_HTTP_PATH"),
        access_token=os.getenv("DATABRICKS_TOKEN"),
    ) as connection:
        async with connection.cursor() as cursor:
            # Execute query asynchronously
            await cursor.execute("SELECT * FROM my_table LIMIT 100")

            # Fetch results asynchronously
            rows = await cursor.fetchall()
            for row in rows:
                print(row)

            # Async iteration over results
            await cursor.execute("SELECT * FROM large_table")
            async for row in cursor:
                process(row)

asyncio.run(main())
```

- True async I/O: Uses `aiohttp` for non-blocking HTTP operations
- Non-blocking polling: Uses `asyncio.sleep()` instead of blocking `time.sleep()`
- Full API support: Async versions of `execute()`, `fetchone()`, `fetchmany()`, `fetchall()`, `fetchall_arrow()`
- Context manager support: `async with` for connections and cursors
- Async iteration: `async for row in cursor`
- Backward compatible: Existing sync API unchanged
Note: Async support currently works with SEA (Statement Execution API) backend, which is used for SQL warehouses.
The connector supports multi-statement transactions with manual commit/rollback control. Set `connection.autocommit = False` to disable autocommit mode, then use `connection.commit()` and `connection.rollback()` to control transactions.
For detailed documentation, examples, and best practices, see TRANSACTIONS.md.
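A minimal sketch of the commit/rollback pattern described above. The `run_in_transaction` helper and its error-handling shape are illustrative, not part of the connector; only `autocommit`, `commit()`, `rollback()`, and `cursor()` come from the API described here:

```python
# Illustrative sketch of manual transaction control; `run_in_transaction`
# is a hypothetical helper, not part of the connector.
def run_in_transaction(connection, statements):
    connection.autocommit = False  # take manual control of the transaction
    cursor = connection.cursor()
    try:
        for statement in statements:
            cursor.execute(statement)
        connection.commit()  # make all statements visible atomically
    except Exception:
        connection.rollback()  # undo every statement on any failure
        raise
    finally:
        cursor.close()
```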
Starting from databricks-sql-connector version 4.0.0, SQLAlchemy support has been extracted into a separate library, databricks-sqlalchemy.
- GitHub repository: databricks-sqlalchemy
- PyPI: databricks-sqlalchemy
Users can now choose between the SQLAlchemy v1 and SQLAlchemy v2 dialects with the connector core:

- Install the latest SQLAlchemy v1 dialect using `pip install databricks-sqlalchemy~=1.0`
- Install the SQLAlchemy v2 dialect using `pip install databricks-sqlalchemy`
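As a sketch of how the dialect is typically used, the helper below builds a SQLAlchemy connection URL of the `databricks://token:<access_token>@<server_hostname>?http_path=<http_path>` form documented by databricks-sqlalchemy; the helper name itself is illustrative, not part of either library:

```python
from urllib.parse import quote

# Illustrative helper (not part of databricks-sqlalchemy) that builds a
# SQLAlchemy URL of the documented form:
#   databricks://token:<access_token>@<server_hostname>?http_path=<http_path>
def databricks_sqlalchemy_url(server_hostname, access_token, http_path):
    return (
        f"databricks://token:{quote(access_token, safe='')}"
        f"@{server_hostname}?http_path={quote(http_path, safe='')}"
    )

# With sqlalchemy and databricks-sqlalchemy installed, usage might look like:
# from sqlalchemy import create_engine
# engine = create_engine(databricks_sqlalchemy_url(host, token, http_path))
```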
See CONTRIBUTING.md