./test_hdfs.sh

Initial integration testing with Dask has been Dockerized.
To invoke the test, run the following command from the arrow
root directory:

bash dev/dask_integration.sh

This script creates a dask directory at the same level as
arrow. It clones the Dask project from GitHub into dask
and performs a Python --user install. The Docker code uses the parent
directory of arrow as $HOME, so that is where Python
installs dask, in a .local directory.
The output of the Docker session contains the results of the Dask
dataframe tests, followed by the results of the single integration
test that currently exists for Arrow. That test creates a set of CSV
files and then reads them in parallel into a Dask dataframe. The code
for this test resides in the dask_test directory.
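
As a rough illustration of what such a test does, the following is a minimal sketch and not the actual code in dask_test. The function names `write_csv_files` and `read_csv_files_parallel`, the `part-*.csv` file naming, and the column names are all hypothetical. It assumes Dask is installed with dataframe support; `dask.dataframe.read_csv` accepts a glob pattern and reads the matching files as separate partitions.

```python
import csv
import tempfile
from pathlib import Path


def write_csv_files(directory, n_files=3, rows_per_file=4):
    """Write n_files small CSV files (hypothetical layout: columns id, value)."""
    directory = Path(directory)
    paths = []
    for i in range(n_files):
        path = directory / f"part-{i}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "value"])
            for j in range(rows_per_file):
                row_id = i * rows_per_file + j
                writer.writerow([row_id, row_id * 0.5])
        paths.append(path)
    return paths


def read_csv_files_parallel(directory):
    """Read every part-*.csv in directory into one dataframe via Dask."""
    # Imported lazily so the CSV-writing helper works without Dask installed.
    import dask.dataframe as dd

    # A glob pattern lets Dask treat the files as separate partitions.
    ddf = dd.read_csv(str(Path(directory) / "part-*.csv"))
    # compute() runs the task graph and returns a pandas DataFrame.
    return ddf.compute()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_csv_files(tmp)
        df = read_csv_files_parallel(tmp)
        print(df.shape)
```

Splitting the data across several files is what gives Dask independent units of work to read; a single small file would typically end up in one partition.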