Dask-Yarn

Dask-Yarn 将 Dask 部署到 YARN 集群上,例如传统的 Hadoop 安装中常见的集群。Dask-Yarn 提供了一个简单的接口,可以直接从 Python 快速启动、扩展和停止 Dask 集群。

from dask_yarn import YarnCluster
from dask.distributed import Client

# Create a cluster where each worker has two cores and eight GiB of memory
cluster = YarnCluster(environment='environment.tar.gz',
                      worker_vcores=2,
                      worker_memory="8GiB")
# Scale out to ten such workers
cluster.scale(10)

# Connect to the cluster
client = Client(cluster)

Dask-Yarn 使用 Skein,这是一个 Python 化的库,用于创建和部署 YARN 应用程序。

安装

Dask-Yarn 设计为仅需安装在边缘节点上。要安装,请使用以下方法之一

使用 Conda 安装

conda install -c conda-forge dask-yarn

使用 Pip 安装

pip install dask-yarn

从源代码安装

Dask-Yarn 可在 github 上获取,并始终可以从源代码安装。

pip install git+https://github.com/dask/dask-yarn.git