mesos version: 0.21.0
spark version: 1.6.0
OS version: CentOS 7.1.1053
To check the OS version: lsb_release -a

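We follow the official Mesos and Spark documentation for the configuration.

####Installing Mesos

First comes installing Mesos (we assume Spark is already installed). This part mostly follows the Mesos getting-started guide [1], and we note down the small problems we hit along the way. The Mesos version that matches Spark 1.6.0 is 0.21.0, so we download that release from the Mesos website and unpack it:

$ tar -zxf mesos-0.21.0.tar.gz

Don't rush into compiling after unpacking. Some packages have to be installed first; the Mesos site lists them all. We are on CentOS 7.1, so we follow these steps: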

# Install a few utility tools
$ sudo yum install -y tar wget git

# Fetch the Apache Maven repo file.
$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo

# Install the EPEL repo so that we can pull in 'libserf-1' as part of our
# subversion install below.
$ sudo yum install -y epel-release

# 'Mesos > 0.21.0' requires 'subversion > 1.8' devel package,
# which is not available in the default repositories.
# Create a WANdisco SVN repo file to install the correct version:
$ sudo tee /etc/yum.repos.d/wandisco-svn.repo > /dev/null <<'EOF'
[WANdiscoSVN]
name=WANdisco SVN Repo 1.9
enabled=1
baseurl=http://opensource.wandisco.com/centos/7/svn-1.9/RPMS/$basearch/
gpgcheck=1
gpgkey=http://opensource.wandisco.com/RPM-GPG-KEY-WANdisco
EOF

# Parts of Mesos require systemd in order to operate. However, Mesos
# only supports versions of systemd that contain the 'Delegate' flag.
# This flag was first introduced in 'systemd version 218', which is
# higher than the default version installed by CentOS. Luckily, CentOS
# 7.1 has a patched 'systemd < 218' that contains the 'Delegate' flag.
# Explicitly update systemd to this patched version.
$ sudo yum update systemd

# Install essential development tools.
$ sudo yum groupinstall -y "Development Tools"

# Install other Mesos dependencies.
$ sudo yum install -y apache-maven python-devel java-1.8.0-openjdk-devel zlib-devel libcurl-devel openssl-devel cyrus-sasl-devel cyrus-sasl-md5 apr-devel subversion-devel apr-util-devel

The last step failed with this error:

Transaction check error:
file /usr/lib64/libsvn_client-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_delta-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_diff-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_fs-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_fs_base-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_fs_fs-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_fs_util-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_ra-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_ra_local-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_ra_svn-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_repos-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_subr-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64
file /usr/lib64/libsvn_wc-1.so.0.0.0 from install of subversion-1.9.4-1.x86_64 conflicts with file from package subversion-libs-1.7.14-6.el7.x86_64

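Roughly speaking, the new version conflicts with the old one. The Mesos site says 'Mesos > 0.21.0' requires 'subversion > 1.8' devel package, so the old 1.7 clearly won't do. We remove it and then rerun the final yum install command above: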

$ sudo yum remove subversion-libs-1.7.14-6.el7.x86_64
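Before rerunning the install, it can help to confirm which subversion version you actually have. The sketch below is only an illustration: `version_gt` and `check_svn_version` are hypothetical helper names, and the comparison assumes GNU `sort -V` is available (it is part of coreutils on CentOS 7).

```shell
# Succeed when version $1 is strictly newer than version $2.
# Assumes GNU sort with -V (version sort).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical helper: does a subversion version satisfy "> 1.8"?
check_svn_version() {
    if version_gt "$1" "1.8"; then
        echo "subversion $1: ok"
    else
        echo "subversion $1: too old"
    fi
}

# The two versions that appear in the error message above.
check_svn_version "1.7.14"
check_svn_version "1.9.4"
```

On a real machine you could feed it the installed version, for example `check_svn_version "$(rpm -q --queryformat '%{VERSION}' subversion-devel)"`.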

Next comes the build:

# Change working directory.
$ cd mesos

# Bootstrap (Only required if building from git repository).
$ ./bootstrap

# Configure and build.
$ mkdir build
$ cd build
$ ../configure
$ make
$ make install

Just as carolz was about to happily run make, another error popped up (heartbreak in capital letters):

cd .. && /bin/sh /home/master/mesos-0.21.0/missing automake-1.14 --foreign
/home/master/mesos-0.21.0/missing: line 81: automake-1.14: command not found
WARNING: 'automake-1.14' is missing on your system.
        You should only need it if you modified 'Makefile.am' or
        'configure.ac' or m4 files included by 'configure.ac'.
        The 'automake' program is part of the GNU Automake package:
        <http://www.gnu.org/software/automake>
        It also requires GNU Autoconf, GNU m4 and Perl in order to run:
        <http://www.gnu.org/software/autoconf>
        <http://www.gnu.org/software/m4/>
        <http://www.perl.org/>
make: *** [../Makefile.in] Error 1

It turned out my automake was version 1.13. Fine, uninstall it and install 1.14~

yum remove automake
wget ftp://ftp.gnu.org/gnu/automake/automake-1.14.tar.gz
tar xvf automake-1.14.tar.gz
cd automake-1.14
./configure --prefix=/usr --docdir=/usr/share/doc/automake-1.14
make
make install
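A quick way to confirm that the new automake is the one on your PATH is to read the first line of `automake --version`. This is a minimal sketch; `automake_version` is a hypothetical helper name, and it assumes the first line ends with the version number, as in "automake (GNU automake) 1.14".

```shell
# Read `automake --version` output on stdin and print the version number,
# i.e. the last field of the first line.
automake_version() {
    head -n 1 | awk '{ print $NF }'
}

if command -v automake >/dev/null 2>&1; then
    echo "installed automake: $(automake --version | automake_version)"
else
    echo "automake not found on PATH"
fi
```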

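Once that is installed, go through the Mesos build steps above again.

After Mesos is installed on every node of the cluster, you can run the following test to check that everything is set up: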

# Change into build directory.
$ cd build

# Start mesos master (Ensure work directory exists and has proper permissions).
$ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos

# Start mesos slave.
$ ./bin/mesos-slave.sh --master=127.0.0.1:5050

# Visit the mesos web page.
$ http://127.0.0.1:5050

# Run C++ framework (Exits after successfully running some tasks.).
$ ./src/test-framework --master=127.0.0.1:5050

# Run Java framework (Exits after successfully running some tasks.).
$ ./src/examples/java/test-framework 127.0.0.1:5050

# Run Python framework (Exits after successfully running some tasks.).
$ ./src/examples/python/test-framework 127.0.0.1:5050

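Don't forget to replace --ip with your master's IP address, and to point --work_dir at a writable directory of your own (we used our Mesos installation path).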
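To make that substitution concrete, here is a small sketch that prints the commands with your own values filled in. The address `192.168.1.10` is a made-up placeholder, and `master_cmd`, `slave_cmd` and `master_ui_url` are hypothetical helper names; only the flags already shown above are used.

```shell
# Placeholder values, override them from the environment.
MASTER_IP="${MASTER_IP:-192.168.1.10}"   # hypothetical master address
WORK_DIR="${WORK_DIR:-/var/lib/mesos}"   # must exist and be writable

# Print the master start command for a given IP and work directory.
master_cmd() {
    echo "./bin/mesos-master.sh --ip=$1 --work_dir=$2"
}

# Print the slave start command for a given master IP.
slave_cmd() {
    echo "./bin/mesos-slave.sh --master=$1:5050"
}

# Print the web UI address for a given master IP.
master_ui_url() {
    echo "http://$1:5050"
}

echo "on the master: $(master_cmd "$MASTER_IP" "$WORK_DIR")"
echo "on each slave: $(slave_cmd "$MASTER_IP")"
echo "web UI:        $(master_ui_url "$MASTER_IP")"
```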

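Now http://master_ip:5050 shows the resources of the whole cluster. That wraps up the Mesos part; next we hook Spark up to Mesos.

For a convenient way to start Mesos across the whole cluster, see here; we won't go into detail.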

####Running Spark on Mesos (client mode)

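This part mainly follows the Spark documentation on running on Mesos [2].

First we have to tell Spark where the Mesos shared library lives (you need to locate it yourself; find -name is your friend). The setting goes into conf/spark-env.sh in the Spark directory:

export MESOS_NATIVE_JAVA_LIBRARY=/home/master/mesos-0.21.0/build/src/.libs/libmesos.so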
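If you are unsure where the library ended up, something like the following can locate it and print the line to put into conf/spark-env.sh. It is a sketch under assumptions: `find_libmesos` and `spark_env_line` are hypothetical helper names, and the default search directory is only a guess based on the path above.

```shell
# Print the first libmesos.so found below the given directory (if any).
find_libmesos() {
    [ -d "$1" ] || return 0
    find "$1" -name 'libmesos.so' 2>/dev/null | head -n 1
}

# Print the line to add to conf/spark-env.sh for a given library path.
spark_env_line() {
    echo "export MESOS_NATIVE_JAVA_LIBRARY=$1"
}

# Assumed build location, adjust to wherever you built Mesos.
MESOS_BUILD="${MESOS_BUILD:-$HOME/mesos-0.21.0/build}"
lib="$(find_libmesos "$MESOS_BUILD")"
if [ -n "$lib" ]; then
    spark_env_line "$lib"
else
    echo "libmesos.so not found under $MESOS_BUILD"
fi
```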

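Then start Mesos and Spark as usual, and replace the spark://master_ip:port master URL you would normally pass to Spark with mesos://master_ip:5050. Run an application, open master_ip:5050 in your browser, and enjoy the fruits of your labor~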
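As an illustration of the URL swap, the sketch below builds the mesos:// master URL and prints a spark-shell invocation that uses it. `mesos_url` is a hypothetical helper and `192.168.1.10` is a placeholder address; `--master` is the standard Spark option for choosing the cluster manager.

```shell
# Build a Mesos master URL from a host and an optional port (default 5050).
mesos_url() {
    echo "mesos://$1:${2:-5050}"
}

MASTER_IP="${MASTER_IP:-192.168.1.10}"   # hypothetical master address

# Run from the Spark directory; printed here rather than executed.
echo "./bin/spark-shell --master $(mesos_url "$MASTER_IP")"
```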

####Running Spark on Mesos (cluster mode)

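The reference is the same document as in the previous part. One thing to note: when building Spark you need to run

./make-distribution.sh --tgz

to generate a binary package, then upload it to HDFS. Write your configuration files before this step, otherwise the executors cannot read them (the executors download the package from HDFS!). If you skip this, you may end up in the awkward situation where everything is clearly configured and yet you get "JAVA_HOME is not set".

For the concrete configuration files, see here; you may need to adjust the IP addresses for your operating system and your chosen master.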
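The Spark documentation [2] describes pointing executors at the uploaded package through the `spark.executor.uri` property. A rough sketch of that step follows; the HDFS address and file name are made up for illustration, and `set_executor_uri` is a hypothetical helper that simply appends the property to a spark-defaults.conf style file.

```shell
# Steps on a real cluster (shown as comments, not executed here):
#   ./make-distribution.sh --tgz
#   hadoop fs -put spark-1.6.0-bin-custom.tgz /spark/    # hypothetical target dir

# Append the executor URI property to the given config file.
set_executor_uri() {
    printf 'spark.executor.uri %s\n' "$2" >> "$1"
}

# Demo against a temporary file instead of the real conf/spark-defaults.conf.
demo_conf="$(mktemp)"
set_executor_uri "$demo_conf" "hdfs://namenode:9000/spark/spark-1.6.0-bin-custom.tgz"
cat "$demo_conf"
rm -f "$demo_conf"
```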

####About the Mesos sandbox

While Mesos is running, the web UI shows the stdout and stderr of each node. Where are these files? The answer can be found here.
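As far as we know, each slave keeps these sandboxes under its own work directory, in a path of the form slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/<container-id>/. The sketch below lists the log files under that tree. It assumes the slave's default work directory /tmp/mesos (change it if you passed --work_dir to the slave), and `list_sandbox_logs` is a hypothetical helper name.

```shell
# List every stdout/stderr file below a slave's work directory, sorted.
list_sandbox_logs() {
    [ -d "$1/slaves" ] || return 0
    find "$1/slaves" -type f \( -name stdout -o -name stderr \) 2>/dev/null | sort
}

# Assumed location: the slave's default work directory.
SLAVE_WORK_DIR="${SLAVE_WORK_DIR:-/tmp/mesos}"
list_sandbox_logs "$SLAVE_WORK_DIR" || true
```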

####Reference

The reference list is very short this time, essentially just the two official sites. Bow and exit~

[1] http://mesos.apache.org/gettingstarted/
[2] https://spark.apache.org/docs/latest/running-on-mesos.html
[3] http://www.litrin.net/2015/08/20/mesos%E5%AE%9E%E6%88%98/