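Introduction
Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a browser-based console to analyze and process data, for example managing data on HDFS, running MapReduce jobs, or executing Hive queries.
CDH ships with Hue integrated by default, but HDP does not. Logging in to a server every time I want to run a small job or a quick query is inconvenient, so I decided to install Hue myself.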
Environment
- CentOS 7
- HDP 2.5
- Hue 3.12.0
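Installing Hue
Download version 3.12.0 from the official website: The download seems hard to reach from within China, so I fetched it ahead of time and uploaded a copy to Baidu Cloud:
Put the archive in a directory on the server and extract it:
root@dell:/data/hue# tar -zxvf hue-3.12.0.tgz
Install the dependencies: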
root@dell:~# yum install ant gcc gcc-c++ mysql-devel openssl-devel cyrus-sasl-devel cyrus-sasl cyrus-sasl-gssapi sqlite-devel openldap-devel libacl-devel libxml2-devel libxslt-devel maven krb5-devel python-devel python-simplejson python-setuptools
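Build and install Hue from inside the extracted directory:
root@dell:/data/hue/hue-3.12.0# PREFIX=/usr/share make install
PREFIX specifies the installation path; it is best to choose a partition with plenty of free space.
The build itself is straightforward. The main thing is to have Maven set up beforehand, because the build needs many JARs and downloads them through Maven.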
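Since a missing build tool usually only shows up partway through the build, it can help to check for the tools first. The following is a minimal sketch, not part of the Hue build itself: `missing_tools` is a helper name I made up, and its default tool list (mvn, ant, gcc, g++, make) is my assumption based on the dependencies listed above.

```shell
# Print every required build tool that is not found on PATH, one per line.
# With no arguments, a default list is checked (an assumption, see above).
missing_tools() {
    [ "$#" -gt 0 ] || set -- mvn ant gcc g++ make
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}
```

Calling `missing_tools` with no arguments before `make install` prints nothing when everything is in place; any name it prints still needs to be installed or added to PATH.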
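Configuring Hue
Edit the desktop/conf/hue.ini file under the Hue installation path.
Configuring the database
By default Hue uses SQLite; it can be switched to MySQL. Open hue.ini and find the following section: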
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
# Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
# Note for MariaDB use the 'mysql' engine.
## engine=sqlite3 // change to mysql
## host= // hostname or IP of the MySQL server
## port= // 3306
## user= // database user (creating a dedicated hue user is recommended)
## password= // password for that user
# Execute this script to produce the database password. This will be used when 'password' is not set.
## password_script=/path/script
## name=desktop/desktop.db // change this to the database name, e.g. hue
## options={}
# Database schema, to be used only when public schema is revoked in postgres
## schema=
After editing, the section looks like this:
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
# Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
# Note for MariaDB use the 'mysql' engine.
# Use the mysql engine
engine=mysql
# Hostname or IP of the MySQL server
host=192.168.1.2
port=3306
# Database user (creating a dedicated hue user is recommended)
user=hue
# Password for that user
password=lu123456
# Execute this script to produce the database password. This will be used when 'password' is not set.
## password_script=/path/script
# Name of the database, e.g. hue
name=hue
## options={}
# Database schema, to be used only when public schema is revoked in postgres
## schema=
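The MySQL settings above only work if the database and the account already exist on the MySQL server. As a rough sketch of that preparation step, the helper below prints the SQL for it. The function name `hue_db_sql`, the utf8 character set, and the `'%'` host wildcard are my assumptions, not something Hue prescribes; tighten the host to match your network.

```shell
# Print SQL that creates the Hue database and a dedicated account.
# Arguments: database name, user name, password (all placeholders).
# Note: a password containing a single quote would need escaping first.
hue_db_sql() {
    db=$1
    user=$2
    pass=$3
    cat <<EOF
CREATE DATABASE ${db} DEFAULT CHARACTER SET utf8;
CREATE USER '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}
```

One way to use it is to pipe the output into the client on the database host, for example `hue_db_sql hue hue 'your-password' | mysql -u root -p`.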
With the database configured, the Hue tables still need to be created in the MySQL database by running syncdb and migrate:
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue syncdb
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue migrate
During the sync you may be prompted to create an admin user; you will need it later to log in to Hue.
Hadoop configuration
Edit hdfs-site.xml and add the following property:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
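On HDP this should already be enabled by default; check HDFS > Configs > Advanced > General > WebHDFS enabled.
Add a proxy user for hue under HDFS > Configs > Advanced > Custom core-site by adding these properties:
hadoop.proxyuser.hue.groups=*
hadoop.proxyuser.hue.hosts=*
Without them, jobs cannot be submitted through Hue.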
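To check WebHDFS and the proxy-user settings before involving Hue at all, you can issue the same kind of REST call by hand. The sketch below only builds the URL; `webhdfs_list_url` is a hypothetical helper name, while `op=LISTSTATUS`, `user.name`, and `doas` are standard WebHDFS query parameters. It assumes an unsecured (non-Kerberos) cluster.

```shell
# Build a WebHDFS LISTSTATUS URL.
# Arguments: host:port of the NameNode web UI, HDFS path, requesting user,
# and optionally a user to impersonate via the "doas" parameter.
webhdfs_list_url() {
    host_port=$1
    path=$2
    proxy_user=$3
    doas_user=${4:-}
    url="http://${host_port}/webhdfs/v1${path}?op=LISTSTATUS&user.name=${proxy_user}"
    if [ -n "$doas_user" ]; then
        url="${url}&doas=${doas_user}"
    fi
    echo "$url"
}
```

For example, `curl -s "$(webhdfs_list_url e5:50070 /tmp hue admin)"` asks the NameNode used in this article to list /tmp as admin on behalf of hue. If the proxy-user properties are missing, I would expect an impersonation error in the JSON response rather than a file listing.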
Hue configuration: edit hue.ini and find the following section:
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
# NameNode address
fs_defaultfs=hdfs://e5:8020
# fs_defaultfs=hdfs://localhost:8020
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
# WebHDFS address
webhdfs_url=http://e5:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
security_enabled=false
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
# Directory of the Hadoop configuration
## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
hadoop_conf_dir=/etc/hadoop/conf
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=e5
# On HDP, see YARN > Configs > Advanced > Advanced yarn-site
# and check the yarn.resourcemanager.address property
# The port where the ResourceManager IPC listens on
resourcemanager_port=8050
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://e5:8088
# URL of the ProxyServer API
proxy_api_url=http://e5:8088
# URL of the HistoryServer API
## history_server_api_url=http://localhost:19888
# URL of the Spark History Server
## spark_history_server_url=http://localhost:18088
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
# HA support by specifying multiple clusters.
# Redefine different properties there.
# e.g.
# [[[ha]]] // settings for a highly available (HA) cluster
# Resource Manager logical name (required for HA)
## logical_name=my-rm-name
# Un-comment to enable
## submit_to=True
# URL of the ResourceManager API
## resourcemanager_api_url=http://localhost:8088
# ...
# Configuration for MapReduce (MR1)
# ------------------------------------------------------------------------
[[mapred_clusters]]
[[[default]]]
# Enter the host on which you are running the Hadoop JobTracker
jobtracker_host=e5
# The port where the JobTracker IPC listens on
jobtracker_port=8050
# JobTracker logical name for HA
## logical_name=
# Thrift plug-in port for the JobTracker
## thrift_port=9290
# Whether to submit jobs to this cluster
submit_to=False
# Change this if your MapReduce cluster is Kerberos-secured
security_enabled=false
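Only the Hadoop settings are shown here. Other services such as Oozie and Sqoop are just as simple, but they do need their own configuration if you plan to use them.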
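Because hue.ini is long and mostly comments, it is easy to lose track of what is actually in effect after editing. A small sketch for reviewing it: the helper below (the name `active_settings` is mine) prints only the lines that are neither blank nor commented out, so section headers and live key=value pairs remain.

```shell
# Print the lines of an ini file that are neither blank nor comments.
# Argument: path to the ini file.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

Running `active_settings desktop/conf/hue.ini` from the Hue directory gives a compact view that can be compared against the values shown above.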
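Starting the Hue service
Run the following command to start Hue in development (debug) mode:
# 0.0.0.0 allows connections from any host; without it, only local access via 127.0.0.1 is allowed
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue runserver 0.0.0.0:8888
After the first successful start, open http://server-ip:8888 in a browser.
You will be redirected to the Hue login page. If no account has been created yet, the first username and password you log in with become the Hue superuser account (admin/admin, for example). If you already created an admin user during the database sync above, log in with that username and password.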
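Hue takes a few seconds to come up, so when scripting the start it is handy to poll until the port answers. Here is a minimal, generic sketch: `retry` and the `RETRY_DELAY` variable are names I chose for illustration, not anything provided by Hue.

```shell
# Run a command until it succeeds, at most $1 times, sleeping RETRY_DELAY
# seconds (default 1) between attempts. Returns 0 on success, 1 otherwise.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        if [ "$n" -le "$attempts" ]; then
            sleep "${RETRY_DELAY:-1}"
        fi
    done
    return 1
}
```

For instance, `retry 30 curl -sf -o /dev/null http://server-ip:8888/` waits up to roughly half a minute for the login page to respond.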
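Once login works, Hue can be registered as a system service so that systemd can control it, which makes starting, stopping, and enabling it at boot straightforward.
I started from the hue init script that ships with CDH and made a few small changes.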
/etc/init.d/hue
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# /etc/rc.d/init.d/hue
#
# Hue web server
#
# chkconfig: 2345 90 10
# description: Hue web server
# pidfile: /var/run/hue/supervisor.pid
. /etc/init.d/functions
LOCKFILE=/var/lock/subsys/hue
#DAEMON=/usr/lib/hue/build/env/bin/supervisor # Introduce the server's location here
DAEMON=/data/hue/hue-3.12.0/build/env/bin/supervisor # Introduce the server's location here
LOGDIR=/var/log/hue # Log directory to use
PIDFILE=/var/run/hue/supervisor.pid
USER=hue
#EXEC=/usr/lib/hue/build/env/bin/python
EXEC=/data/hue/hue-3.12.0/build/env/bin/python
DAEMON_OPTS="-p $PIDFILE -l $LOGDIR -d"
HUE_SHUTDOWN_TIMEOUT=15
hue_start() {
    export PYTHON_EGG_CACHE='/tmp/.hue-python-eggs'
    #RE_REGISTER=/usr/lib/hue/.re_register
    RE_REGISTER=/data/hue/hue-3.12.0/app.reg
    if [ -e $RE_REGISTER ]; then
        # Do app_reg on upgraded apps. This is a workaround for DISTRO-11.
        # We can probably take it out after another release.
        DO="/sbin/runuser -s /bin/bash $USER -c"
        #APP_REG="/usr/lib/hue/tools/app_reg/app_reg.py"
        APP_REG="/data/hue/hue-3.12.0/tools/app_reg/app_reg.py"
        # Upgraded apps write their paths in the re_register file.
        RE_REG_LOG=/var/log/hue/hue_re_register.log
        # Make cwd somewhere that $USER can chdir into
        pushd / > /dev/null
        $DO "DESKTOP_LOG_DIR=$LOGDIR $EXEC $APP_REG --install $(cat $RE_REGISTER | xargs echo -n) >> $RE_REG_LOG 2>&1"
        ok=$?
        popd > /dev/null
        if [ $ok -eq 0 ] ; then
            rm -f $RE_REGISTER
        else
            echo "Failed to register some apps: Details in $RE_REG_LOG"
        fi
    fi
    echo -n "Starting hue: "
    for dir in $(dirname $PIDFILE) $LOGDIR ${PYTHON_EGG_CACHE}
    do
        mkdir -p $dir
        chown -R $USER $dir
    done
    # Check if already running
    if [ -e $PIDFILE ] && checkpid $(cat $PIDFILE) ; then
        echo "already running"
        return 0
    fi
    # the supervisor itself will setuid down to $USER
    su -s /bin/bash $USER -c "$DAEMON $DAEMON_OPTS"
    ret=$?
    base=$(basename $0)
    if [ $ret -eq 0 ]; then
        sleep 5
        test -e $PIDFILE && checkpid $(cat $PIDFILE)
        ret=$?
    fi
    if [ $ret -eq 0 ]; then
        touch $LOCKFILE
        success $"$base startup"
    else
        failure $"$base startup"
    fi
    echo
    return $ret
}
hue_stop() {
    if [ ! -e $PIDFILE ]; then
        success "Hue is not running"
        return 0
    fi
    echo -n "Shutting down hue: "
    HUE_PID=`cat $PIDFILE 2>/dev/null`
    if [ -n "$HUE_PID" ]; then
        kill -TERM ${HUE_PID} &>/dev/null
        for i in `seq 1 ${HUE_SHUTDOWN_TIMEOUT}` ; do
            kill -0 ${HUE_PID} &>/dev/null || break
            sleep 1
        done
        kill -KILL ${HUE_PID} &>/dev/null
    fi
    echo
    rm -f $LOCKFILE $PIDFILE
    return 0
}
hue_restart() {
    hue_stop
    hue_start
}
case "$1" in
    start)
        hue_start
        ;;
    stop)
        hue_stop
        ;;
    status)
        status -p $PIDFILE supervisor
        ;;
    restart|reload)
        hue_restart
        ;;
    condrestart)
        # Restart only if the service was started through this script
        [ -f $LOCKFILE ] && hue_restart || :
        ;;
    *)
        echo "Usage: hue {start|stop|status|reload|restart|condrestart}"
        exit 1
        ;;
esac
exit $?
Only the values of the following variables need to be changed:
DAEMON
EXEC
RE_REGISTER
APP_REG
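For DAEMON, EXEC, and APP_REG the change is just a different path prefix, so it can be done mechanically. The sketch below is one possible way to do that, with a helper name (`retarget_hue_paths`) I made up; it is not how the script above was actually produced. It does not cover RE_REGISTER, whose file name also differs in my version, so that line still has to be edited by hand.

```shell
# Rewrite the CDH default prefix /usr/lib/hue to another Hue directory.
# Reads the script on stdin and writes the result to stdout.
# Argument: the new Hue directory (must not contain '#' or '&').
retarget_hue_paths() {
    sed "s#/usr/lib/hue#$1#g"
}
```

For example, `retarget_hue_paths /data/hue/hue-3.12.0 < hue.orig > hue` would convert a copy of the CDH script saved as hue.orig (a hypothetical file name).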
Note that the script must be placed in the /etc/init.d/ directory and be executable. After that, the service can be controlled with systemctl, for example:
# Start the service
# systemctl start hue
# Stop the service
# systemctl stop hue
# Start the service automatically at boot
# systemctl enable hue
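On CentOS 7, systemd picks up SysV init scripts through a generated unit, so a freshly copied script may need to be registered before systemctl sees it. The following is a sketch of the steps I would expect to be needed, wrapped so they can be previewed first. `run`, `register_hue_service`, and `DRY_RUN` are names invented for this example, and the script above additionally assumes that a hue system user already exists.

```shell
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Make the init script executable, register it with chkconfig, and let
# systemd regenerate its units. Argument: script path (default /etc/init.d/hue).
register_hue_service() {
    script=${1:-/etc/init.d/hue}
    run chmod 755 "$script"
    run chkconfig --add "$(basename "$script")"
    run systemctl daemon-reload
}
```

Running `DRY_RUN=1 register_hue_service` prints the three commands without touching the system; dropping `DRY_RUN=1` executes them as root.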
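Summary
Installing and configuring Hue is fairly easy, but on first contact you will inevitably run into assorted problems: the Python version, the database configuration, the configuration of the individual big-data services, and so on. When that happens, don't panic. Read the error message first, then the error logs, and finally search Baidu or Google or consult the official documentation; there is always a way through. As a last piece of advice, if your company's servers are well provisioned, it is still worth going with CDH: the services are more complete, and with commercial support behind it, problems you cannot solve yourself can be raised with Cloudera.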