Environment preparation
Name | Version | Download | Notes |
--- | --- | --- | --- |
CentOS | 7.9 | https://mirrors.aliyun.com/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-2009.iso | |
Java | 1.8 | Install Java 1.8 directly with yum | |
Spark | 3.2.1 | https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz | Use the download that bundles Hadoop; otherwise the JARs needed to connect to Hive are missing |
Hadoop | 3.3.4 | https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz | |
Hive | 3.1.3 | https://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz | |
MySQL | 8.0.28 | https://repo.huaweicloud.com/mysql/Downloads/MySQL-8.0/mysql-8.0.28-el7-x86_64.tar.gz | |
MySQL JDBC driver | 8.0.28 | https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.28.tar.gz | |
Linkis | 1.6.0 | Project installation package and web console installation package | Pre-built packages can be used, but if your environment differs it is best to compile them yourself; see the Linkis build chapter |
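If you want to stage everything up front, the packages from the table can be fetched in one pass; this is just a convenience sketch that assumes /opt as the staging directory:
# Stage the installation packages under /opt (adjust the directory to taste)
cd /opt
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
wget https://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
wget https://repo.huaweicloud.com/mysql/Downloads/MySQL-8.0/mysql-8.0.28-el7-x86_64.tar.gz
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.28.tar.gz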
Installing the base environment
Install Java
First, switch the default yum repository to the Aliyun mirror.
# Back up the original yum repo file
sudo cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
# Download the Aliyun repo file
sudo wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
# Clean the yum cache and rebuild it
sudo yum clean all
sudo yum makecache
Then install Java 1.8
sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
Verify that the installation succeeded
[root@localhost opt]# java -version
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (build 25.412-b08, mixed mode)
Configure the Java environment
vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$JAVA_HOME/bin:$PATH
Apply the environment variables: source /etc/profile
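A quick check that the variables are visible in the current shell (new shells pick them up from /etc/profile.d automatically):
# Both should point at the OpenJDK 1.8 installation
echo $JAVA_HOME
which java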
Configure tmux (optional)
tmux is a terminal multiplexer: it lets you create, manage, and switch between multiple independent sessions, windows, and panes inside a single terminal, and sessions persist. Even if the terminal connection is dropped, tasks running inside tmux keep running, and you can re-attach later and pick up where you left off.
Install tmux
yum -y install tmux
# Enable mouse/scroll support in tmux
vim ~/.tmux.conf
# Add one line (on tmux >= 2.1 use "set -g mouse on" instead)
setw -g mode-mouse on
# Apply the configuration
tmux source-file ~/.tmux.conf
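For reference, the basic tmux workflow used later for the long-running Hive services looks like this (the session name is just an example):
tmux new -s demo          # create a named session
# press Ctrl-b then d to detach; processes inside keep running
tmux ls                   # list sessions
tmux attach -t demo       # re-attach later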
Create the hadoop user
sudo groupadd hadoop
sudo useradd -g hadoop hadoop
Grant sudo privileges
# Make /etc/sudoers writable
chmod +w /etc/sudoers
vim /etc/sudoers
# Add this line
hadoop ALL=(ALL:ALL) ALL
# After saving, remove write permission from /etc/sudoers again
chmod -w /etc/sudoers
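Alternatively, visudo edits /etc/sudoers with a built-in syntax check and avoids changing the file permissions; a minimal sketch:
# visudo validates the file before saving
visudo
# then add the same line: hadoop ALL=(ALL:ALL) ALL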
Verify the local Hadoop environment (this check only works once Hadoop has been installed and started as described below)
[hadoop@localhost opt]$ hdfs dfs -ls /
Found 1 items
drwx-wx-wx - root supergroup 0 2025-01-13 20:59 /tmp
Set up passwordless SSH login to localhost
# Generate a key pair
ssh-keygen
# Just press Enter at every prompt
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Verify that passwordless login to localhost works
ssh localhost
Install MySQL
Install MySQL dependencies
sudo yum update
sudo yum install ncurses-devel cmake gcc gcc-c++ make
Create the mysql group and user
sudo groupadd mysql
sudo useradd -g mysql -r mysql
Unpack MySQL under /opt and give the mysql user ownership
cd /opt
tar zxvf mysql-8.0.28-el7-x86_64.tar.gz
mv mysql-8.0.28-el7-x86_64 mysql
chown -R mysql:mysql mysql
Add the MySQL environment variables
cat > /etc/profile.d/mysql.sh << 'EOF'
export MYSQL_HOME=/opt/mysql
export PATH=$MYSQL_HOME/bin:$PATH
EOF
Add the MySQL configuration: vim /etc/my.cnf
[mysqld]
sql_mode=STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION
datadir=/opt/mysql/data
socket=/tmp/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld_safe]
log-error=/home/mysql/mariadb/mariadb.log
pid-file=/home/mysql/mariadb/mariadb.pid
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
Add a systemctl service for MySQL
vim /opt/mysql/support-files/mysql.server
# Modify these two lines
basedir=/opt/mysql
datadir=/opt/mysql/data
Copy mysql.server into the /etc/rc.d/init.d directory
cp /opt/mysql/support-files/mysql.server /etc/rc.d/init.d/mysql
systemctl daemon-reload
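Optionally, register the init script so MySQL starts on boot (either command works on CentOS 7; systemctl falls back to chkconfig for SysV scripts):
# Enable MySQL at boot
chkconfig --add mysql
chkconfig mysql on
# or: systemctl enable mysql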
Initialize MySQL (as the mysql user, from the /opt/mysql directory)
su mysql
bash-4.2$ bin/mysqld --initialize
2025-01-13T15:03:21.406627Z 0 [Warning] [MY-010139] [Server] Changed limits: max_open_files: 1024 (requested 8161)
2025-01-13T15:03:21.406664Z 0 [Warning] [MY-010142] [Server] Changed limits: table_open_cache: 431 (requested 4000)
2025-01-13T15:03:21.407768Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
2025-01-13T15:03:21.407841Z 0 [Warning] [MY-010915] [Server] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2025-01-13T15:03:21.408045Z 0 [System] [MY-013169] [Server] /opt/mysql/bin/mysqld (mysqld 8.0.28) initializing of server in progress as process 18090
2025-01-13T15:03:21.462971Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2025-01-13T15:03:26.650301Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2025-01-13T15:03:39.990048Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: sR9.Ffue:El-
Note the temporary password printed above, then start MySQL with systemctl restart mysql
systemctl restart mysql
mysql -u root -p
# Log in with the temporary password from above
Change the root password and grant remote access
ALTER USER 'root'@'localhost' IDENTIFIED BY 'your_new_password';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
CREATE USER 'root'@'%' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
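To confirm the accounts and grants took effect, a quick check from the shell (remote access also requires port 3306 to be reachable, for example opened in firewalld):
# List the root accounts and the hosts they may connect from
mysql -u root -p -e "SELECT user, host FROM mysql.user;"
# If firewalld is running, open the MySQL port for remote clients:
# firewall-cmd --permanent --add-port=3306/tcp && firewall-cmd --reload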
Install Hadoop
First, unpack the Hadoop tarball under /opt
cd /opt
tar zxvf hadoop-3.3.4.tar.gz
mv hadoop-3.3.4 hadoop
sudo chown -R hadoop:hadoop /opt/hadoop
# Switch to the hadoop user for the following steps
su hadoop
Set up passwordless SSH for the hadoop user
ssh-keygen
ssh-copy-id 192.168.122.24  # replace with this machine's IP
Add the Hadoop environment variables
sudo tee /etc/profile.d/hadoop.sh << 'EOF'
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
Apply the environment variables: source /etc/profile
Set the Hadoop runtime environment
cat >> /opt/hadoop/etc/hadoop/hadoop-env.sh << EOF
export HDFS_NAMENODE_USER=hadoop
export HDFS_DATANODE_USER=hadoop
export HDFS_SECONDARYNAMENODE_USER=hadoop
export YARN_RESOURCEMANAGER_USER=hadoop
export YARN_NODEMANAGER_USER=hadoop
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
EOF
Go into the Hadoop configuration directory
cd /opt/hadoop/etc/hadoop
vim core-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.122.24:9000</value><!-- replace with your own IP -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
Edit hdfs-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop/data/datanode</value>
</property>
Edit yarn-site.xml and add the following inside <configuration></configuration>:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Replace with this machine's IP -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.122.24</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>24576</value> <!-- 24GB -->
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>6144</value> <!-- Maximum memory per container, here 6 GB -->
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value> <!-- Minimum memory per container, 1 GB -->
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value> <!-- Maximum CPU vcores available to this NodeManager, here 8 -->
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>8</value> <!-- At most 8 vcores per container -->
</property>
mapred-site.xml can be left unchanged; DSS does not use MapReduce.
Configure the worker nodes
Add the workers file (create it if it does not exist)
cat > workers << EOF
localhost
EOF
Initialize HDFS: format the NameNode, then start HDFS
hdfs namenode -format
cd /opt/hadoop
sbin/start-dfs.sh
Start YARN
sbin/start-yarn.sh
Confirm that all Hadoop services are running
[hadoop@localhost hadoop]$ jps
14948 DataNode
16231 Jps
15224 SecondaryNameNode
14794 NameNode
15755 NodeManager
15612 ResourceManager
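Besides jps, HDFS and YARN can be queried directly, and the web UIs listen on the Hadoop 3 default ports:
# Live DataNodes and HDFS capacity
hdfs dfsadmin -report
# NodeManagers registered with the ResourceManager
yarn node -list
# Web UIs: NameNode http://<your-ip>:9870, ResourceManager http://<your-ip>:8088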
Install Hive
Unpack Hive and add environment variables
cd /opt
tar zxvf apache-hive-3.1.3-bin.tar.gz
mv apache-hive-3.1.3-bin hive
# Create hive.sh
cat > /etc/profile.d/hive.sh << 'EOF'
export HIVE_HOME=/opt/hive
export PATH=$HIVE_HOME/bin:$PATH
EOF
# Apply the environment variables
source /etc/profile
Change ownership of the Hive directory to hadoop
chown -R hadoop:hadoop /opt/hive
Configure Hive
cd /opt/hive/conf
Add the Hive configuration
cat > hive-site.xml << EOF
<configuration>
<!-- Point this at the local MySQL instance -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.122.24:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<!-- MySQL JDBC driver class -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.druid.metadata.db.type</name>
<value>mysql</value>
</property>
<property>
<name>hive.druid.metadata.uri</name>
<value>jdbc:mysql://192.168.122.24:3306</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.122.24:9083</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>FALSE</value>
<description>
Setting this property to true will have HiveServer2 execute
Hive operations as the user making the calls to it.
</description>
</property>
<property>
<name>hive.metastore.schema.verification.record.version</name>
<value>false</value>
</property>
<!-- Disable authorization checks -->
<property>
<name>hive.security.authorization.enabled</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.port</name>
<value>9083</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
<property>
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.impersonation</name>
<value>false</value>
</property>
<!-- Enable transaction support -->
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<!-- Transaction manager -->
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<!-- Enable dynamic partitions -->
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<!-- Default file format for transactional tables is ORC -->
<property>
<name>hive.default.fileformat</name>
<value>ORC</value>
</property>
</configuration>
EOF
Configure Hive runtime logging
cat >> /opt/hive/conf/hive-log4j2.properties << EOF
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = console
rootLogger.appenderRef.rolling.ref = file
appender.console.name = console
appender.console.type = Console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %-5p [%t] %c: %m%n
appender.rolling.name = file
appender.rolling.type = RollingFile
appender.rolling.fileName = /opt/hive/hiveserver2.log
appender.rolling.filePattern = /opt/hive/hiveserver2-%d{MM-dd-yyyy}.log.gz
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = %d{ISO8601} %-5p [%t] %c: %m%n
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
logger.query.level = INFO
logger.query.name = org.apache.hive
EOF
Add the MySQL database and user for Hive
# Enter MySQL
mysql -u root -p
# Create the hive database
CREATE DATABASE hive;
# Create the hive user
CREATE USER 'hive'@'%' IDENTIFIED BY 'your_password';
# Grant privileges
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
# Flush privileges
flush privileges;
Add the MySQL JDBC driver to Hive's lib directory (extract the mysql-connector-java-8.0.28.tar.gz downloaded earlier to obtain the jar)
cp /opt/mysql-connector-java-8.0.28.jar /opt/hive/lib/
Initialize the Hive metastore database
schematool -dbType mysql -initSchema
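If the initialization succeeded, the hive database in MySQL now contains the metastore tables (DBS, TBLS, VERSION, and so on); a quick sanity check:
# List the metastore tables created by schematool
mysql -u hive -p -e "SHOW TABLES;" hive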
Start the Hive metastore
# Start a tmux session to keep the metastore running in the background
tmux new -s metastore
hive --service metastore
Start HiveServer2
# Start a tmux session to keep hiveserver2 running in the background
tmux new -s hiveserver2
hive --service hiveserver2
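Once both services are up, the ports from hive-site.xml should be listening and HiveServer2 should accept a Beeline connection (HiveServer2 can take a minute to come up):
# Metastore on 9083, HiveServer2 on 10000
ss -lntp | grep -E '9083|10000'
# Connect through HiveServer2 as the hadoop user
beeline -u jdbc:hive2://localhost:10000 -n hadoop -e "show databases;"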
Verify the local Hive environment
[hadoop@localhost opt]$ hive -e "show databases"
which: no hbase in (/bin:/sbin:/root/.nvm/versions/node/v12.22.12/bin:/usr/local/maven/bin:/opt/mysql/bin:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/bin:/opt/hive/bin:/opt/hadoop/bin:/opt/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2025-01-13T21:43:36,908 INFO [main] org.apache.hadoop.hive.conf.HiveConf: Found configuration file file:/opt/hive/conf/hive-site.xml
2025-01-13T21:43:37,273 WARN [main] org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
2025-01-13T21:43:37,482 WARN [main] org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
Hive Session ID = 8ba4dddf-ee09-49e2-b10b-370d4981aac6
2025-01-13T21:43:38,071 INFO [main] SessionState: Hive Session ID = 8ba4dddf-ee09-49e2-b10b-370d4981aac6
Logging initialized using configuration in file:/opt/hive/conf/hive-log4j2.properties Async: true
2025-01-13T21:43:38,169 INFO [main] SessionState:
Logging initialized using configuration in file:/opt/hive/conf/hive-log4j2.properties Async: true
2025-01-13T21:43:39,601 INFO [main] org.apache.hadoop.hive.ql.session.SessionState: Created HDFS directory: /tmp/hive/hadoop/8ba4dddf-ee09-49e2-b10b-370d4981aac6
2025-01-13T21:43:39,663 INFO [main] org.apache.hadoop.hive.ql.session.SessionState: Created local directory: /tmp/hadoop/8ba4dddf-ee09-49e2-b10b-370d4981aac6
2025-01-13T21:43:39,705 INFO [main] org.apache.hadoop.hive.ql.session.SessionState: Created HDFS directory: /tmp/hive/hadoop/8ba4dddf-ee09-49e2-b10b-370d4981aac6/_tmp_space.db
2025-01-13T21:43:39,749 INFO [main] org.apache.hadoop.hive.conf.HiveConf: Using the default value passed in for log id: 8ba4dddf-ee09-49e2-b10b-370d4981aac6
2025-01-13T21:43:39,750 INFO [main] org.apache.hadoop.hive.ql.session.SessionState: Updating thread name to 8ba4dddf-ee09-49e2-b10b-370d4981aac6 main
2025-01-13T21:43:39,859 WARN [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
2025-01-13T21:43:40,956 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://192.168.122.24:9083
2025-01-13T21:43:41,017 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
2025-01-13T21:43:41,072 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient: Connected to metastore.
2025-01-13T21:43:41,072 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hadoop (auth:SIMPLE) retries=1 delay=1 lifetime=0
2025-01-13T21:43:41,302 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.conf.HiveConf: Using the default value passed in for log id: 8ba4dddf-ee09-49e2-b10b-370d4981aac6
Hive Session ID = 9acbeb8d-f253-4ff9-87a2-49b53e9f4cf7
2025-01-13T21:43:41,304 INFO [pool-7-thread-1] SessionState: Hive Session ID = 9acbeb8d-f253-4ff9-87a2-49b53e9f4cf7
2025-01-13T21:43:41,371 INFO [pool-7-thread-1] org.apache.hadoop.hive.ql.session.SessionState: Created HDFS directory: /tmp/hive/hadoop/9acbeb8d-f253-4ff9-87a2-49b53e9f4cf7
2025-01-13T21:43:41,377 INFO [pool-7-thread-1] org.apache.hadoop.hive.ql.session.SessionState: Created local directory: /tmp/hadoop/9acbeb8d-f253-4ff9-87a2-49b53e9f4cf7
2025-01-13T21:43:41,397 INFO [pool-7-thread-1] org.apache.hadoop.hive.ql.session.SessionState: Created HDFS directory: /tmp/hive/hadoop/9acbeb8d-f253-4ff9-87a2-49b53e9f4cf7/_tmp_space.db
2025-01-13T21:43:41,441 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: Compiling command(queryId=hadoop_20250113214341_50e76e03-0c40-42e3-900b-55ea4e9d5d68): show databases
2025-01-13T21:43:41,503 INFO [pool-7-thread-1] org.apache.hadoop.hive.ql.metadata.HiveMaterializedViewsRegistry: Materialized views registry has been initialized
2025-01-13T21:43:42,628 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: Semantic Analysis Completed (retrial = false)
2025-01-13T21:43:42,678 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
2025-01-13T21:43:42,772 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.exec.ListSinkOperator: Initializing operator LIST_SINK[0]
2025-01-13T21:43:42,783 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: Completed compiling command(queryId=hadoop_20250113214341_50e76e03-0c40-42e3-900b-55ea4e9d5d68); Time taken: 1.389 seconds
2025-01-13T21:43:42,783 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.reexec.ReExecDriver: Execution #1 of query
2025-01-13T21:43:42,784 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: Executing command(queryId=hadoop_20250113214341_50e76e03-0c40-42e3-900b-55ea4e9d5d68): show databases
2025-01-13T21:43:42,796 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: Starting task [Stage-0:DDL] in serial mode
2025-01-13T21:43:42,808 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] hive.ql.exec.DDLTask: results : 1
OK
2025-01-13T21:43:42,848 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: Completed executing command(queryId=hadoop_20250113214341_50e76e03-0c40-42e3-900b-55ea4e9d5d68); Time taken: 0.064 seconds
2025-01-13T21:43:42,849 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.Driver: OK
2025-01-13T21:43:42,876 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.conf.Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2025-01-13T21:43:42,949 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.mapred.FileInputFormat: Total input files to process : 1
default
2025-01-13T21:43:42,990 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.exec.ListSinkOperator: RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_LIST_SINK_0:1,
Time taken: 1.459 seconds, Fetched: 1 row(s)
2025-01-13T21:43:42,999 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.conf.HiveConf: Using the default value passed in for log id: 8ba4dddf-ee09-49e2-b10b-370d4981aac6
2025-01-13T21:43:42,998 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] CliDriver: Time taken: 1.459 seconds, Fetched: 1 row(s)
2025-01-13T21:43:42,999 INFO [8ba4dddf-ee09-49e2-b10b-370d4981aac6 main] org.apache.hadoop.hive.ql.session.SessionState: Resetting thread name to main
2025-01-13T21:43:42,999 INFO [main] org.apache.hadoop.hive.conf.HiveConf: Using the default value passed in for log id: 8ba4dddf-ee09-49e2-b10b-370d4981aac6
2025-01-13T21:43:43,044 INFO [main] org.apache.hadoop.hive.ql.session.SessionState: Deleted directory: /tmp/hive/hadoop/8ba4dddf-ee09-49e2-b10b-370d4981aac6 on fs with scheme hdfs
2025-01-13T21:43:43,045 INFO [main] org.apache.hadoop.hive.ql.session.SessionState: Deleted directory: /tmp/hadoop/8ba4dddf-ee09-49e2-b10b-370d4981aac6 on fs with scheme file
2025-01-13T21:43:43,052 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 0
Install Spark
Unpack the Spark tarball and set environment variables
cd /opt
tar zxvf spark-3.2.1-bin-hadoop3.2.tgz
mv spark-3.2.1-bin-hadoop3.2 spark
chown -R hadoop:hadoop spark
cat > /etc/profile.d/spark.sh << 'EOF'
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
EOF
source /etc/profile
su hadoop
Configure spark-env.sh
cat >> $SPARK_HOME/conf/spark-env.sh << 'EOF'
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_HOME=/opt/spark
export SPARK_CONF_DIR=/opt/spark/conf
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
EOF
Configure spark-defaults.conf
cat >> $SPARK_HOME/conf/spark-defaults.conf << EOF
spark.master yarn
spark.driver.memory 2g
spark.executor.memory 3g
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///spark-logs
spark.yarn.historyServer.address 192.168.122.24:18080
spark.history.fs.logDirectory hdfs:///spark-logs
spark.driver.extraClassPath /opt/hadoop/share/hadoop/common/lib/*
spark.executor.extraClassPath /opt/hadoop/share/hadoop/common/lib/*
spark.executorEnv.PYSPARK_PYTHON /usr/bin/python
EOF
Create the Spark log directory on HDFS
hdfs dfs -mkdir -p hdfs:///spark-logs
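Since spark.history.fs.logDirectory and the 18080 address are configured above, you can optionally start the Spark history server to browse finished applications:
# Reads event logs from hdfs:///spark-logs; UI at http://<your-ip>:18080
$SPARK_HOME/sbin/start-history-server.sh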
Verify the local Spark environment
spark-sql -e "show databases"
*****
*****
2025-01-13 21:53:50,519 INFO hive.metastore: Opened a connection to metastore, current connections: 1
2025-01-13 21:53:50,575 INFO hive.metastore: Connected to metastore.
Spark master: yarn, Application Id: application_1736775772310_0002
2025-01-13 21:53:50,789 INFO thriftserver.SparkSQLCLIDriver: Spark master: yarn, Application Id: application_1736775772310_0002
2025-01-13 21:53:53,869 INFO codegen.CodeGenerator: Code generated in 167.916501 ms
2025-01-13 21:53:53,927 INFO codegen.CodeGenerator: Code generated in 12.675728 ms
default
Time taken: 3.139 seconds, Fetched 1 row(s)
2025-01-13 21:53:53,980 INFO thriftserver.SparkSQLCLIDriver: Time taken: 3.139 seconds, Fetched 1 row(s)
2025-01-13 21:53:54,026 INFO server.AbstractConnector: Stopped Spark@58b0dfee{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2025-01-13 21:53:54,032 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.122.24:4040
2025-01-13 21:53:54,043 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
2025-01-13 21:53:54,098 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
2025-01-13 21:53:54,099 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2025-01-13 21:53:54,114 INFO cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
2025-01-13 21:53:54,229 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2025-01-13 21:53:54,255 INFO memory.MemoryStore: MemoryStore cleared
2025-01-13 21:53:54,256 INFO storage.BlockManager: BlockManager stopped
2025-01-13 21:53:54,282 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2025-01-13 21:53:54,288 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2025-01-13 21:53:54,310 INFO spark.SparkContext: Successfully stopped SparkContext
2025-01-13 21:53:54,310 INFO util.ShutdownHookManager: Shutdown hook called
Install Linkis
Create the Linkis directory
su hadoop
sudo mkdir -p /opt/linkis
sudo cp apache-linkis-1.6.0-bin.tar.gz /opt/linkis/
sudo chown -R hadoop:hadoop /opt/linkis
cd /opt/linkis
tar zxvf apache-linkis-1.6.0-bin.tar.gz
Create the dss database and dss user in MySQL
mysql -u root -p
CREATE USER 'dss'@'%' IDENTIFIED BY '123456';
create database dss;
GRANT ALL PRIVILEGES ON dss.* TO 'dss'@'%';
flush privileges;
Create the Linkis application directories
sudo mkdir -p /appcom/Install
sudo mkdir -p /appcom/config
sudo mkdir -p /appcom/tmp
sudo chown -R hadoop:hadoop /appcom/
Create symlinks for Spark, Hive, and Hadoop
ln -s /opt/hive /appcom/Install/
ln -s /opt/spark /appcom/Install/
ln -s /opt/hadoop /appcom/Install/
ln -s /opt/hive/conf /appcom/config/hive-config
ln -s /opt/spark/conf /appcom/config/spark-config
ln -s /opt/hadoop/etc/hadoop /appcom/config/hadoop-config
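A quick check that the links resolve the way Linkis expects:
ls -l /appcom/Install /appcom/config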
Edit the Linkis deployment configuration
deploy-config/linkis-env.sh
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# description: Starts and stops Server
#
# @name: linkis-env
#
# Modified for Linkis 1.0.0
# SSH_PORT=22
### deploy user
deployUser=hadoop
##If you don't set it, a random password string will be generated during installation
deployPwd=
### database type
### choose mysql or postgresql, default mysql
dbType=mysql
export dbType
##Linkis_SERVER_VERSION
LINKIS_SERVER_VERSION=v1
### Specifies the user workspace, which is used to store the user's script files and log files.
### Generally local directory, path mode can be [file://] or [hdfs://]
WORKSPACE_USER_ROOT_PATH=file:///opt/linkis/logs
### User's root hdfs path, path mode can be [file://] or [hdfs://]
HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis
### Path to store started engines and engine logs, must be local
ENGINECONN_ROOT_PATH=/appcom/tmp
###path mode can be [file://] or [hdfs://]
#ENTRANCE_CONFIG_LOG_PATH=hdfs:///tmp/linkis/
### Path to store job ResultSet
### path mode can be [file://] or [hdfs://]
RESULT_SET_ROOT_PATH=hdfs:///tmp/linkis
##YARN REST URL spark engine required
# Active resourcemanager address needed. Recommended to add all ha addresses. eg YARN_RESTFUL_URL="http://127.0.0.1:8088;http://127.0.0.2:8088"
YARN_RESTFUL_URL="http://192.168.122.24:8088"
## request Yarn resource restful interface When Yarn need auth by user
## If your environment yarn interface can be accessed directly, ignore it
#YARN_AUTH_ENABLE=false
#YARN_AUTH_USER=hadoop
#YARN_AUTH_PWD=123456
## request spnego enabled Yarn resource restful interface When Yarn enable kerberos
## If your environment yarn interface can be accessed directly, ignore it
#YARN_KERBEROS_ENABLE=true
#YARN_PRINCIPAL_NAME=yarn
#YARN_KEYTAB_PATH=/etc/security/keytabs/yarn.keytab
#YARN_KRB5_PATH=/etc/krb5.conf
##############################################################
#
# NOTICE:
# You can also set these variables as system environment in ~/.bashrc file
#HADOOP
HADOOP_HOME=${HADOOP_HOME:-"/appcom/Install/hadoop"}
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/appcom/config/hadoop-config"}
HADOOP_KERBEROS_ENABLE=${HADOOP_KERBEROS_ENABLE:-"false"}
HADOOP_KEYTAB_PATH=${HADOOP_KEYTAB_PATH:-"/appcom/keytab/"}
## Hadoop env version
HADOOP_VERSION=${HADOOP_VERSION:-"3.3.4"}
#Hive
HIVE_HOME=/appcom/Install/hive
HIVE_CONF_DIR=/appcom/config/hive-config
#Spark
SPARK_HOME=/appcom/Install/spark
SPARK_CONF_DIR=/appcom/config/spark-config
## Engine version conf
#SPARK_VERSION
#SPARK_VERSION=3.2.1
##HIVE_VERSION
#HIVE_VERSION=3.1.3
#PYTHON_VERSION=python2
################### The install Configuration of all Micro-Services #####################
#
# NOTICE:
# 1. If you just wanna try, the following micro-service configuration can be set without any settings.
# These services will be installed by default on this machine.
# 2. In order to get the most complete enterprise-level features, we strongly recommend that you install
# Linkis in a distributed manner and set the following microservice parameters
#
### DISCOVERY
DISCOVERY=EUREKA
### EUREKA install information
### You can access it in your browser at the address below:http://${EUREKA_INSTALL_IP}:${EUREKA_PORT}
#EUREKA: Microservices Service Registration Discovery Center
#EUREKA_INSTALL_IP=127.0.0.1
EUREKA_PORT=20303
export EUREKA_PREFER_IP=false
#EUREKA_HEAP_SIZE="512M"
### NACOS install information
### NACOS
NACOS_SERVER_ADDR=127.0.0.1:8848
##linkis-mg-gateway
#GATEWAY_INSTALL_IP=127.0.0.1
GATEWAY_PORT=9001
#GATEWAY_HEAP_SIZE="512M"
##linkis-cg-linkismanager
#MANAGER_INSTALL_IP=127.0.0.1
MANAGER_PORT=9101
#MANAGER_HEAP_SIZE="512M"
##linkis-cg-engineconnmanager
#ENGINECONNMANAGER_INSTALL_IP=127.0.0.1
ENGINECONNMANAGER_PORT=9102
#ENGINECONNMANAGER_HEAP_SIZE="512M"
##linkis-cg-entrance
#ENTRANCE_INSTALL_IP=127.0.0.1
ENTRANCE_PORT=9104
#ENTRANCE_HEAP_SIZE="512M"
##linkis-ps-publicservice
#PUBLICSERVICE_INSTALL_IP=127.0.0.1
PUBLICSERVICE_PORT=9105
#PUBLICSERVICE_HEAP_SIZE="512M"
########################################################################################
## LDAP is for enterprise authorization, if you just want to have a try, ignore it.
#LDAP_URL=ldap://localhost:1389/
#LDAP_BASEDN=dc=apache,dc=com
#LDAP_USER_NAME_FORMAT=cn=%s@xx.com,OU=xxx,DC=xxx,DC=com
## java application default jvm memory
export SERVER_HEAP_SIZE="512M"
##The decompression directory and the installation directory need to be inconsistent
#LINKIS_HOME=/appcom/Install/LinkisInstall
##The extended lib such mysql-connector-java-*.jar
#LINKIS_EXTENDED_LIB=/appcom/common/linkisExtendedLib
LINKIS_VERSION=1.6.0
# for install
LINKIS_PUBLIC_MODULE=lib/linkis-commons/public-module
## If SKYWALKING_AGENT_PATH is set, the Linkis components will be started with Skywalking agent
#SKYWALKING_AGENT_PATH=/appcom/config/skywalking-agent/skywalking-agent.jar
##If you want to enable prometheus for monitoring linkis, you can set this export PROMETHEUS_ENABLE=true
export PROMETHEUS_ENABLE=false
#If you only want to experience linkis streamlined services, not rely on hdfs
#you can set the following configuration to false and for the configuration related to the file directory,
#use path mode of [file://] to replace [hdfs://]
export ENABLE_HDFS=true
export ENABLE_HIVE=true
export ENABLE_SPARK=true
## define MYSQL_CONNECT_JAVA_PATH&OLK_JDBC_PATH, the linkis can check JDBC driver
MYSQL_CONNECT_JAVA_PATH=
OLK_JDBC_PATH=
deploy-config/db.sh
# Replace with this machine's IP
MYSQL_HOST=192.168.122.24
MYSQL_PORT=3306
MYSQL_DB=dss
MYSQL_USER=dss
MYSQL_PASSWORD=123456
HIVE_META_URL="jdbc:mysql://192.168.122.24:3306/hive"
HIVE_META_USER="hive"
HIVE_META_PASSWORD="123456"
Disable random token generation
Set DEBUG_MODE to true in bin/install.sh.
Run the installation script
sh bin/install.sh
Follow the prompts. If the environment check complains about something missing, install it. For Hadoop, Spark, and Hive problems, read the check script and run the failing commands yourself first; as a last resort you can comment out the Spark and Hive checks in checkEnv.sh, as long as the Hive and Spark verification steps earlier in this guide passed.
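After install.sh finishes, Linkis is normally started from the install directory it created (LINKIS_HOME); a sketch, assuming the standard 1.6.0 layout:
# Start all Linkis micro-services and check their registration in Eureka
cd <LINKIS_HOME>                 # the install directory printed by install.sh
sh sbin/linkis-start-all.sh
# Then open http://<your-ip>:20303 and confirm the services show up in Eureka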
[Not yet completed: the DSS platform is only compatible with Linkis up to 1.4.0] Install the DSS platform
Create the DSS installation directory
cd /opt
sudo mkdir dss
sudo cp wedatasphere-dss-1.2.1-dist.tar.gz dss/
sudo chown -R hadoop:hadoop dss/
cd dss
tar zxvf wedatasphere-dss-1.2.1-dist.tar.gz
Configure the DSS installation environment
config/config.sh
### deploy user
deployUser=hadoop
## SSH port for remote connections
SSH_PORT=22
## max memory for services
SERVER_HEAP_SIZE=512M
### The install home path of DSS,Must provided
DSS_INSTALL_HOME=/opt/dss/dss_install
DSS_VERSION=1.2.1
DSS_FILE_NAME=dss-1.2.1
DSS_WEB_PORT=8085
### Linkis EUREKA information. # Microservices Service Registration Discovery Center
EUREKA_INSTALL_IP=192.168.122.24
EUREKA_PORT=20303
### If EUREKA has safety verification, please fill in username and password
#EUREKA_USERNAME=
#EUREKA_PASSWORD=
### Linkis Gateway information
GATEWAY_INSTALL_IP=192.168.122.24
GATEWAY_PORT=9001
### Linkis BML Token
BML_AUTH=BML-AUTH
################### The install Configuration of all Micro-Services start #####################
#
# NOTICE:
# 1. If you just wanna try, the following micro-service configuration can be set without any settings.
# These services will be installed by default on this machine.
# 2. In order to get the most complete enterprise-level features, we strongly recommend that you install
# the following microservice parameters
#
### DSS_SERVER
### This service is used to provide dss-server capability.
### dss-server
DSS_SERVER_INSTALL_IP=192.168.122.24
DSS_SERVER_PORT=9043
### dss-apps-server
DSS_APPS_SERVER_INSTALL_IP=192.168.122.24
DSS_APPS_SERVER_PORT=9044
################### The install Configuration of all Micro-Services end #####################
############## ############## dss_appconn_instance configuration start ############## ##############
#### JDBC URL for the eventchecker tables; usually the dss database
EVENTCHECKER_JDBC_URL=jdbc:mysql://192.168.122.24:3306/dss?characterEncoding=UTF-8
EVENTCHECKER_JDBC_USERNAME=dss
EVENTCHECKER_JDBC_PASSWORD=123456
#### Hive metastore database URL
DATACHECKER_JOB_JDBC_URL=jdbc:mysql://192.168.122.24:3306/hive?useUnicode=true&characterEncoding=UTF-8
DATACHECKER_JOB_JDBC_USERNAME=hive
DATACHECKER_JOB_JDBC_PASSWORD=123456
#### Metadata database; can be configured the same as DATACHECKER_JOB
DATACHECKER_BDP_JDBC_URL=jdbc:mysql://192.168.122.24:3306/hive?useUnicode=true&characterEncoding=UTF-8
DATACHECKER_BDP_JDBC_USERNAME=hive
DATACHECKER_BDP_JDBC_PASSWORD=123456
### Email node configuration
EMAIL_HOST=smtp.163.com
EMAIL_PORT=25
EMAIL_ACCOUNT=xxx@163.com
EMAIL_PASSWORD=xxxxx
EMAIL_PROTOCOL=smtp
############## ############## dss_appconn_instance configuration end ############## ##############
config/db.sh
### for DSS-Server and Eventchecker APPCONN
MYSQL_HOST=192.168.122.24
MYSQL_PORT=3306
MYSQL_DB=dss
MYSQL_USER=dss
MYSQL_PASSWORD=123456
Note: db/dss_ddl.sql has a problem: the separator lines made up of dashes must be commented out or removed, otherwise the import fails with an error like:
ERROR 1064 (42000) at line 30 in file: '/opt/dss/db/dss_ddl.sql': You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-------------------------------------------------------------------
------------' at line 1
Comment those lines out.
db/dss_dml.sql has the same problem and its dash lines must also be commented out. Note that you should edit db/dss_dml.sql itself: when dssinstall.sh runs, it copies that file to db/dss_dml_real.sql.
Install DSS
sh bin/dssinstall.sh
After working through the issues above step by step, you will hit the error "sed: cannot read /opt/dss/dss_install/conf/application-dss.yml".
This is because DSS 1.2.1 uses application-dss.properties as its configuration file, so open dss_install/conf/application-dss.properties and adjust the configuration manually.
# Only 127.0.0.1 needs to be replaced with the public IP; if external access to the Eureka registry is not required, no change is needed
eureka.client.serviceUrl.defaultZone=http://192.168.122.24:20303/eureka/
logging.config=classpath:log4j2.xml
management.endpoints.web.exposure.include=refresh,info
Start DSS
cd dss_install/sbin