[AI Big Data Development] Sqoop Summary and Pitfalls
© 西早boy (中级黑马) / 2019-10-21 18:27
Please follow the CC license when reposting; commercial use of this article is prohibited.
Environment: CentOS 7.6 + Hadoop 3.2.0 + Hive 3 + sqoop-1.4.7.bin__hadoop-2.6.0

Sqoop deployment quick start (installation and usage), plus an introduction to the Sqoop architecture and how it is used.
Sqoop architecture and where it sits in a pipeline

The original post used a flow chart of a typical Hadoop development workflow to show where Sqoop fits. The workflow is: raw data is first cleaned with MapReduce; the cleaned data is stored in HBase; Hive then runs statistics and analysis over the HBase data and writes the results into Hive tables; Sqoop exports those results into MySQL; finally the results are shown to users through a web front end.

The concept: Sqoop = SQL-to-Hadoop.
As the name suggests, Sqoop is a tool for moving data back and forth between relational databases and Hadoop: it can import data from a relational database (e.g. MySQL, Oracle) into Hadoop (HDFS, Hive, HBase), and it can export data from Hadoop (HDFS, Hive, HBase) back into a relational database (e.g. MySQL, Oracle). When Sqoop receives a shell command or a Java API call from a client, its task translator converts the command into a corresponding MapReduce job, which then copies the data between the relational database and Hadoop. The export direction is the one that pushes data from HDFS, Hive, or HBase into MySQL, Oracle, and similar databases.

Version note: 1.4.x is Sqoop 1 and 1.99.x is Sqoop 2; the two are not compatible with each other.
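For orientation, here is a minimal sketch of the two directions. The host, database, table, and HDFS path below are placeholders, not values from this article; -P prompts for the password interactively.

# RDBMS -> Hadoop (import)
sqoop import --connect jdbc:mysql://dbhost:3306/mydb --username myuser -P \
  --table mytable --target-dir /data/mytable -m 1
# Hadoop -> RDBMS (export)
sqoop export --connect jdbc:mysql://dbhost:3306/mydb --username myuser -P \
  --table mytable --export-dir /data/mytable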
Download and install

Note: Sqoop is not picky about the Hadoop version; both Hadoop 2.x and Hadoop 3.x work.

wget http://mirror.bit.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.tar.gz
tar -zxvf sqoop-1.4.7.tar.gz
mv sqoop-1.4.7 sqoop

Edit the configuration

cd sqoop/conf/
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
# add the following
export HADOOP_COMMON_HOME=/server/hadoop
export HADOOP_MAPRED_HOME=/server/hadoop
export HIVE_HOME=/server/hive
export HCAT_HOME=/server/hive/hcatalog
Optionally, comment out the HCatalog and Accumulo warning/classpath blocks in configure-sqoop if those components are not installed:

vi /opt/apps/sqoop/bin/configure-sqoop
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
# Add HCatalog to dependency list
#if [ -e "${HCAT_HOME}/bin/hcat" ]; then
# TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
# if [ -z "${HIVE_CONF_DIR}" ]; then
# TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
# fi
# SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi
# Add Accumulo to dependency list
#if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
# for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
# done
# for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
# done
#fi

Add the JDBC driver

Download the MySQL connector and place it in the lib/ directory.

# no MySQL driver yet, so install one
wget -P /opt/apps/sqoop/lib/ http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.40/mysql-connector-java-5.1.40.jar
# it is best to keep the MySQL driver consistent across the Hadoop cluster, so reuse the driver already installed for Hive
ln -s /server/hive/lib/mysql-connector-java-8.0.17.jar /server/sqoop/lib/mysql-connector-java.jar

Configure environment variables (optional)

vi /etc/profile
export SQOOP_HOME=/server/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile

Usage overview

[root@node1 bin]# sqoop help
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/server/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-10-13 02:59:12,286 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.

As shown above, the command format is: sqoop COMMAND [ARGS]. Run sqoop help COMMAND to see the parameters of each individual command.
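Before the first import it is worth confirming that the JDBC driver and the credentials actually work. A quick sketch, reusing the node1/root connection details used throughout this article (adjust for your own environment):

# list the databases visible to this account (verifies driver + connectivity)
sqoop list-databases --connect jdbc:mysql://node1:3306/ --username root -P
# run an ad-hoc query against MySQL through Sqoop
sqoop eval --connect jdbc:mysql://node1:3306/hive --username root -P \
  --query "select count(*) from consumer"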
Import the consumer table from MySQL's hive database into HDFS

#---(Example 1)--- import the consumer table from the MySQL database `hive` into HDFS
#create the table
mysql>create table consumer(
id int unsigned NOT NULL AUTO_INCREMENT,
name varchar(32),
sex varchar(32),
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
#insert data
insert into consumer values(null,'xiaoming迷宫','man');
insert into consumer values(null,'leyan','man');
insert into consumer values(null,'theShy','man');
insert into consumer values(null,'theshygirlfriend','woman');
#run the Sqoop import
sqoop import --connect jdbc:mysql://node1:3306/hive --table consumer --username root --password root123456 --target-dir /sqoopout/ --fields-terminated-by '\t' -m 1;
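To check that the import landed where expected, list and read the target directory; the part-m-00000 file name is the usual single-mapper output and is an assumption here:

hdfs dfs -ls /sqoopout/
# with -m 1 the rows normally end up in a single part file
hdfs dfs -cat /sqoopout/part-m-00000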
Export data from HDFS to MySQL

# user.txt fields (tab-separated)
# field   type      meaning
# uname   varchar   user name
# vnum    int       number of videos
# fnum    int       number of friends

mysql>create database sqoop;
mysql>use sqoop;
mysql>create table user(
uname varchar(32),
vnum varchar(32),
fnum varchar(100)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

# upload the data file to HDFS
hdfs dfs -put user.txt hdfs://node1:9000/input/

# export it into MySQL
sqoop export \
--connect 'jdbc:mysql://node1:3306/sqoop' \
--username 'root' \
--password 'root123456' \
--table 'user' \
--export-dir '/input/user.txt' \
--columns 'uname,vnum,fnum' \
--mapreduce-job-name 'hdfs to mysql' \
--input-fields-terminated-by '\t' \
--input-lines-terminated-by '\n';
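If any column in user.txt can be empty, the export may also need explicit NULL handling, otherwise empty fields are loaded as literal strings. A hedged sketch using Sqoop's null-string options (same connection as above):

# interpret the literal text \N in the input files as SQL NULL
sqoop export \
  --connect 'jdbc:mysql://node1:3306/sqoop' \
  --username 'root' --password 'root123456' \
  --table 'user' \
  --export-dir '/input/user.txt' \
  --input-fields-terminated-by '\t' \
  --input-null-string '\\N' \
  --input-null-non-string '\\N'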
Importing MySQL data into Hive

#---(Case 1)---
#Goal: import the MySQL table consumer into the Hive table mysql_to_hive_consumer (database hive); the Hive table does not exist yet.
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.mysql_to_hive_consumer --hive-import -m 1 --delete-target-dir ;
#data currently in MySQL
mysql> select * from consumer;
+----+------------------+-------+
| id | name | sex |
+----+------------------+-------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
+----+------------------+-------+
#import result
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
+----------------------------+------------------------------+-----------------------------+
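A side note on the commands in these examples: passing --password on the command line leaves the password in the shell history and process list. Sqoop also accepts -P (interactive prompt) or --password-file; a sketch, where the HDFS path of the password file is an assumption:

# keep the password in a file readable only by the submitting user
echo -n 'root123456' > mysql.pwd
hdfs dfs -put mysql.pwd /user/root/mysql.pwd
hdfs dfs -chmod 400 /user/root/mysql.pwd
# reference the file instead of putting the password on the command line
sqoop import --connect jdbc:mysql://node1:3306/hive --username root \
  --password-file /user/root/mysql.pwd \
  --table consumer --hive-import --hive-table hive.mysql_to_hive_consumer -m 1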
Building on Case 1, run both an overwrite import and a plain append ("into") import.

#---(Case 2)---
#Goal: on top of Case 1, compare overwrite import with plain append ("into") import.
into: same command as Case 1
overwrite:
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.mysql_to_hive_consumer --hive-import -m 1 --hive-overwrite;
#modify the MySQL data
mysql> update consumer set sex='woman,man' where id=23;
mysql> update consumer set sex='man,workman' where id=22;
mysql> update consumer set sex='man,workman,iconman' where id=21;
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
+----+------------------+---------------------+
#Hive table before the overwrite
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
+----------------------------+------------------------------+-----------------------------+
#Hive table after running the overwrite import
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
+----------------------------+------------------------------+-----------------------------+
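--hive-overwrite picks up updated rows by re-importing the whole table. For larger tables, Sqoop's lastmodified incremental mode can fetch only rows changed since the previous run and merge them by key. A sketch only: the update_time column is hypothetical (the consumer table in this article does not have one), and lastmodified mode targets an HDFS directory rather than --hive-import:

# hypothetical: needs a timestamp column that is updated on every change
sqoop import --connect jdbc:mysql://node1:3306/hive --username root -P \
  --table consumer \
  --incremental lastmodified --check-column update_time \
  --last-value '2019-10-13 00:00:00' \
  --merge-key id \
  --target-dir /sqoop/consumer_merged -m 1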
Building on Case 2, add rows to the MySQL consumer table and import only the new rows into the Hive table.

#---(Case 3)---
#Goal: on top of Case 2, add rows to the MySQL consumer table and import only the new rows into Hive.
#add rows to the MySQL consumer table
insert into consumer values(null,'世纪大道东','man');
insert into consumer values(null,'皇后大道','woman');
insert into consumer values(null,'sky','man');
insert into consumer values(null,'OhYesYes','woman');
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----+------------------+---------------------+
#Hive data before the incremental import
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
+----------------------------+------------------------------+-----------------------------+
#run the incremental import (new rows only)
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --where "id>23" --hive-table hive.mysql_to_hive_consumer --hive-import -m 1;
#result
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----------------------------+------------------------------+-----------------------------+
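The --where "id>23" trick above works, but the boundary has to be edited by hand before every run. Sqoop's built-in append mode can track the boundary instead; a sketch based on the same command (whether --incremental append combines cleanly with --hive-import should be verified on your version):

sqoop import --connect jdbc:mysql://node1:3306/hive --username root -P \
  --table consumer \
  --hive-import --hive-table hive.mysql_to_hive_consumer -m 1 \
  --incremental append --check-column id --last-value 23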
Import the consumer table into a Hive table (hivetest2) whose fields are delimited by ','.

#---(Case 4)---
#Goal: import the consumer table into the Hive table hivetest2, which uses ',' as its field delimiter.
#data in the consumer table
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----+------------------+---------------------+
#create the hivetest2 table
jdbc:hive2://node1:10000>create table hivetest2(id int,name string,sex string) row format delimited fields terminated by ',';
#run the import
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.hivetest2 --hive-import -m 1 --fields-terminated-by ",";
#result
jdbc:hive2://node1:10000> select * from hivetest2;
+---------------+-------------------+----------------+
| hivetest2.id | hivetest2.name | hivetest2.sex |
+---------------+-------------------+----------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+---------------+-------------------+----------------+
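Note in the result above that values containing a comma ('man,workman,iconman' and so on) were cut off at the first comma, because the field delimiter also occurs inside the data. Choosing a delimiter that cannot appear in the data, such as Hive's default \001, avoids this; a sketch with a hypothetical table name hivetest3:

# hivetest3 is hypothetical; either create it with
#   row format delimited fields terminated by '\001'
# or let --create-hive-table generate it
sqoop import --connect jdbc:mysql://node1:3306/hive --username root -P \
  --table consumer \
  --hive-import --hive-table hive.hivetest3 --create-hive-table -m 1 \
  --fields-terminated-by '\001'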
Import MySQL data into HDFS, then incrementally

#---(Case 5)---
#Goal: on top of Case 4, import the MySQL data into HDFS.
#data in the consumer table
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----+------------------+---------------------+
#run the import
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --target-dir /sqooptxt/consumer_mysql.txt -m 1;
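With -m 1 the whole table goes through a single map task. To parallelize, Sqoop needs a split column (it defaults to the primary key when more than one mapper is used); a sketch with a made-up target directory:

# four mappers, splitting the id range across them
sqoop import --connect jdbc:mysql://node1:3306/hive --username root -P \
  --table consumer \
  --target-dir /sqooptxt/consumer_parallel \
  --split-by id -m 4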
#---(Case 6)---
#Goal: on top of Case 5, import only the new MySQL rows into HDFS (incremental).
#add rows to the consumer table
insert into consumer values(null,'手机定位','woman');
insert into consumer values(null,'憨憨','woman');
insert into consumer values(null,'杰尼龟','man');
insert into consumer values(null,'杰尼龟头大','man');
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name             | sex                 |
+----+------------------+---------------------+
| 20 | xiaoming迷宫     | man                 |
| 21 | leyan            | man,workman,iconman |
| 22 | theShy           | man,workman         |
| 23 | theshygirlfriend | woman,man           |
| 24 | 世纪大道东       | man                 |
| 25 | 皇后大道         | woman               |
| 26 | sky              | man                 |
| 27 | OhYesYes         | woman               |
| 28 | 手机定位         | woman               |
| 29 | 憨憨             | woman               |
| 30 | 杰尼龟           | man                 |
| 31 | 杰尼龟头大       | man                 |
+----+------------------+---------------------+
#run the incremental import
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --target-dir /input/consumer_mysql.txt -m 1 --check-column id --incremental append --last-value 20;
#result

Exporting Hive data to MySQL

#data currently in the MySQL consumer table
mysql> truncate table consumer;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into consumer values(24,'我不是世纪大道','woman');
mysql> insert into consumer values(1000,'杰尼龟,头大1000号','man');
mysql> select * from consumer;
+------+-------------------------+-------+
| id   | name                    | sex   |
+------+-------------------------+-------+
| 24   | 我不是世纪大道          | woman |
| 1000 | 杰尼龟,头大1000号       | man   |
+------+-------------------------+-------+
#data in the Hive table
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id  | mysql_to_hive_consumer.name  | mysql_to_hive_consumer.sex  |
+----------------------------+------------------------------+-----------------------------+
| 20                         | xiaoming迷宫                 | man                         |
| 21                         | leyan                        | man,workman,iconman         |
| 22                         | theShy                       | man,workman                 |
| 23                         | theshygirlfriend             | woman,man                   |
| 24                         | 世纪大道东                   | man                         |
| 25                         | 皇后大道                     | woman                       |
| 26                         | sky                          | man                         |
| 27                         | OhYesYes                     | woman                       |
+----------------------------+------------------------------+-----------------------------+
#Hive table definition (the '•' below is how the unprintable \001 delimiter is displayed)
jdbc:hive2://node1:10000> show create table mysql_to_hive_consumer;
+-------------------------------------------------------------------------------+
| createtab_stmt                                                                |
+-------------------------------------------------------------------------------+
| CREATE TABLE `mysql_to_hive_consumer`(                                        |
|   `id` bigint,                                                                |
|   `name` string,                                                              |
|   `sex` string)                                                               |
| COMMENT 'Imported by sqoop on 2019/10/13 19:34:45'                            |
| ROW FORMAT SERDE                                                              |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'                        |
| WITH SERDEPROPERTIES (                                                        |
|   'field.delim'='•',                                                          |
|   'line.delim'='\n',                                                          |
|   'serialization.format'='•')                                                 |
| STORED AS INPUTFORMAT                                                         |
|   'org.apache.hadoop.mapred.TextInputFormat'                                  |
| OUTPUTFORMAT                                                                  |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'                |
| LOCATION                                                                      |
|   'hdfs://node1:9000/user/hive/warehouse/hive.db/mysql_to_hive_consumer'      |
| TBLPROPERTIES (                                                               |
|   'bucketing_version'='2',                                                    |
|   'transient_lastDdlTime'='1570979547')                                       |
+-------------------------------------------------------------------------------+
#run the export
#the Hive table stores fields delimited by \001 (rows by \n), while sqoop export assumes ',' by default, so the field delimiter must be given explicitly
sqoop export --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --export-dir hdfs://node1:9000/user/hive/warehouse/hive.db/mysql_to_hive_consumer --input-fields-terminated-by "\\01" --update-mode allowinsert --update-key id;
#result
mysql> select * from consumer;
+------+-------------------------+---------------------+
| id   | name                    | sex                 |
+------+-------------------------+---------------------+
| 20   | xiaoming迷宫            | man                 |
| 21   | leyan                   | man,workman,iconman |
| 22   | theShy                  | man,workman         |
| 23   | theshygirlfriend        | woman,man           |
| 24   | 世纪大道东              | man                 |
| 25   | 皇后大道                | woman               |
| 26   | sky                     | man                 |
| 27   | OhYesYes                | woman               |
| 1000 | 杰尼龟,头大1000号       | man                 |
+------+-------------------------+---------------------+
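The export above points --export-dir at the Hive warehouse path and has to restate the table's field delimiter. If HCatalog integration is available (HCAT_HOME was configured during installation), an alternative is to let Sqoop read the Hive table metadata itself via --hcatalog-table. A sketch only, not verified on this cluster; note that the update-key/update-mode options used above may not be supported in HCatalog mode:

sqoop export --connect jdbc:mysql://node1:3306/hive --username root -P \
  --table consumer \
  --hcatalog-database hive \
  --hcatalog-table mysql_to_hive_consumer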
Scheduled incremental import from MySQL to Hive with a Sqoop job

#Hive tables so far
jdbc:hive2://node1:10000> use hive;
jdbc:hive2://node1:10000> show tables;
+-------------------------+
| tab_name                |
+-------------------------+
| hivetest2               |
| mysql_to_hive_consumer  |
+-------------------------+
#all MySQL data before the first run
mysql> SELECT d.id,d.name,d.sex FROM hive.consumer d;
+------+-------------------------+---------------------+
| id   | name                    | sex                 |
+------+-------------------------+---------------------+
| 20   | xiaoming迷宫            | man                 |
| 21   | leyan                   | man,workman,iconman |
| 22   | theShy                  | man,workman         |
| 23   | theshygirlfriend        | woman,man           |
| 24   | 世纪大道东              | man                 |
| 25   | 皇后大道                | woman               |
| 26   | sky                     | man                 |
| 27   | OhYesYes                | woman               |
| 1000 | 杰尼龟,头大1000号       | man                 |
+------+-------------------------+---------------------+
#(1) first full extract, creating the Hive table
sqoop import \
--connect jdbc:mysql://node1:3306/hive \
--username root \
--password root123456 \
--direct \
--delete-target-dir \
--target-dir /user/hive/warehouse/hive.db/clocking_update_consumer \
--hive-import \
-m 2 \
--hive-database hive \
--hive-table clocking_update_consumer \
--create-hive-table \
--query 'select id,name,sex from consumer where $CONDITIONS' \
--split-by id;
#result
+------------------------------+--------------------------------+-------------------------------+
| clocking_update_consumer.id  | clocking_update_consumer.name  | clocking_update_consumer.sex  |
+------------------------------+--------------------------------+-------------------------------+
| 20                           | xiaoming迷宫                   | man                           |
| 21                           | leyan                          | man,workman,iconman           |
| 22                           | theShy                         | man,workman                   |
| 23                           | theshygirlfriend               | woman,man                     |
| 24                           | 世纪大道东                     | man                           |
| 25                           | 皇后大道                       | woman                         |
| 26                           | sky                            | man                           |
| 27                           | OhYesYes                       | woman                         |
| 1000                         | 杰尼龟,头大1000号              | man                           |
+------------------------------+--------------------------------+-------------------------------+
#(2) create the incremental-import job
#first change the MySQL data
mysql>insert into consumer values(999,'卡咪龟','man');
mysql>insert into consumer values(998,'木木枭','woman');
mysql>insert into consumer values(1001,'木木枭02','woman');
mysql> update consumer set name='sky_changed被改了' where id=26;
mysql> select * from consumer;
+-------+-------------------------+---------------------+
| id    | name                    | sex                 |
+-------+-------------------------+---------------------+
| 20    | xiaoming迷宫            | man                 |
| 21    | leyan                   | man,workman,iconman |
| 22    | theShy                  | man,workman         |
| 23    | theshygirlfriend        | woman,man           |
| 24    | 世纪大道东              | man                 |
| 25    | 皇后大道                | woman               |
| 26    | sky_changed被改了       | man                 |
| 27    | OhYesYes                | woman               |
| 998   | 木木枭                  | woman               |
| 999   | 卡咪龟                  | man                 |
| 1000  | 杰尼龟,头大1000号       | man                 |
| 1001  | 木木枭02                | woman               |
| 10001 | 木木枭2                 | man                 |
+-------+-------------------------+---------------------+
#create the incremental job
sqoop job --create clocking_update_consumer_job -- import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --direct --target-dir /user/hive/warehouse/hive.db/clocking_update_consumer --delete-target-dir --hive-import --hive-database hive --hive-table clocking_update_consumer --query 'select id,name,sex from consumer where $CONDITIONS' --split-by 'id' --incremental append --check-column id --last-value 12;
#Notes: an incremental extract must specify --incremental append, the source column to increment on (--check-column id), and the current maximum value of that key already in the Hive table (--last-value; that would be 1000 here, although the command above passes 12). The point of creating a Sqoop job is that after every run Sqoop records the new last-value automatically, so it does not have to be updated by hand before the next run.
#How can you check which last-value a Sqoop job used on each run?
(1) Query the source table and take the maximum value of the check column.
(2) Create a Sqoop job with that column set as the incremental column and keep going; the job then runs incrementally and tracks the value for you.
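Once the job exists, each execution continues from where the previous one stopped. A sketch of running it and inspecting the stored boundary (the exact key shown by --show may differ between versions; by default the job also prompts for the database password at run time):

# run the incremental job
sqoop job --exec clocking_update_consumer_job
# list the saved options, including the recorded last-value
sqoop job --show clocking_update_consumer_job | grep -i last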
Pitfalls

(1) MapReduce tasks fail; check /server/hadoop/etc/hadoop/mapred-site.xml
[2019-10-13 08:08:46.417]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
For more detailed output, check the application tracking page: http://node1:8088/cluster/app/application_1570906081976_0002 Then click on links to logs of each attempt. . Failing the application.
2019-10-13 08:08:13,627 INFO mapreduce.Job: Counters: 0
2019-10-13 08:08:13,634 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2019-10-13 08:08:13,636 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 4.5729 seconds (0 bytes/sec)
2019-10-13 08:08:13,642 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2019-10-13 08:08:13,643 INFO mapreduce.ImportJobBase: Retrieved 0 records.
2019-10-13 08:08:13,643 ERROR tool.ImportTool: Import failed: Import job failed!
#Fix: make sure mapred-site.xml contains the configuration the message refers to; the usual form is setting yarn.app.mapreduce.am.env, mapreduce.map.env and mapreduce.reduce.env to HADOOP_MAPRED_HOME=<your Hadoop home>.

(2) Sqoop reports "Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR ..."
Fix: copy hive-exec-*.jar from Hive's lib directory into Sqoop's lib directory.

(3) Fields are not split correctly
Cause: the data format of user.txt is wrong:
mysql> select * from user;
+-------------------------------------+------+------+
| uname                               | vnum | fnum |
+-------------------------------------+------+------+
| fnum int 朋友数                     | NULL | NULL |
| vnum int 视频数                     | NULL | NULL |
| 字段 属性 备注                      | NULL | NULL |
| uname varchar 用户名                | NULL | NULL |
+-------------------------------------+------+------+
Fix: every field in user.txt must be followed by a tab character, including the last one on each line (e.g. after '用户名'):
字段 属性 备注
uname varchar 用户名
vnum int 视频数
fnum int 朋友数

(4) MySQL -> Hive: the Hive table is created, but it contains no data
Cause: --hive-overwrite was used; it works best when the table already exists. Removing it fixes the problem.

(5) ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException at org.json.JSONObject.
Cause: Sqoop cannot find HCAT_HOME. In sqoop/conf run cp sqoop-env-template.sh sqoop-env.sh, then add to sqoop-env.sh:
export HCAT_HOME=/server/hive/hcatalog

(6) 2019-10-16 17:44:47,283 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException at org.json.JSONObject.
at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
at ...
Fix: unresolved at the time of writing; adding a java-json.jar package was tried but did not take effect (later resolved, see the reflections at the end).

Sqoop saved jobs

A saved job remembers the parameters used to define it, so it can be re-executed simply by invoking the job by its handle. If a saved job is configured to perform an incremental import, the state of the most recently imported rows is updated in the saved job, so the job keeps importing only the newest rows.
#save frequently used jobs so they can be invoked again quickly
sub-commands: --create  --delete  --exec  --show  --list (list all jobs)

Create a job:
sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db ... --table mytable
Create a non-incremental job

#import the consumer table into the comma-delimited Hive table hivetest2
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.hivetest2 --hive-import -m 1 --fields-terminated-by ",";
#the same statement rewritten as a job
sqoop job --create hivetest2job -- import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-import --hive-table hive.hivetest2 --fields-terminated-by "," -m 1;
-------------------------------------------------------------------
# another incremental-import example kept from the original notes
sqoop import --connect 'jdbc:mysql://172.17.1.103:3306/test?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull' --username root --password 123456 --table incr_test --hive-database test --hive-table test_incr_test_20171130 --hive-import -m 1 --incremental append --check-column time_update --last-value "2017-12-01";

# consumer table DDL kept from the original notes
CREATE TABLE `consumer` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `age` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sex` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

List all jobs:
sqoop job --list

Show a job:
sqoop job --show myjob
Example output (for the job created earlier):
Job: clocking_update_consumer_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
split.limit = null
hbase.create.table = false
mainframe.input.dataset.type = p
db.require.password = false
skip.dist.cache = false
hdfs.append.dir = false
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 4
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/054b4b106ff93dfa99cc8994f47e1524
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false

Execute a job:
sqoop job --exec myjob

Override parameters at execution time:
sqoop job --exec myjob -- --username someuser -P

Reflections

A problem you cannot solve right now can be skipped for the moment; it just means your current ability is not yet enough.
2019-10-21 18:14:27. Looking back at that sentence: I later learned DataX, confirmed the issue really was a data-type problem, solved the problem with creating Sqoop jobs, and came to love DataX as well.

Replies

夜听雨 (高级黑马, 2019-10-22 10:29): "Add to your learning, subtract from your pride, multiply your opportunities, divide your laziness." A motto I use to motivate myself; I hope it helps you too.

西早boy (中级黑马, 2019-10-22 21:00), replying to 夜听雨: First of all, thank you very much for the reminder and the encouragement!
After the worry and unease that came with going back to studying, and while my body is still recovering, I do sometimes behave in ways that look odd (leaving in the middle of class, suddenly skipping a lesson). I admit these are bad habits, but rest assured: my goal is clear, my steps are steady and firm, and my eyes are bright and focused. I may not be the most outstanding, but I still enjoy helping my classmates, and running into problems I cannot solve is just how I find and patch my own weaknesses. You are welcome to watch me grow: https://www.qinpeizhou.online

西早boy (中级黑马, 2019-10-22 21:07), replying to 夜听雨: Thank you very much for the reminder! Because I am still recovering physically, some of my behaviour may look unusual, but please rest assured: my resolve is firm as rock, my eyes clear and determined, my steps steady and solid. I hope you can witness my growth. My study-notes blog: https://www.qinpeizhou.online

柠檬leung不酸 (黑马粉丝团, 2019-11-13 10:22): "Some say studying is about completing your life rather than enjoying it; pursue excellence, and success will catch up with you when you least expect it..."

夜听雨 (高级黑马, 2019-11-21 09:33): When facing life's choices, what is frightening is not the "big environment" but ourselves. Dare to step out of your comfort zone and challenge yourself. Keep going!