[AI Big Data Development] Sqoop Summary and Pitfalls


© 西早boy 中级黑马

2019-10-21 18:27

Please follow the CC license when reposting.

Commercial use of this article is prohibited.

Installation environment: CentOS 7.6 + Hadoop 3.2.0 + Hive 3 + sqoop-1.4.7.bin__hadoop-2.6.0 ([Sqoop deployment] quick start: installation and usage)

Sqoop architecture and applications

Sqoop architecture and Sqoop's place in real projects: a flow diagram (not reproduced here) walks through a typical Hadoop development workflow and where Sqoop fits in. As the diagram shows, the raw data set is first cleaned with MapReduce, the cleaned data is stored in HBase, the data warehouse Hive then runs statistics and analysis over the HBase data and writes the results into Hive tables, Sqoop exports those analysis results to MySQL, and finally the results are presented to customers through the web. The concept of Sqoop: Sqoop = SQL-to-Hadoop.

As its name suggests, Sqoop is a tool for moving data between relational databases and Hadoop in both directions:

it can import data from a relational database (e.g. MySQL, Oracle) into Hadoop (HDFS, Hive, HBase),

and it can export data from Hadoop (HDFS, Hive, HBase) back into a relational database (e.g. MySQL, Oracle). As shown in the diagram (not reproduced here), when Sqoop receives a shell command or a Java API call, its task translator converts the command into the corresponding MapReduce job, which then moves the data between the relational database and Hadoop and completes the copy.

Note: in Sqoop terms, "export" means moving data out of HDFS, Hive or HBase into databases such as MySQL or Oracle.

Version 1.4.x is Sqoop 1 and 1.99.x is Sqoop 2; Sqoop 1 and Sqoop 2 are not compatible with each other.

Download and install

Note: Sqoop is not strict about the Hadoop version; both Hadoop 2.x and Hadoop 3.x work.

wget http://mirror.bit.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.tar.gz

tar -zxvf sqoop-1.4.7.tar.gz

mv sqoop-1.4.7 sqoop

Edit the configuration

cd sqoop/conf/

cp sqoop-env-template.sh sqoop-env.sh

vim sqoop-env.sh

# add the following:

export HADOOP_COMMON_HOME=/server/hadoop

export HADOOP_MAPRED_HOME=/server/hadoop

export HIVE_HOME=/server/hive

export ZOOCFGDIR=/server/hive/hcatalog
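With sqoop-env.sh filled in, a quick sanity check is to run sqoop's version command from the installation root (a sketch, assuming the sqoop/ layout created above). It should report 1.4.7; on a fresh setup it also prints warnings about HBASE_HOME, HCAT_HOME and ACCUMULO_HOME not being set, which the next step deals with:

cd ..               # from sqoop/conf back to the sqoop installation root
bin/sqoop version   # should print "Sqoop 1.4.7"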

(Optional) Sqoop prints warnings about HCatalog and Accumulo on every run; to silence them, comment out the corresponding checks in configure-sqoop:

vi /opt/apps/sqoop/bin/configure-sqoop

## Moved to be a runtime check in sqoop.

#if [ ! -d "${HCAT_HOME}" ]; then

# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."

# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'

#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then

# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."

# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'

#fi

# Add HCatalog to dependency list

#if [ -e "${HCAT_HOME}/bin/hcat" ]; then

# TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`

# if [ -z "${HIVE_CONF_DIR}" ]; then

# TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}

# fi

# SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}

#fi

# Add Accumulo to dependency list

#if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then

# for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do

# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn

# done

# for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do

# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn

# done

#fi

Add the JDBC driver

Download the MySQL driver and put the jar into the lib/ directory.

# no MySQL driver is bundled, so install one

wget -P /opt/apps/sqoop/lib/ http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.40/mysql-connector-java-5.1.40.jar

# it is best to keep the MySQL driver consistent across the Hadoop cluster, so the driver already installed for Hive is reused here

ln -s /server/hive/lib/mysql-connector-java-8.0.17.jar /server/sqoop/lib/mysql-connector-java.jar

Configure environment variables (optional)

vi /etc/profile

export SQOOP_HOME=/server/sqoop

export PATH=$PATH:$SQOOP_HOME/bin

source /etc/profile

Usage

[root@node1 bin]# sqoop help

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/server/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

2019-10-13 02:59:12,286 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

usage: sqoop COMMAND [ARGS]

Available commands:

codegen Generate code to interact with database records

create-hive-table Import a table definition into Hive

eval Evaluate a SQL statement and display the results

export Export an HDFS directory to a database table

help List available commands

import Import a table from a database to HDFS

import-all-tables Import tables from a database to HDFS

import-mainframe Import datasets from a mainframe server to HDFS

job Work with saved jobs

list-databases List available databases on a server

list-tables List available tables in a database

merge Merge results of incremental imports

metastore Run a standalone Sqoop metastore

version Display version information

See 'sqoop help COMMAND' for information on a specific command.

As the output above shows, the command format is: sqoop COMMAND [ARGS]
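Before importing anything it is worth confirming that sqoop can reach the database at all. A quick connectivity check, reusing the node1/root credentials from the examples below (adjust to your own environment):

# quick connectivity check
sqoop list-databases --connect jdbc:mysql://node1:3306 --username root --password root123456
sqoop list-tables --connect jdbc:mysql://node1:3306/hive --username root --password root123456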

Importing MySQL data into HDFS

#(Example 1) Import the consumer table of the hive database in MySQL into HDFS

# create the table

mysql>create table consumer(

id int unsigned NOT NULL AUTO_INCREMENT,

name varchar(32),

sex varchar(32),

PRIMARY KEY (`id`)

) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

# insert some data

insert into consumer values(null,'xiaoming迷宫','man');

insert into consumer values(null,'leyan','man');

insert into consumer values(null,'theShy','man');

insert into consumer values(null,'theshygirlfriend','woman');

# run the sqoop import

sqoop import --connect jdbc:mysql://node1:3306/hive --table consumer --username root --password root123456 --target-dir /sqoopout/ --fields-terminated-by '\t' -m 1;
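To verify the import, list the target directory and print the generated file; a sketch (with -m 1 there is normally a single part-m-00000 file, so check the actual name in the ls output):

hdfs dfs -ls /sqoopout/
hdfs dfs -cat /sqoopout/part-m-00000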

Exporting HDFS data to MySQL

# data file user.txt, with the following fields:

字段(field)    属性(type)    备注(comment)
uname         varchar       用户名 (user name)
vnum          int           视频数 (video count)
fnum          int           朋友数 (friend count)

mysql>create database sqoop;

mysql>use sqoop;

mysql>create table user(

uname varchar(32),

vnum varchar(32),

fnum varchar(100)

) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

hdfs dfs -put user.txt hdfs://node1:9000/input/

sqoop export \

--connect 'jdbc:mysql://node1:3306/sqoop' \

--username 'root' \

--password 'root123456' \

--table 'user' \

--export-dir '/input/user.txt' \

--columns 'uname,vnum,fnum' \

--mapreduce-job-name 'hdfs to mysql' \

--input-fields-terminated-by '\t' \

--input-lines-terminated-by '\n';
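Once the export finishes, the rows should be visible on the MySQL side; a quick check against the sqoop.user table created above:

mysql> select * from sqoop.user;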

Importing MySQL data into Hive

#---(Case 1)---

# Requirement: import the data of the MySQL table consumer into the Hive table consumer_hive; the Hive table does not exist yet.

sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.mysql_to_hive_consumer --hive-import -m 1 --delete-target-dir ;

# data in MySQL

mysql> select * from consumer;

+----+------------------+-------+

| id | name | sex |

+----+------------------+-------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man |

| 22 | theShy | man |

| 23 | theshygirlfriend | woman |

+----+------------------+-------+

# import result

jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;

+----------------------------+------------------------------+-----------------------------+

| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |

+----------------------------+------------------------------+-----------------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man |

| 22 | theShy | man |

| 23 | theshygirlfriend | woman |

+----------------------------+------------------------------+-----------------------------+

Building on Case 1, run an overwrite import and an into (plain append) import.

#---(Case 2)---

# Requirement: building on Case 1, run an overwrite (replace) import and an into (append) import respectively.

into: same command as in Case 1

overwrite:

sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.mysql_to_hive_consumer --hive-import -m 1 --hive-overwrite;

# change the MySQL data

mysql> update consumer set sex='woman,man' where id=23;

mysql> update consumer set sex='man,workman' where id=22;

mysql> update consumer set sex='man,workman,iconman' where id=21;

mysql> select * from consumer;

+----+------------------+---------------------+

| id | name | sex |

+----+------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

+----+------------------+---------------------+

# the Hive table before the overwrite

+----------------------------+------------------------------+-----------------------------+

| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |

+----------------------------+------------------------------+-----------------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man |

| 22 | theShy | man |

| 23 | theshygirlfriend | woman |

+----------------------------+------------------------------+-----------------------------+

# the Hive table after running the overwrite import

jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;

+----------------------------+------------------------------+-----------------------------+

| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |

+----------------------------+------------------------------+-----------------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

+----------------------------+------------------------------+-----------------------------+

Building on Case 2, add rows to the MySQL consumer table and import them into the Hive table incrementally.

#---(Case 3)---

# Requirement: building on Case 2, add rows to the MySQL consumer table and incrementally import them into the Hive table.

# add rows to the MySQL consumer table

insert into consumer values(null,'世纪大道东','man');

insert into consumer values(null,'皇后大道','woman');

insert into consumer values(null,'sky','man');

insert into consumer values(null,'OhYesYes','woman');

mysql> select * from consumer;

+----+------------------+---------------------+

| id | name | sex |

+----+------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

+----+------------------+---------------------+

# Hive data before the incremental import

+----------------------------+------------------------------+-----------------------------+

| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |

+----------------------------+------------------------------+-----------------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

+----------------------------+------------------------------+-----------------------------+

# run the incremental import

sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --where "id>23" --hive-table hive.mysql_to_hive_consumer --hive-import -m 1;

# result

jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;

+----------------------------+------------------------------+-----------------------------+

| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |

+----------------------------+------------------------------+-----------------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

+----------------------------+------------------------------+-----------------------------+
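Case 3 selects the new rows with a hard-coded --where "id>23". An alternative sketch is to let sqoop track the boundary itself, using the append mode that appears later in this post (23 being the largest id already present in the Hive table):

# same import, but driven by --incremental append instead of --where
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-import --hive-table hive.mysql_to_hive_consumer -m 1 --check-column id --incremental append --last-value 23;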

#---(Case 4)---

# Requirement: import the data of the consumer table into a Hive table (hivetest2) whose fields are separated by ','.

# data in the consumer table

mysql> select * from consumer;

+----+------------------+---------------------+

| id | name | sex |

+----+------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

+----+------------------+---------------------+

# create the hivetest2 table

jdbc:hive2://node1:10000>create table hivetest2(id int,name string,sex string) row format delimited fields terminated by ',';

# run the import

sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.hivetest2 --hive-import -m 1 --fields-terminated-by ",";

# result

jdbc:hive2://node1:10000> select * from hivetest2;

+---------------+-------------------+----------------+

| hivetest2.id | hivetest2.name | hivetest2.sex |

+---------------+-------------------+----------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man |

| 22 | theShy | man |

| 23 | theshygirlfriend | woman |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

+---------------+-------------------+----------------+
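Note in the result above that values such as 'man,workman,iconman' were cut off at the first comma: ',' is both the field delimiter and part of the data. A safer sketch is to pick a delimiter that does not occur in the data, e.g. a tab, and create the Hive table to match (hivetest2_tab is a hypothetical table name):

# hivetest2_tab is a hypothetical tab-delimited variant of hivetest2
jdbc:hive2://node1:10000> create table hivetest2_tab(id int,name string,sex string) row format delimited fields terminated by '\t';
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.hivetest2_tab --hive-import -m 1 --fields-terminated-by '\t';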

Incrementally importing MySQL data into HDFS

#---(Case 5)---

# Requirement: building on Case 4, import the MySQL data into HDFS (a full import first).

# data in the consumer table

mysql> select * from consumer;

+----+------------------+---------------------+

| id | name | sex |

+----+------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

+----+------------------+---------------------+

# run the import

sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --target-dir /sqooptxt/consumer_mysql.txt -m 1;

Before the incremental run

# Requirement: building on Case 5, import the MySQL data into HDFS incrementally.

# data in the consumer table (after adding new rows)

insert into consumer values(null,'手机定位','woman');

insert into consumer values(null,'憨憨','woman');

insert into consumer values(null,'杰尼龟','man');

insert into consumer values(null,'杰尼龟头大','man');

mysql> select * from consumer;

+----+------------------+---------------------+

| id | name | sex |

+----+------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

| 28 | 手机定位 | woman |

| 29 | 憨憨 | woman |

| 30 | 杰尼龟 | man |

| 31 | 杰尼龟头大 | man |

+----+------------------+---------------------+

# run the incremental import

sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --target-dir /input/consumer_mysql.txt -m 1 --check-column id --incremental append --last-value 20;

# result

Exporting Hive data to MySQL

# data in the MySQL consumer table

mysql> truncate table consumer;

Query OK, 0 rows affected (0.01 sec)

mysql> insert into consumer values(24,'我不是世纪大道','woman');

mysql> insert into consumer values(1000,'杰尼龟,头大1000号','man');

mysql> select * from consumer;

+------+-------------------------+-------+

| id | name | sex |

+------+-------------------------+-------+

| 24 | 我不是世纪大道 | woman |

| 1000 | 杰尼龟,头大1000号 | man |

+------+-------------------------+--------+

# data in the Hive table

jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;

+----------------------------+------------------------------+-----------------------------+

| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |

+----------------------------+------------------------------+-----------------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

+----------------------------+------------------------------+-----------------------------+

# structure of the Hive table

jdbc:hive2://node1:10000> show create table mysql_to_hive_consumer;

+-------------------------------------------------------------------------------+

| createtab_stmt |

+-------------------------------------------------------------------------------+

| CREATE TABLE `mysql_to_hive_consumer`( |

| `id` bigint, |

| `name` string, |

| `sex` string) |

| COMMENT 'Imported by sqoop on 2019/10/13 19:34:45' |

| ROW FORMAT SERDE |

| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |

| WITH SERDEPROPERTIES ( |

| 'field.delim'='•', |

| 'line.delim'='\n', |

| 'serialization.format'='•') |

| STORED AS INPUTFORMAT |

| 'org.apache.hadoop.mapred.TextInputFormat' |

| OUTPUTFORMAT |

| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |

| LOCATION |

| 'hdfs://node1:9000/user/hive/warehouse/hive.db/mysql_to_hive_consumer' |

| TBLPROPERTIES ( |

| 'bucketing_version'='2', |

| 'transient_lastDdlTime'='1570979547') |

+-------------------------------------------------------------------------------+

# run the export

# the Hive table's fields are delimited by '\001' (rows end with '\n'), while sqoop export assumes ',' by default, so the input delimiter has to be given explicitly

sqoop export --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --export-dir hdfs://node1:9000/user/hive/warehouse/hive.db/mysql_to_hive_consumer --input-fields-terminated-by "\\01" --update-mode allowinsert --update-key id;

# result

mysql> select * from consumer;

+------+-------------------------+---------------------+

| id | name | sex |

+------+-------------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

| 1000 | 杰尼龟,头大1000号 | man |

+------+-------------------------+---------------------+
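One caveat with the export above: if a map task fails partway through, the MySQL table can be left only partially updated. When --update-key is not needed, a staging table makes the export all-or-nothing; a sketch (consumer_staging is a hypothetical empty table with the same schema as consumer):

# consumer_staging must already exist in MySQL with the same schema as consumer
sqoop export --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --staging-table consumer_staging --clear-staging-table --export-dir hdfs://node1:9000/user/hive/warehouse/hive.db/mysql_to_hive_consumer --input-fields-terminated-by "\\01";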

Scheduled incremental import from MySQL to Hive with sqoop

# current tables in the hive database

jdbc:hive2://node1:10000> use hive;

jdbc:hive2://node1:10000> show tables;

+-------------------------+

| tab_name |

+-------------------------+

| hivetest2 |

| mysql_to_hive_consumer |

+-------------------------+

# all of the MySQL data before the first extraction

mysql> SELECT d.id,d.name,d.sex FROM hive.consumer d;

+------+-------------------------+---------------------+

| id | name | sex |

+------+-------------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

| 1000 | 杰尼龟,头大1000号 | man |

+------+-------------------------+---------------------+

# (1) First full extraction, creating the Hive table

sqoop import \

--connect jdbc:mysql://node1:3306/hive \

--username root \

--password root123456 \

--direct \

--delete-target-dir \

--target-dir /user/hive/warehouse/hive.db/clocking_update_consumer \

--hive-import \

-m 2 \

--hive-database hive \

--hive-table clocking_update_consumer \

--create-hive-table \

--query 'select id,name,sex from consumer where $CONDITIONS' \

--split-by id;

# result

+------------------------------+--------------------------------+-------------------------------+

| clocking_update_consumer.id | clocking_update_consumer.name | clocking_update_consumer.sex |

+------------------------------+--------------------------------+-------------------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky | man |

| 27 | OhYesYes | woman |

| 1000 | 杰尼龟,头大1000号 | man |

+------------------------------+--------------------------------+-------------------------------+

# (2) Create the incremental extraction job

# add rows to the MySQL table (change the data)

mysql>insert into consumer values(999,'卡咪龟','man');

mysql>insert into consumer values(998,'木木枭','woman');

mysql>insert into consumer values(1001,'木木枭02','woman');

mysql> update consumer set name='sky_changed被改了' where id=26;

mysql> select * from consumer;

+-------+-------------------------+---------------------+

| id | name | sex |

+-------+-------------------------+---------------------+

| 20 | xiaoming迷宫 | man |

| 21 | leyan | man,workman,iconman |

| 22 | theShy | man,workman |

| 23 | theshygirlfriend | woman,man |

| 24 | 世纪大道东 | man |

| 25 | 皇后大道 | woman |

| 26 | sky_changed被改了 | man |

| 27 | OhYesYes | woman |

| 998 | 木木枭 | woman |

| 999 | 卡咪龟 | man |

| 1000 | 杰尼龟,头大1000号 | man |

| 1001 | 木木枭02 | woman |

| 10001 | 木木枭2 | man |

+-------+-------------------------+---------------------+

# create the incremental import job

sqoop job --create clocking_update_consumer_job -- import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --direct --target-dir /user/hive/warehouse/hive.db/clocking_update_consumer --delete-target-dir --hive-import --hive-database hive --hive-table clocking_update_consumer --query 'select id,name,sex from consumer where $CONDITIONS' --split-by 'id' --incremental append --check-column id --last-value 12;

# Notes:

Incremental extraction requires `--incremental append`, together with the source column to increment on,

`--check-column id`, and the current maximum value of that key in the Hive table, `--last-value 1000`.

The point of creating a sqoop job is that after each run sqoop records the last-value automatically,

so the next run picks it up by itself and it no longer has to be updated by hand.

# How do you find out which last-value a sqoop job uses on each run?

(1) Query the table and take the maximum value of the check column.

(2) Create a job in sqoop with that column as the incremental column and keep going; the job will then run incrementally.
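To turn this into the scheduled incremental import the section title promises, the saved job just needs to be run periodically, e.g. from cron. A minimal sketch, assuming the /server/sqoop install path used above; note that saved jobs do not store the password by default, so unattended runs need --password-file or sqoop.metastore.client.record.password=true in sqoop-site.xml:

# example crontab entry: run the incremental job every night at 01:00 (log path is just an example)
0 1 * * * /server/sqoop/bin/sqoop job --exec clocking_update_consumer_job >> /tmp/clocking_update_consumer_job.log 2>&1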


Pitfalls

(1) # Error caused by /server/hadoop/etc/hadoop/mapred-site.xml

[2019-10-13 08:08:46.417]Container exited with a non-zero exit code 1. Error file: prelaunch.err.

Last 4096 bytes of prelaunch.err :

Last 4096 bytes of stderr :

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>

For more detailed output, check the application tracking page: http://node1:8088/cluster/app/application_1570906081976_0002 Then click on links to logs of each attempt.

. Failing the application.

2019-10-13 08:08:13,627 INFO mapreduce.Job: Counters: 0

2019-10-13 08:08:13,634 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead

2019-10-13 08:08:13,636 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 4.5729 seconds (0 bytes/sec)

2019-10-13 08:08:13,642 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead

2019-10-13 08:08:13,643 INFO mapreduce.ImportJobBase: Retrieved 0 records.

2019-10-13 08:08:13,643 ERROR tool.ImportTool: Import failed: Import job failed!

# Solution: add the following properties to mapred-site.xml

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/server/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/server/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/server/hadoop</value>
</property>

(2) # sqoop reports: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.

Solution: copying hive-exec-*.jar from Hive's lib directory into Sqoop's lib directory fixes this.
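With the directory layout used earlier in this post, that amounts to something like the following (a sketch; match the jar version to your Hive release):

# copy (or symlink) the hive-exec jar into sqoop's lib directory
cp /server/hive/lib/hive-exec-*.jar /server/sqoop/lib/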

(3) # Fields are not being split

Cause: the data format of the user.txt file was wrong.

mysql> select * from user;

+-------------------------------------+------+------+

| uname | vnum | fnum |

+-------------------------------------+------+------+

| fnum int 朋友数 | NULL | NULL |

| vnum int 视频数 | NULL | NULL |

| 字段 属性 备注 | NULL | NULL |

| uname varchar 用户名 | NULL | NULL |

+-------------------------------------+------+------+

Solution: every field in user.txt must be separated by a tab character, including after fields such as '用户名':

字段 属性 备注

uname varchar 用户名

vnum int 视频数

fnum int 朋友数

(4) # MySQL -> Hive: the Hive table is created, but it contains no data

This is because `--hive-overwrite` was used; it is best used when the table already exists, so removing it fixes the problem.

(5)#ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException

java.lang.NullPointerException

at org.json.JSONObject.<init>(JSONObject.java:144)

Cause: sqoop did not find HCAT_HOME.

In the sqoop/conf directory run: cp sqoop-env-template.sh sqoop-env.sh

Then edit sqoop-env.sh and add:

export HCAT_HOME=/server/hive/hcatalog

(6)#2019-10-16 17:44:47,283 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException

java.lang.NullPointerException

at org.json.JSONObject.<init>(JSONObject.java:144)

at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)

at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)

at ...

Solution:

# !!! Unresolved !!! Tried adding the java-json.jar package (did not take effect!)

Continuing: Sqoop jobs

A saved job remembers the parameters used to define it, so it can be re-run simply by invoking the job by its handle. If a saved job is configured to perform an incremental import, state about the most recently imported rows is updated in the saved job so that the job keeps importing only the newest rows.

# save frequently used jobs so they can be re-run quickly

--create   create a new job

--delete   delete a job

--exec     execute a job

--show     show a job's parameters

--list     list all jobs

Create a job:

sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db ... --table mytable

Create a non-incremental job

# Import the data of the consumer table into the Hive table (hivetest2) whose fields are separated by ','.

sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.hivetest2 --hive-import -m 1 --fields-terminated-by ",";

# rewrite the statement above as a job:

sqoop job --create hivetest2job -- import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-import --hive-table hive.hivetest2 --fields-terminated-by "," -m 1;
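Once created, the job can be listed and executed. Note that a saved job does not store the password by default, so --exec will prompt for it; for unattended runs use --password-file or set sqoop.metastore.client.record.password=true in sqoop-site.xml. A quick usage sketch with the job created above:

sqoop job --list
sqoop job --exec hivetest2job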

-------------------------------------------------------------------

sqoop import --connect 'jdbc:mysql://172.17.1.103:3306/test?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull' --username root --password 123456 --table incr_test --hive-database test --hive-table test_incr_test_20171130 --hive-import -m 1 --incremental append --check-column time_update --last-value "2017-12-01";

CREATE TABLE `consumer` (

`id` int(10) unsigned NOT NULL AUTO_INCREMENT,

`age` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,

`sex` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,

PRIMARY KEY (`id`)

) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

List all jobs: sqoop job --list

Show a job: sqoop job --show myjob

Result:

Job: clocking_update_consumer_job

Tool: import

Options:

----------------------------

verbose = false

hcatalog.drop.and.create.table = false

codegen.output.delimiters.escape = 0

codegen.output.delimiters.enclose.required = false

codegen.input.delimiters.field = 0

split.limit = null

hbase.create.table = false

mainframe.input.dataset.type = p

db.require.password = false

skip.dist.cache = false

hdfs.append.dir = false

codegen.input.delimiters.escape = 0

import.fetch.size = null

accumulo.create.table = false

codegen.input.delimiters.enclose.required = false

reset.onemapper = false

codegen.output.delimiters.record = 10

import.max.inline.lob.size = 16777216

sqoop.throwOnError = false

hbase.bulk.load.enabled = false

hcatalog.create.table = false

db.clear.staging.table = false

codegen.input.delimiters.record = 0

enable.compression = false

hive.overwrite.table = false

hive.import = false

codegen.input.delimiters.enclose = 0

accumulo.batch.size = 10240000

hive.drop.delims = false

customtool.options.jsonmap = {}

codegen.output.delimiters.enclose = 0

hdfs.delete-target.dir = false

codegen.output.dir = .

codegen.auto.compile.dir = true

relaxed.isolation = false

mapreduce.num.mappers = 4

accumulo.max.latency = 5000

import.direct.split.size = 0

sqlconnection.metadata.transaction.isolation.level = 2

codegen.output.delimiters.field = 44

export.new.update = UpdateOnly

incremental.mode = None

hdfs.file.format = TextFile

sqoop.oracle.escaping.disabled = true

codegen.compile.dir = /tmp/sqoop-root/compile/054b4b106ff93dfa99cc8994f47e1524

direct.import = false

temporary.dirRoot = _sqoop

hive.fail.table.exists = false

db.batch = false

Execute a job: sqoop job --exec myjob

Override parameters: sqoop job --exec myjob -- --username someuser -P

Reflections

If a problem cannot be solved right now, it is fine to skip it for the moment; it simply means my current ability is not yet enough to solve it.

2019-10-21 18:14:27

Looking back at that sentence: after later learning DataX I found it really was a data-type problem; I solved the issue with creating the sqoop job, and I also fell in love with DataX.

Replies

夜听雨 (高级黑马) · 2019-10-22 10:29:37

Add to your learning, subtract from your pride, multiply your opportunities, divide your laziness. A motto I use to spur myself on; I hope it helps you too.


西早boy (中级黑马) · 2019-10-22 21:00:09

Replying to 夜听雨: Thank you very much for the reminder and encouragement! Having gone through the worry and anxiety of starting to study again, and while my health is still recovering, I may show some behaviour that seems odd (leaving mid-class, suddenly skipping class). I admit these are bad habits, but rest assured: my goal is clear, my steps are steady, and my eyes are bright and focused. I may not be the best, but I still enjoy helping my classmates, and running into problems I cannot solve is just how I find and patch my own weak spots.

You are welcome to witness my growth: https://www.qinpeizhou.online


西早boy (中级黑马) · 2019-10-22 21:07:54

Replying to 夜听雨: Thank you very much for the encouragement! Since my health is still recovering, some of my behaviour may seem unusual, but rest assured: my resolve is firm, my eyes are clear and determined, and my steps are steady and down to earth.

I still hope you can witness my growth. My study-notes blog: https://www.qinpeizhou.online


柠檬leung不酸 (黑马粉丝团) · 2019-11-13 10:22:57

"Some say that studying is about perfecting your life, not enjoying it; pursue excellence, and success will catch up with you when you least expect it..."


夜听雨 (高级黑马) · 2019-11-21 09:33:39

When facing life's choices, what is scary is not the "big environment" but ourselves. Dare to step out of your comfort zone and challenge yourself. Keep it up!
