Sqoop同步数据

hadoop 每天定时同步数据库配置数据到hive中

sqoop安装教程

https://www.yiibai.com/sqoop/sqoop_installation.html

sqoop 同步mysql到hive

1.测试sqoop是否能使用:

sqoop help

2.测试sqoop是否能访问数据库,例如mysql:

# 如果成功,会打印出该数据库下的所有表
sqoop list-tables --connect jdbc:mysql://ip地址:3306/db --username 账户 --password 密码

3.编写sqoop脚本,同步数据

sqoop import --connect jdbc:mysql://mysql_host:mysql_port/mysql_db \
--username username \
--password password \
--query 'select t.sword_no as sword_id,t.sword_desc as sword_name from test_sword as t where $CONDITIONS order by t.sword_no' \
--target-dir /user/sqoop2/test_word \
--delete-target-dir \
--num-mappers 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--direct \
--fields-terminated-by '\t' \
--lines-terminated-by "\n" \
--hive-import  \
--hive-overwrite  \
--hive-table  test.test_word

sqoop import  导入命令  
connect      连接数据库  
username     数据库用户名  
password     数据库密码  
query        查询语句, where $CONDITIONS　必须要有  
target-dir   hdfs上的路径,注意该路径权限问题  
delete-target-dir hdfs上如果指定目录存在，则先删除掉  
num-mappers  启动mappers的数量  
compress     启用压缩  
compression-codec 指定Hadoop的codec方式（默认gzip）  
direct      如果是mysql,添加该参数,提升速度  
fields-terminated-by 分割符号  
lines-terminated-by "\n"  换行符号  
hive-import 　导入hive,插入数据到hive当中，使用hive的默认分隔符  
hive-overwrite 覆盖之前的  
create-hive-table 如果表不存在,则创建,若是存在则报错  
hive-table 　设置导入到hive当中的表名  

4.hive 导入的其他参数:

--hive-drop-import-delims  导入到hive时删除 \n, \r, and \01 
--hive-delims-replacement  导入到hive时用自定义的字符替换掉\n, \r, and \01 
--hive-partition-key   hive分区的key
--hive-partition-value hive分区的值
--map-column-hive     类型匹配，sql类型对应到hive类型

Derek

Sqoop同步数据

sqoop安装教程

sqoop 同步mysql到hive

Search

Table of Contents