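What Is a Hive Partition

Why Partition

Partitioning lets Hive avoid full table scans, which improves query efficiency. By default a query scans the whole table; a filter on the partition column lets Hive read only the matching partitions.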

Partition Syntax

[PARTITIONED BY (COLUMNNAME COLUMNTYPE [COMMENT 'COLUMN COMMENT'],...)]

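1. Partition column names are case-insensitive, like other Hive identifiers; partition values are case-sensitive because they become directory names.
2. A partition column is a pseudo-column: it is not stored in the data files, but it can be used in queries like an ordinary column.
3. A table can have one or more partitions, and each partition can contain one or more sub-partitions.
4. Partition columns must be columns that are not in the table's regular column list.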
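To make notes 2 and 4 concrete, here is a minimal sketch. The table name demo_part is hypothetical and used only for illustration.

```sql
-- Hypothetical table, for illustration only.
create table if not exists demo_part(
id int,
name string
)
partitioned by (dt string);

-- The partition column can be selected, filtered, and grouped like any other column.
select dt, count(*) as cnt
from demo_part
group by dt;

-- Repeating a regular column in PARTITIONED BY is rejected
-- (Hive reports "Column repeated in partitioning columns"):
-- create table bad_part(id int, dt string) partitioned by (dt string);
```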

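Partitioning Strategy and What a Partition Really Is

How to partition: spread the data out by date, region, or a similar attribute.
What a partition is: a directory created under the table directory (or under a parent partition's directory), named column=value (for example dt=2019-09-09).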

Creating a Single-Level Partitioned Table

create table if not exists part1(
id int,
name string
)
partitioned by (dt string) row format delimited fields terminated by ' ';

Loading Data

load data local inpath '/home/hivedata/t1' overwrite into  table part1 partition(dt='2019-09-09');
load data local inpath '/hivedata/user.txt' into table part1 partition(dt='2018-03-20');

Query

select * from part1 where dt='2018-03-20';
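Because each partition is just a directory, the location Hive recorded for it can be inspected directly. The following is a small sketch, assuming the part1 table and the partitions loaded above; the layout shown in the comments is an assumption and depends on your warehouse directory setting.

```sql
-- Assumed layout under the table directory:
--   .../part1/dt=2019-09-09/
--   .../part1/dt=2018-03-20/

-- Print the partition's metadata, including its Location field.
describe formatted part1 partition(dt='2019-09-09');

-- Show the query plan; with a dt filter, only the matching partition should be read.
explain select * from part1 where dt='2018-03-20';
```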

Creating a Two-Level Partitioned Table

create table if not exists part2(
id int,
name string
)
partitioned by (year int,month int) row format delimited fields terminated by ' ';

Loading Data

load data local inpath '/home/hivedata/t1' overwrite into  table part2 partition(year=2019,month=9);
load data local inpath '/home/hivedata/t' overwrite into  table part2 partition(year=2019,month=10);

Query

select * from part2 where year=2019 and month=10;
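With two levels, the month directories are nested inside the year directories, so a filter on year alone still limits what Hive reads. A minimal sketch, assuming the part2 table and the two partitions loaded above:

```sql
-- Assumed nested layout (paths relative to the table directory):
--   year=2019/month=9/
--   year=2019/month=10/

-- Filtering only on the first-level partition reads every month under year=2019.
select * from part2 where year=2019;
```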

How to Modify Hive Partitions

Listing Partitions

show partitions table_name;
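As a concrete illustration, the statement can be run against the tables created earlier. This sketch assumes part1 and part2 exist as defined above; the second form passes a partial partition spec to narrow the listing.

```sql
-- List every partition of the single-level table.
show partitions part1;

-- List only the partitions under one first-level value of the two-level table.
show partitions part2 partition(year=2019);
```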

Adding Partitions

alter table part1 add partition(dt='2019-09-10');
alter table part1 add partition(dt='2019-09-13') partition(dt='2019-09-12');
alter table part1 add partition(dt='2019-09-11') location  '/user/hive/warehouse/qf1704.db/part1/dt=2019-09-10';
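Two related variations may be useful here; treat them as a sketch rather than a required workflow. Both assume the part1 table from above.

```sql
-- Adding a partition that already exists normally fails;
-- IF NOT EXISTS makes the statement safe to rerun.
alter table part1 add if not exists partition(dt='2019-09-10');

-- If partition directories were created directly on HDFS,
-- MSCK REPAIR TABLE registers them in the metastore.
msck repair table part1;
```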

Renaming a Partition

alter table part1 partition(dt='2019-09-10') rename to partition(dt='2019-09-14');

Changing a Partition's Location

-- Incorrect: the path has no scheme or authority
alter table part1 partition(dt='2019-09-14') set location '/user/hive/warehouse/qf24.db/part1/dt=2019-09-09';
-- Correct: a fully qualified absolute URI
alter table part1 partition(dt='2019-09-14') set location 'hdfs://hadoo01:9000/user/hive/warehouse/qf24.db/part1/dt=2019-09-09';

Dropping Partitions

alter table part1 drop partition(dt='2019-09-14');
alter table part1 drop partition(dt='2019-09-12'),partition(dt='2019-09-13');
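A slightly more defensive variant, sketched under the assumption that part1 is the table defined earlier:

```sql
-- IF EXISTS avoids an error when the partition is already gone.
alter table part1 drop if exists partition(dt='2019-09-14');

-- Confirm the result.
show partitions part1;
```

For a managed (internal) table, dropping a partition also deletes its data directory; for an external table only the metadata is removed.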

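Types of Partitioning

Static partitioning: data is loaded into a partition whose value is specified explicitly.
Dynamic partitioning: the partition values are not known in advance; Hive creates partitions according to the values found in the data.
Mixed partitioning: some partition columns are static and others are dynamic.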

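set hive.exec.dynamic.partition=true;
-- allowed values: strict or nonstrict
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;

strict: at least one partition column must be static.
nonstrict: every partition column may be dynamic, but estimate the number of dynamic partitions beforehand.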

Example:

create table dy_part1(
id int,
name string
)
partitioned by (dt string)
row format delimited fields terminated by ' '
;
 
load data local inpath '/home/hivedata/t1' overwrite into  table dy_part1 partition(dt='2019-09-09');
 
set hive.exec.mode.local.auto=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table dy_part1 partition(dt)
select
id,
name,
dt
from part1
;
 
Mixed partitioning:
create table if not exists dy_part2(
id int,
name string
)
partitioned by (year int,month int)
row format delimited fields terminated by ' '
;
 
set hive.exec.mode.local.auto=true;
set hive.exec.dynamic.partition.mode=strict;
insert into table dy_part2 partition(year=2019,month)
select
id,
name,
month
from part2
where year=2019
;
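For contrast with the mixed insert above, here is a sketch in which both partition columns are dynamic. It assumes the part2 and dy_part2 tables defined earlier and needs nonstrict mode, since strict mode requires at least one static partition column.

```sql
-- All partition columns are dynamic, so nonstrict mode is required.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition columns come last in the select list,
-- in the same order as in the partition clause.
insert into table dy_part2 partition(year,month)
select
id,
name,
year,
month
from part2
;

-- Check which partitions were created.
show partitions dy_part2;
```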

Hive Strict Mode

 <property>
    <name>hive.mapred.mode</name>
    <value>nonstrict</value>
    <description>
      The mode in which the Hive operations are being performed.
      In strict mode, some risky queries are not allowed to run. They include:
        Cartesian Product.
        No partition being picked up for a query.
        Comparing bigints and strings.
        Comparing bigints and doubles.
        Orderby without limit.
    </description>
  </property>

Cartesian Product

set hive.mapred.mode=strict;
select
*
from dy_part1 d1
join dy_part2 d2
;
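The join above has no ON condition, which is what strict mode rejects. A version that should pass might look like the following sketch; joining on id is only an illustrative assumption, and the partition filters are included because strict mode also requires them on partitioned tables.

```sql
set hive.mapred.mode=strict;
select
d1.id,
d1.name,
d2.name as name2
from dy_part1 d1
join dy_part2 d2
on d1.id = d2.id
where d1.dt='2019-09-09'
and d2.year=2019
;
```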

Querying a Partitioned Table Without a Partition Filter

set hive.mapred.mode=strict;
select
*
from dy_part1 d1
where d1.dt='2019-09-09'
;
 
Not allowed (no filter on the partition column):
select
*
from dy_part1 d1
where d1.id > 2
;

Allowed (the filter is on the partition column year):
select
*
from dy_part2 d2
where d2.year >= 2019
;

ORDER BY Without LIMIT

select
*
from log3
order by id desc
;
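The query above is the rejected form. Adding a LIMIT is the usual fix, sketched here under the assumption that log3 is not partitioned (otherwise a partition filter is needed as well); the value 10 is arbitrary.

```sql
set hive.mapred.mode=strict;
select
*
from log3
order by id desc
limit 10
;
```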

Comparing bigint and string

In strict mode, comparing a bigint with a string is rejected.

Comparing bigint and double

In strict mode, comparing a bigint with a double is rejected.
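One way to avoid both restrictions is to cast explicitly so that the two sides of the comparison have the same type. This is a sketch only: the table events and its columns (user_id bigint, score double, code string) are hypothetical.

```sql
-- Hypothetical table: events(user_id bigint, score double, code string).

-- bigint vs string: convert the string side to bigint.
select * from events where user_id = cast(code as bigint);

-- bigint vs double: convert the bigint side to double.
select * from events where cast(user_id as double) > score;
```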

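Hive Schema-on-Read vs. Schema-on-Write

Hive is strictly schema-on-read: data is not validated when it is written, and at read time any value that does not match the declared column type is returned as NULL.
MySQL is schema-on-write: input is checked when it is written, and anything invalid raises an error.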

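-- Hive: the load succeeds without validating the file contents
load data local inpath '/home/hivedata/t' into  table t_user;
-- MySQL: rejected at write time (abc is unquoted)
insert into stu(id,sex) value(1,abc);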
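The schema-on-read behavior can be observed with a small experiment. In this sketch the table t_read_demo and the file /home/hivedata/read_demo are hypothetical, introduced only for illustration.

```sql
-- Hypothetical table and file, for illustration only.
create table if not exists t_read_demo(
id int,
name string
)
row format delimited fields terminated by ' ';

-- Suppose /home/hivedata/read_demo contains the two lines:
--   1 tom
--   abc jerry
load data local inpath '/home/hivedata/read_demo' into table t_read_demo;

-- The load succeeds; the mismatch only shows up at read time,
-- where the non-numeric id is returned as NULL.
select * from t_read_demo where id is null;
```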

Compiled by 极客之音. Link to this article: https://www.bmabk.com/index.php/post/72635.html

Hive Study Notes (1): Modifying Hive Partitions

