Update Delete in Hive

HIVE:

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

In hive Update and Delete is not done easily, it has some limitations.  Until Hive 0.13, hive does not support full ACID semantics. Until Hive 0.13, atomicity, consistency, and durability were provided at the partition level. Isolation could be provided by turning on one of the available locking mechanisms (ZooKeeper or in memory). With the addition of transactions in Hive 0.13 it is now possible to provide full ACID semantics at the row level, so that one application can add rows while another reads from the same partition without interfering with each other.

 

ACID semantics:

ACID stands for four traits of database transactions:

             1.Atomicity (an operation either succeeds completely or fails, it does not leave partial data),

            2.Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent                 operation),

             3.Isolation (operations by one user do not cause unexpected side effects for other users),

             4.Durability (once an operation is complete it will be preserved even in the face of machine or system failure).
 
Update and Delete On Hive:

In Hive Update and Delete work based on some limitations

             1.It can only be performed on tables that support ACID.
           2.If a table is to be used in ACID writes (insert, update, delete) then the table property “transactional” must be set on                  that table.
             3.Only ORC file format is supported in this.
             4.Tables must be bucketed to make use of these features.


Step-1:
First set the Following

SET hive.support.concurrency=true;

 

SET hive.enforce.bucketing=true;

SET hive.exec.dynamic.partition.mode=nonstrict;

 

SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

 

SET hive.compactor.initiator.on=true;

 

SET hive.compactor.worker.threads=1;

 

SET hive.support.concurrency=true;

 

SET hive.enforce.bucketing=true;

Step-2: Create table with bucketing and ORC file format

create table hive_dml

 

(emp_id int, first_name string, last_name string)

 

clustered by (emp_id) into 4 buckets

 

stored as orc

 

tblproperties ('transactional'='true');

Step-3:Insert values into table

insert into hive_dml values

 

(1001501,'alaister','briito'),(1001502,'anifa','mohammed'),

 

(1001503,'piyush','manish'),

 

(1001504,'vijay','karthik'),

 

(1001505,'narasingh','rao');

table creation and insertion in hive

 


Step-4:
View the table

select * from hive_dml;

select command in hive

 


Step-5:
Update the value

update hive_dml set last_name='yadav' where emp_id=1001505;


Step-6:
View the data after complete the update

select * from hive_dml;

update in hive

 


Step-7:
Delete the table

delete from hive_dml where emp_id=1001502;

delete operation in hive


Step-8:
View the data after complete the delete

select * from hive_dml;

select operation in hive


Step-9:
Describe the table after using delete command

describe table hive_dml;

describe command in hive

7 Thoughts on “HIVE DML UPDATE DELETE OPERATIONS”

  • Hi,

    Its a nice article.

    By setting the above properties I am able to INSERT the values but UPDATE and DELETE operations are not allowed. I am using hive 1.1.0 in Cloudera 5.5.0. Solution to this would be appreciable, please.

  • does this work on partitioned tables ?

    Say, below is how the table is created:

    create table hive_dml
    (emp_id int, first_name string, last_name string)
    PARTITIONED BY (city STRING)
    clustered by (emp_id) into 4 buckets
    stored as orc
    tblproperties (‘transactional’=’true’);

    Can i do update/delete on this table?

    • Yes, you can do it.

      Update and Delete work on the following table also.

      create table hive_dml
      (emp_id int, first_name string, last_name string)
      PARTITIONED BY (city STRING)
      clustered by (emp_id) into 4 buckets
      stored as orc
      tblproperties (‘transactional’=’true’);

Leave a Reply

Your email address will not be published. Required fields are marked *