Update Delete in Hive
HIVE:

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

In hive Update and Delete is not done easily, it has some limitations.  Until Hive 0.13, hive does not support full ACID semantics. Until Hive 0.13, atomicity, consistency, and durability were provided at the partition level. Isolation could be provided by turning on one of the available locking mechanisms (ZooKeeper or in memory). With the addition of transactions in Hive 0.13 it is now possible to provide full ACID semantics at the row level, so that one application can add rows while another reads from the same partition without interfering with each other.

ACID semantics:

ACID stands for four traits of database transactions:

             1.Atomicity (an operation either succeeds completely or fails, it does not leave partial data),
            2.Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent                 operation),
             3.Isolation (operations by one user do not cause unexpected side effects for other users),
             4.Durability (once an operation is complete it will be preserved even in the face of machine or system failure).
 
Update and Delete On Hive:

In Hive Update and Delete work based on some limitations

             1.It can only be performed on tables that support ACID.
           2.If a table is to be used in ACID writes (insert, update, delete) then the table property “transactional” must be set on                  that table.
             3.Only ORC file format is supported in this.
             4.Tables must be bucketed to make use of these features.


Step-1:
First set the Following

SET hive.support.concurrency=true;


SET hive.enforce.bucketing=true;

SET hive.exec.dynamic.partition.mode=nonstrict;


SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;


SET hive.compactor.initiator.on=true;


SET hive.compactor.worker.threads=1;


SET hive.support.concurrency=true;


SET hive.enforce.bucketing=true;

Step-2: Create table with bucketing and ORC file format

create table hive_dml


(emp_id int, first_name string, last_name string)


clustered by (emp_id) into 4 buckets


stored as orc


tblproperties (‘transactional’=’true’);

Step-3:Insert values into table

insert into hive_dml values


(1001501,’alaister’,’briito’),(1001502,’anifa’,’mohammed’),


(1001503,’piyush’,’manish’),


(1001504,’vijay’,’karthik’),


(1001505,’narasingh’,’rao’);

table creation and insertion in hive


Step-4:
View the table

select * from hive_dml;

select command in hive


Step-5:
Update the value

update hive_dml set last_name=’yadav’ where emp_id=1001505;


Step-6:
View the data after complete the update

select * from hive_dml;

update in hive


Step-7:
Delete the table

delete from hive_dml where emp_id=1001502;

delete operation in hive


Step-8:
View the data after complete the delete

select * from hive_dml;

select operation in hive


Step-9:
Describe the table after using delete command

describe table hive_dml;

describe command in hive