HIVE DML UPDATE DELETE OPERATIONS
Update Delete in Hive
HIVE:
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
In hive Update and Delete is not done easily, it has some limitations. Until Hive 0.13, hive does not support full ACID semantics. Until Hive 0.13, atomicity, consistency, and durability were provided at the partition level. Isolation could be provided by turning on one of the available locking mechanisms (ZooKeeper or in memory). With the addition of transactions in Hive 0.13 it is now possible to provide full ACID semantics at the row level, so that one application can add rows while another reads from the same partition without interfering with each other.
ACID semantics:
ACID stands for four traits of database transactions:
1.Atomicity (an operation either succeeds completely or fails, it does not leave partial data),
2.Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent operation),
3.Isolation (operations by one user do not cause unexpected side effects for other users),
4.Durability (once an operation is complete it will be preserved even in the face of machine or system failure).
Update and Delete On Hive:
In Hive Update and Delete work based on some limitations
1.It can only be performed on tables that support ACID.
2.If a table is to be used in ACID writes (insert, update, delete) then the table property “transactional” must be set on that table.
3.Only ORC file format is supported in this.
4.Tables must be bucketed to make use of these features.
Step-1: First set the Following
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
Step-2: Create table with bucketing and ORC file format
create table hive_dml
(emp_id int, first_name string, last_name string)
clustered by (emp_id) into 4 buckets
stored as orc
tblproperties (‘transactional’=’true’);
Step-3:Insert values into table
insert into hive_dml values
(1001501,’alaister’,’briito’),(1001502,’anifa’,’mohammed’),
(1001503,’piyush’,’manish’),
(1001504,’vijay’,’karthik’),
(1001505,’narasingh’,’rao’);
Step-4: View the table
select * from hive_dml;
Step-5:Update the value
update hive_dml set last_name=’yadav’ where emp_id=1001505;
Step-6: View the data after complete the update
select * from hive_dml;
Step-7: Delete the table
delete from hive_dml where emp_id=1001502;
Step-8: View the data after complete the delete
select * from hive_dml;
Step-9: Describe the table after using delete command
describe table hive_dml;