Spark DataFrame using Hive table

A DataFrame is a distributed collection of data organized into named columns. Conceptually, it is equivalent to a table in a relational database, but with richer optimizations under the hood. A DataFrame can be constructed from a wide array of sources, such as Hive tables, structured data files, external databases, or existing RDDs.
DataFrames were introduced in Spark 1.3.
DataFrame = RDD + schema
DataFrame provides a domain-specific language for structured data manipulation.
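As an illustration of "RDD + schema" and of the DataFrame DSL, here is a minimal sketch meant to be pasted into a Spark 1.x spark-shell, where `sc` and `sqlContext` are predefined. The sample data and the names `peopleRdd`, `people`, and `adults` are assumptions made up for this example.

```scala
// Assumes a Spark 1.x spark-shell session (sc and sqlContext already exist).
import sqlContext.implicits._

// Hypothetical sample data: (name, age) pairs held in a plain RDD.
val peopleRdd = sc.parallelize(Seq(("Alice", 30), ("Bob", 17), ("Carol", 42)))

// RDD + schema: naming the columns turns the RDD into a DataFrame.
val people = peopleRdd.toDF("name", "age")
people.printSchema()

// DSL operations: keep rows with age >= 18, then project the name column.
val adults = people.filter(people("age") >= 18).select("name")
adults.show()
```

The filter and projection are expressed as method calls on the DataFrame rather than as a SQL string, which is what the domain-specific language refers to.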
Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, it is not included in the default Spark build.
Hive support is exposed in Spark through HiveContext, which inherits from SQLContext. Using HiveContext, you can create and look up tables in the Hive metastore and query them using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext.
Create a table in Hive:

Insert records into the table:
Retrieve records from the table:
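The three Hive steps above could look like the following HiveQL, run in the Hive shell. This is only a sketch: the table name `employee` and its columns are hypothetical, and the `INSERT ... VALUES` form assumes Hive 0.14 or later.

```sql
-- employee and its columns are hypothetical names used for illustration.
CREATE TABLE IF NOT EXISTS employee (id INT, name STRING, age INT);

-- INSERT ... VALUES is available from Hive 0.14 onwards.
INSERT INTO TABLE employee VALUES (1, 'Alice', 30), (2, 'Bob', 25);

-- Read the rows back to confirm the insert.
SELECT * FROM employee;
```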
Spark DataFrame using Hive table
Start the spark-shell:
$ spark-shell
Create SQLContext.
SQLContext is the class used to initialize the functionality of Spark SQL, and HiveContext extends it with Hive support. A SparkContext object (sc, which spark-shell creates automatically) is required to initialize a HiveContext object.
Create a HiveContext
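A minimal sketch of this step in spark-shell, assuming Spark 1.x built with Hive support. The variable name `hiveContext` is an assumption chosen for this example. On Spark 2.0 and later, HiveContext is deprecated in favor of a SparkSession created with `enableHiveSupport()`.

```scala
// Assumes a spark-shell session, where the SparkContext is available as sc.
import org.apache.spark.sql.hive.HiveContext

// hiveContext is an assumed variable name.
val hiveContext = new HiveContext(sc)

// List the tables visible in the Hive metastore to confirm the setup works.
hiveContext.sql("SHOW TABLES").show()
```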

Write a query that creates a DataFrame in Spark from the data stored in the Hive table:
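One way this query might look, as a sketch for a Spark 1.x spark-shell. The table name `employee` is hypothetical, so substitute the table created in Hive. The variable name `hiveContext` is likewise an assumption.

```scala
// Assumes a spark-shell session, where the SparkContext is available as sc.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// employee is a hypothetical table name; sql() returns the result as a DataFrame.
val hiveDf = hiveContext.sql("SELECT * FROM employee")
```

Because `sql()` returns a DataFrame, the result can be processed further with the DataFrame DSL instead of more HiveQL.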

To see the contents of the DataFrame hiveDf:
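A short sketch of inspecting the DataFrame, assuming `hiveDf` already holds the result of the query on the Hive table:

```scala
// hiveDf is the DataFrame created from the Hive table in the previous step.
hiveDf.show()         // prints up to the first 20 rows as a text table
hiveDf.printSchema()  // prints the column names and types taken from Hive
```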
