RDDs – Resilient Distributed Datasets: An RDD is the fundamental unit of data in Spark: a distributed collection of elements […]
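
To make "distributed collection" concrete, here is a minimal sketch in plain Python (no Spark required) that models an RDD as a list of partitions. The helpers `parallelize`, `map_partitions` and `reduce_all` are hypothetical names chosen for illustration and are not the Spark API. In PySpark the corresponding calls would be `sc.parallelize(data, numSlices)`, `rdd.map(f)` and `rdd.reduce(op)`.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def parallelize(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split a local list into num_partitions contiguous slices (our stand-in for RDD partitions)."""
    n = len(data)
    return [data[i * n // num_partitions:(i + 1) * n // num_partitions]
            for i in range(num_partitions)]

def map_partitions(parts: List[List[T]], f: Callable[[T], U]) -> List[List[U]]:
    """Apply f element-wise; each partition is processed independently (a narrow transformation)."""
    return [[f(x) for x in part] for part in parts]

def reduce_all(parts: List[List[T]], op: Callable[[T, T], T]) -> T:
    """Reduce inside each non-empty partition, then combine the partial results (like an action)."""
    partials = [reduce(op, part) for part in parts if part]
    return reduce(op, partials)
```

For example, `reduce_all(map_partitions(parallelize(list(range(1, 11)), 3), lambda x: x * x), lambda a, b: a + b)` sums the squares of 1 to 10 and gives 385. Each partition is reduced on its own before the partial sums are combined. This is why Spark requires the operator passed to `reduce` to be associative and commutative.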

Apache Spark: Apache Spark is a general-purpose, lightning-fast cluster computing system. It provides high-level APIs in languages such as Java, […]

RDD Joins in Core Spark. Apache Spark: Apache Spark is an open-source parallel processing framework for running large-scale data […]
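
Pair-RDD joins match records by key. The plain-Python sketch below involves no Spark. It models the result shapes that PySpark's `rdd.join(other)`, `rdd.leftOuterJoin(other)` and `rdd.fullOuterJoin(other)` produce for `(key, value)` pairs: each output is `(key, (left_value, right_value))`, and outer joins use `None` for the missing side. The function names are illustrative. Unlike Spark, the sketch returns results in a deterministic order.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple

Pairs = List[Tuple[Hashable, Any]]

def _group(pairs: Pairs) -> Dict[Hashable, List[Any]]:
    """Collect all values per key (in Spark, a shuffle brings equal keys together)."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def join(left: Pairs, right: Pairs) -> Pairs:
    """Inner join: one (k, (v, w)) for every matching pair of left and right values."""
    r = _group(right)
    return [(k, (v, w)) for k, v in left for w in r.get(k, [])]

def left_outer_join(left: Pairs, right: Pairs) -> Pairs:
    """Keep every left record; keys missing on the right get None as the right value."""
    r = _group(right)
    return [(k, (v, w)) for k, v in left for w in (r.get(k) or [None])]

def full_outer_join(left: Pairs, right: Pairs) -> Pairs:
    """Keep keys from both sides; the missing side is filled with None."""
    l, r = _group(left), _group(right)
    keys = list(l) + [k for k in r if k not in l]
    return [(k, (v, w)) for k in keys
            for v in (l.get(k) or [None])
            for w in (r.get(k) or [None])]
```

Consider `x = [("a", 1), ("b", 4)]` and `y = [("a", 2), ("a", 3)]`. Here `join(x, y)` gives `[("a", (1, 2)), ("a", (1, 3))]`. A key that occurs several times on either side produces every combination, so joins on heavily repeated keys can grow large.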

Spark SQL Aggregate Functions in RDD. Spark SQL: Spark SQL is a Spark module for structured data processing. Unlike the […]
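
An aggregate such as an average cannot be computed by merging partial averages, so RDD code usually carries a `(sum, count)` accumulator. The plain-Python sketch below models the two phases of PySpark's `rdd.aggregateByKey(zeroValue, seqFunc, combFunc)`. First, `seq_op` folds values inside a partition. Then `comb_op` merges the per-partition accumulators. The `aggregate_by_key` and `average_by_key` helpers and the list-of-partitions input are assumptions made for illustration. The result corresponds to what `AVG(value) ... GROUP BY key` computes in Spark SQL.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

def aggregate_by_key(partitions: List[List[Tuple[Hashable, Any]]],
                     zero: Any,
                     seq_op: Callable[[Any, Any], Any],
                     comb_op: Callable[[Any, Any], Any]) -> Dict[Hashable, Any]:
    # Phase 1 (within each partition): fold values into a per-key accumulator.
    # zero should be immutable (e.g. a tuple) because it is reused for every key.
    per_partition = []
    for part in partitions:
        acc: Dict[Hashable, Any] = {}
        for k, v in part:
            acc[k] = seq_op(acc.get(k, zero), v)
        per_partition.append(acc)
    # Phase 2 (after the shuffle): merge accumulators for the same key across partitions.
    result: Dict[Hashable, Any] = {}
    for acc in per_partition:
        for k, a in acc.items():
            result[k] = comb_op(result[k], a) if k in result else a
    return result

def average_by_key(partitions: List[List[Tuple[Hashable, float]]]) -> Dict[Hashable, float]:
    """Per-key mean via a (sum, count) accumulator."""
    sums = aggregate_by_key(partitions, (0, 0),
                            lambda acc, v: (acc[0] + v, acc[1] + 1),
                            lambda a, b: (a[0] + b[0], a[1] + b[1]))
    return {k: s / c for k, (s, c) in sums.items()}
```

With partitions `[[("x", 1), ("y", 4)], [("x", 3), ("y", 6), ("x", 5)]]`, key `x` accumulates `(1, 1)` in the first partition and `(8, 2)` in the second. These merge into `(9, 3)`, for an average of `3.0`.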

PySpark Various Functions. PySpark: PySpark is the Python binding for the Spark platform and API, and not much different from […]

Apache Hive Dynamic Partition Table. Difference between static and dynamic partitions: Static partition columns: in DML/DDL involving multiple partitioning […]
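
The practical difference shows up in where rows land. Hive stores each partition as a subdirectory named `column=value`. With a static partition, the value is fixed in the `PARTITION` clause. With a dynamic partition, it is taken from the trailing columns of the `SELECT`. The plain-Python sketch below models only that routing step, and `route_rows` is a hypothetical helper, not part of Hive. It also refuses a static column nested under a dynamic one, which Hive reports as an error.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def route_rows(rows: List[Tuple], partition_cols: Sequence[str],
               static_values: Dict[str, str]) -> Dict[str, List[Tuple]]:
    """Map each row to a partition path such as 'country=US/state=CA'.

    Each row holds the data columns followed by one value per dynamic partition
    column, in PARTITION-clause order (as Hive expects in INSERT ... SELECT).
    """
    dynamic_cols = [c for c in partition_cols if c not in static_values]
    # Index of the first dynamic column; no static column may appear after it.
    first_dynamic = next((i for i, c in enumerate(partition_cols) if c not in static_values),
                         len(partition_cols))
    if any(c in static_values for c in partition_cols[first_dynamic:]):
        raise ValueError("static partition columns must precede dynamic ones")
    n_dyn = len(dynamic_cols)
    out: Dict[str, List[Tuple]] = defaultdict(list)
    for row in rows:
        # Split off the trailing dynamic-partition values from the data columns.
        data, dyn = row[:len(row) - n_dyn], row[len(row) - n_dyn:]
        values = {**static_values, **dict(zip(dynamic_cols, dyn))}
        path = "/".join(f"{c}={values[c]}" for c in partition_cols)
        out[path].append(data)
    return dict(out)
```

For example, take `partition_cols=["country", "state"]` and `static_values={"country": "US"}`. Routing `("alice", "CA")` places `("alice",)` under `country=US/state=CA`, because `country` is static and `state` is dynamic.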

Joins in Hive: Hive converts joins over multiple tables into a single map/reduce job if for every table the […]
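
The Hive language manual's join documentation states the condition as every table using the same column in the join clauses. Assuming that rule, the plain-Python sketch below counts how many map/reduce jobs a chain of equi-joins would need. Consecutive joins stay in one job until some table would have to be joined on a different column. This is a deliberate simplification: the `count_mr_jobs` helper is hypothetical, and it ignores map-side joins and other optimizer rewrites.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]           # (table alias, column name)
Condition = Tuple[Column, Column]  # one equi-join: left side = right side

def count_mr_jobs(conditions: List[Condition]) -> int:
    """Count map/reduce jobs for joins listed in query order (simplified model)."""
    jobs = 0
    key_of: Dict[str, str] = {}  # join column used by each table in the current job
    for condition in conditions:
        # A table already in this job but joined on another column forces a new job.
        conflict = any(table in key_of and key_of[table] != col for table, col in condition)
        if jobs == 0 or conflict:
            jobs += 1
            key_of = {}
        for table, col in condition:
            key_of[table] = col
    return jobs
```

The documentation gives two example queries. The first, `a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)`, comes out as one job. The variant ending in `b.key2` needs two jobs.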

Hadoop Installation Steps for Pseudo-Distributed Mode. Pseudo-Distributed Installation: Steps for setting up a pseudo-distributed Hadoop cluster backed by the […]
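
In pseudo-distributed mode, each Hadoop daemon runs in its own Java process on a single machine, and the core of the setup is a few XML properties. The Python sketch below renders Hadoop-style `<configuration>` files, and `hadoop_config_xml` is a hypothetical helper. The two settings shown follow the single-node example in the Apache Hadoop documentation: `fs.defaultFS` points at `hdfs://localhost:9000`, and `dfs.replication` is set to `1`. The port may differ in your installation.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def hadoop_config_xml(props: Dict[str, str]) -> str:
    """Render name -> value pairs as a Hadoop <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# etc/hadoop/core-site.xml: clients reach HDFS through the local NameNode.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
# etc/hadoop/hdfs-site.xml: a single DataNode can hold only one replica per block.
HDFS_SITE = {"dfs.replication": "1"}
```

Each string can be written to the matching file under `etc/hadoop/` of the Hadoop installation.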