Spark wordcount program

Spark

Apache Spark is a cluster computing engine designed for fast computation, and it is rapidly becoming the compute engine of choice for big data. Spark programs are more concise than Hadoop MapReduce jobs and, depending on the workload, can run 10 to 100 times faster. Spark lets you quickly write applications in Java, Scala, or Python, and it comes with a built-in set of over 80 high-level operators. You can also use it interactively to query data from the shell. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application.

Scala

Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages.

Wordcount program

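The word count is the number of words in a document or passage of text. Word counting may be needed when a text is required to stay within a certain number of words. In Spark, the word count program goes one step further and counts how many times each word occurs in the input.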
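
As a rough end-to-end sketch of the program described above, the following Scala snippet counts words in a small in-memory sample. The sample sentences and the application name are made up for illustration, and `SparkContext.getOrCreate` is used so that the snippet reuses the `sc` of an open spark-shell or starts a local context otherwise. A real job would usually read its input with `sc.textFile` instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("WordCountSketch").setMaster("local[*]"))

// Hypothetical sample input standing in for a text file.
val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))

val counts = lines
  .flatMap(_.split("\\s+"))   // split each line into words
  .map(word => (word, 1))     // pair each word with a count of 1
  .reduceByKey(_ + _)         // sum the counts for each word

// collect() is an action: it runs the job and brings the result to the driver.
val result = counts.collect().toMap
result.foreach { case (word, n) => println(s"$word: $n") }
```

The three transformations in the middle are the ones covered step by step in the rest of this post.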

Number of partitions
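
An RDD is split into partitions, and the number of partitions decides how many tasks can work on it in parallel. The sketch below, which assumes Spark 1.6 or later for `getNumPartitions`, sets the partition count explicitly on a made-up range of numbers and then changes it. When reading a file, `sc.textFile` accepts a similar `minPartitions` argument.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("PartitionsSketch").setMaster("local[*]"))

// Ask for 4 partitions when creating the RDD (illustrative data).
val data = sc.parallelize(1 to 100, numSlices = 4)
val numPartitions = data.getNumPartitions

// repartition returns a new RDD with the requested number of partitions.
val repartitioned = data.repartition(8)

println(s"before: $numPartitions, after: ${repartitioned.getNumPartitions}")
```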

Transformations

flatMap

Returns a new FlatMappedRDD by first applying a function to every element and then flattening the results.
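
To make the flattening step concrete, here is a small sketch on two made-up lines. It also applies `map` with the same function to show the difference: `map` keeps one array per line, while `flatMap` yields the individual words.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("FlatMapSketch").setMaster("local[*]"))

// Illustrative input lines.
val lines = sc.parallelize(Seq("hello spark", "hello scala world"))

// flatMap: one output element per word.
val words = lines.flatMap(line => line.split(" "))

// map with the same function: one output element (an array) per line.
val arrays = lines.map(line => line.split(" "))

val collected = words.collect().toList
println(collected)
println(s"flatMap elements: ${words.count()}, map elements: ${arrays.count()}")
```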

map

Returns a new MappedRDD by applying a function to each element.
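
In the word count, `map` is typically used to turn every word into a (word, 1) pair. A minimal sketch with a made-up list of words:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("MapSketch").setMaster("local[*]"))

// Illustrative words, as they might come out of the flatMap step.
val words = sc.parallelize(Seq("hello", "spark", "hello"))

// One output pair per input word; the element count does not change.
val pairs = words.map(word => (word, 1))

val collected = pairs.collect().toList
println(collected)
```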

reduceByKey

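Returns a new RDD by aggregating the values of each key. In the word count, reduceByKey(_ + _) adds up the 1s attached to each word, which gives the number of occurrences per word.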
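
The aggregation can be sketched on a few made-up pairs. The snippet assumes Spark 1.3 or later, where `reduceByKey` is available on an RDD of pairs without an extra import.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("ReduceByKeySketch").setMaster("local[*]"))

// Illustrative (word, 1) pairs, as produced by the map step.
val pairs = sc.parallelize(Seq(("hello", 1), ("spark", 1), ("hello", 1)))

// Values that share a key are combined two at a time with _ + _.
val counts = pairs.reduceByKey(_ + _)

// The order of keys after a shuffle is not guaranteed, so compare as a Map.
val result = counts.collect().toMap
println(result)
```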

toDebugString
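
`toDebugString` returns a textual description of an RDD and the RDDs it depends on, which is handy for seeing how the transformations above are chained. The sketch below builds the word count lineage on made-up input and prints it. The exact text differs between Spark versions, so no sample output is shown here.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("DebugStringSketch").setMaster("local[*]"))

// Illustrative word count pipeline.
val counts = sc.parallelize(Seq("a b", "b c"))
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Describes this RDD and its parents; no job is run by this call.
val lineage = counts.toDebugString
println(lineage)
```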

Actions

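Actions trigger the computation and return a result to the driver. For example, count() returns the number of records in an RDD.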
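
A short sketch of a few common actions on a made-up RDD of words. Besides `count`, it uses `take` and `collect`, which bring records back to the driver and should be used with care on large data.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("ActionsSketch").setMaster("local[*]"))

// Illustrative data.
val words = sc.parallelize(Seq("hello", "spark", "hello"))

val total = words.count()      // number of records, as a Long
val firstTwo = words.take(2)   // the first two records
val all = words.collect()      // every record, as a local array

println(s"count = $total, first two = ${firstTwo.mkString(", ")}")
```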
