Spark wordcount program
Spark
Apache Spark is a cluster computing engine designed for fast computation, and it is rapidly becoming the compute engine of choice for big data. Spark programs are more concise than Hadoop MapReduce jobs and can run 10 to 100 times faster, depending on the workload. Spark lets you quickly write applications in Java, Scala, or Python, and it comes with a built-in set of over 80 high-level operators. You can also use it interactively to query data from the shell. Its main feature is in-memory cluster computing, which increases the processing speed of an application.
Scala
Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages.
Wordcount program
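A word count is the number of words in a document or passage of text. Word counting may be needed when a text is required to stay within a certain number of words. The Spark word count program goes one step further and counts how often each word occurs in the input.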
Number of partitions
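An RDD is split into partitions that Spark processes in parallel. The following is a minimal sketch of setting and inspecting the partition count, assuming the RDD API with spark-core on the classpath and a local master. The sample lines, the application name, and the choice of 4 partitions are illustrative assumptions, not values from the original program.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode with 2 threads; getOrCreate reuses an existing
// context, so the snippet can also be pasted into spark-shell.
val conf = new SparkConf().setAppName("PartitionsSketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical in-memory input standing in for a text file,
// explicitly split into 4 partitions.
val lines = sc.parallelize(
  Seq("spark is fast", "scala is concise", "spark runs on scala"),
  numSlices = 4
)

// Number of partitions the RDD is divided into.
val numPartitions = lines.getNumPartitions
println(s"partitions: $numPartitions")
```

When reading from a file, sc.textFile accepts a similar optional minPartitions argument.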
Transformations
FlatMap
Return a new FlatMappedRDD by first applying a function to all elements and then flattening the results.
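As a rough illustration of flatMap in a word count, the sketch below splits each line into words and flattens the results into one RDD of words. It assumes spark-core on the classpath and a local master; the input lines and names are made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; reuses an existing context if one is running.
val conf = new SparkConf().setAppName("FlatMapSketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical input lines.
val lines = sc.parallelize(Seq("spark is fast", "scala is concise"))

// Each line produces an array of words; flatMap flattens
// those arrays into a single RDD[String] of words.
val words = lines.flatMap(line => line.split(" "))

val flatMapResult = words.collect().toList
println(flatMapResult) // List(spark, is, fast, scala, is, concise)
```

Using map instead of flatMap here would give an RDD of arrays, one array per line, rather than an RDD of words.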
Map
Return a MappedRDD by applying a function to each element.
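In a word count, map is typically used to pair each word with the number 1. A minimal sketch, assuming spark-core on the classpath and a local master, with made-up sample words:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; reuses an existing context if one is running.
val conf = new SparkConf().setAppName("MapSketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical words, as they might come out of a flatMap step.
val words = sc.parallelize(Seq("spark", "is", "fast", "spark"))

// map produces exactly one output element per input element:
// here, a (word, 1) pair for every word.
val pairs = words.map(word => (word, 1))

val mapResult = pairs.collect().toList
println(mapResult) // List((spark,1), (is,1), (fast,1), (spark,1))
```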
ReduceByKey()
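Return a new RDD of (key, value) pairs in which the values for each key are aggregated with the given function; for example, reduceByKey(_+_) sums the values for each key.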
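The aggregation can be sketched as follows, assuming spark-core on the classpath and a local master. The (word, 1) pairs are illustrative sample data.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; reuses an existing context if one is running.
val conf = new SparkConf().setAppName("ReduceByKeySketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical (word, 1) pairs, as they might come out of a map step.
val pairs = sc.parallelize(Seq(("spark", 1), ("is", 1), ("fast", 1), ("spark", 1)))

// Values that share a key are combined pairwise with the function;
// _ + _ adds them, giving the number of occurrences per word.
val counts = pairs.reduceByKey(_ + _)

// The order of keys after a shuffle is not guaranteed,
// so the result is converted to a Map before it is inspected.
val reduceResult = counts.collect().toMap
println(reduceResult)
```

Here "spark" ends up with a count of 2, and "is" and "fast" with a count of 1 each.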
toDebugString
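toDebugString returns a textual description of an RDD and the RDDs it depends on (its lineage), which helps to see how the transformations are chained. A minimal sketch, assuming spark-core on the classpath and a local master; the input is made up, and the exact text printed varies between Spark versions, so no output is shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; reuses an existing context if one is running.
val conf = new SparkConf().setAppName("DebugStringSketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical input and the usual word count chain of transformations.
val counts = sc.parallelize(Seq("spark is fast", "spark runs on scala"))
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// No job runs here: toDebugString only describes the lineage,
// including the shuffle introduced by reduceByKey.
val lineage = counts.toDebugString
println(lineage)
```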
Actions
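An action triggers the computation and returns a result to the driver, such as the collected records (collect) or the number of records (count).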
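Putting the pieces together, the sketch below runs a complete word count and finishes with two actions, count and collect. It assumes spark-core on the classpath and a local master; the input lines, the names, and the use of 2 partitions are illustrative assumptions rather than the original program.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; reuses an existing context if one is running.
val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical input; a real job would typically use sc.textFile(path).
val lines = sc.parallelize(
  Seq("spark is fast", "scala is concise", "spark runs on scala"),
  numSlices = 2
)

// Transformations are lazy: nothing is computed yet.
val wordCounts = lines
  .flatMap(line => line.split(" "))   // lines -> words
  .map(word => (word, 1))             // words -> (word, 1)
  .reduceByKey(_ + _)                 // sum the 1s per word

// Actions trigger the computation.
val distinctWords = wordCounts.count()   // number of records (distinct words)
val collected = wordCounts.collect()     // all records, brought to the driver

println(s"distinct words: $distinctWords")
collected.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
```

In a standalone program, calling sc.stop() once the work is finished releases the context. collect brings every record to the driver, so it is best kept for results that are known to be small.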