// Reading a CSV file without options

val peopledf = spark.read.csv("file:///home/geoinsys/spark-2.4.5-bin-hadoop2.7/examples/src/main/resources/people.csv")

//To view the schema of the dataframe

peopledf.printSchema
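// Note: with no options every column is read as type string and auto-named _c0, _c1, ...; the bundled people.csv is semicolon-delimited, so the whole line lands in a single column and printSchema prints roughly:
//
// root
//  |-- _c0: string (nullable = true)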

// Reading a file with an explicit delimiter (the bundled people.csv uses ";")

val peopledfdelimited = spark.read.option("sep", ";").csv("file:///home/geoinsys/spark-2.4.5-bin-hadoop2.7/examples/src/main/resources/people.csv")

// Reading a CSV file with header, inferred schema, and delimiter

val dfschema = spark.read.option("header", "true").option("inferSchema", "true").option("sep", ";").csv("file:///home/geoinsys/spark-2.4.5-bin-hadoop2.7/examples/src/main/resources/people.csv")
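// inferSchema costs an extra pass over the data; alternatively a schema can be supplied up front. A minimal sketch, assuming column names matching the bundled people.csv header (name;age;job):

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val peopleSchema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true),
  StructField("job", StringType, nullable = true)
))

val dfExplicit = spark.read.option("header", "true").option("sep", ";").schema(peopleSchema).csv("file:///home/geoinsys/spark-2.4.5-bin-hadoop2.7/examples/src/main/resources/people.csv")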

-------------------- Handling superstore.csv --------------------

//Download the superstore.csv file from Datasets drive folder

val salesDF = spark.read.option("header", "true").option("inferSchema", "true").option("sep", ",").csv("file:///home/geoinsys/Downloads/datasets/superstore.csv")

//To view the Schema

salesDF.printSchema

// To print the data to the console with show; by default it displays only the first 20 rows

salesDF.show
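// show also takes optional arguments for row count and truncation, e.g.:

salesDF.show(5)
salesDF.show(5, false)   // do not truncate long column values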

// To view the row count

salesDF.count

// To check the number of underlying partitions

salesDF.rdd.getNumPartitions
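// The partition count drives parallelism; it can be raised with repartition (full shuffle) or lowered with coalesce (no shuffle). A minimal sketch:

val salesDF8 = salesDF.repartition(8)
salesDF8.rdd.getNumPartitions   // 8
val salesDF2 = salesDF.coalesce(2)
salesDF2.rdd.getNumPartitions   // 2, if salesDF had at least 2 partitions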

-------------------- Basic operations on the DataFrame --------------------

// To register salesDF as a temporary view (registerTempTable is deprecated since Spark 2.0)

salesDF.createOrReplaceTempView("Sales")

// Spark SQL to process the Sales temp view

spark.sql("select Country,Category,sum(Profit) from Sales group by Country,Category").show
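// The same aggregation can also be written with the DataFrame API (sum comes from org.apache.spark.sql.functions):

import org.apache.spark.sql.functions.sum

salesDF.groupBy("Country", "Category").agg(sum("Profit")).show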

// Cast the Profit column from string to long

val totsales = salesDF.selectExpr("Region", "Country", "State", "Category", "cast(Profit as long) as Profit")

// To view US sales data only

val sales_US = totsales.filter("Country = 'United States'").groupBy("Country", "Category").agg(sum("Profit") as "Sales")

sales_US.show

// Reading data from a JSON file (the JSON reader infers the schema automatically, so no inferSchema option is needed)

val dfjson = spark.read.json("file:///home/geoinsys/spark-2.4.5-bin-hadoop2.7/examples/src/main/resources/people.json")
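// Note: spark.read.json expects JSON Lines input (one object per line). For a pretty-printed document spanning multiple lines, enable multiLine; the path below is a hypothetical example:

val prettyJson = spark.read.option("multiLine", "true").json("file:///home/geoinsys/Downloads/datasets/pretty.json")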

// Reading data from a nested JSON file

// Sample file nested.json => {"col1":{"col2":"val2","col3":["arr1","arr2"]}}

val jsondf = spark.read.json("nested.json")

jsondf.select("col1.col2").show

jsondf.createOrReplaceTempView("jsonnested")

spark.sql("select col1.col3[0] from jsonnested").show
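// To get one row per array element instead of indexing positions, the array can be exploded. A minimal sketch on the same jsondf ("col3_item" is just an illustrative alias):

import org.apache.spark.sql.functions.{explode, col}

jsondf.select(explode(col("col1.col3")).as("col3_item")).show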

// Reading data from an Avro file

val avrodf = spark.read.format("avro").load("people.avro")
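// Note: in Spark 2.4.x the Avro data source ships as a separate module, so the shell must be started with the matching package (assuming the default Scala 2.11 build of 2.4.5):

// spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.5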

// Reading Avro using an Avro schema (.avsc) file

import java.io.File
import org.apache.avro.Schema

val schemaAvro = new Schema.Parser().parse(new File("people.avsc"))

val peopledf = spark.read
  .format("avro")
  .option("avroSchema", schemaAvro.toString)
  .load("people.avro")

// To save the DataFrame as CSV in HDFS

peopledf.write.csv("hdfs://localhost:9000/user/geoinsys/peopleavro")
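// The write fails if the target directory already exists; a save mode and header option can be added, e.g.:

peopledf.write.mode("overwrite").option("header", "true").csv("hdfs://localhost:9000/user/geoinsys/peopleavro")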

// To save the DataFrame as a managed table

peopledf.write.saveAsTable("People")
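// The table is written to the warehouse directory (spark-warehouse by default) and can be read back:

spark.table("People").show
spark.sql("select * from People").show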


De duur van koken, het werk van het deeg evenals de samenstelling van deze moeten zeer nauwkeurig zijn teneinde een product van optimale kleur, textuur en smaak te verkrijgen. Eens het deeg op goede temperatuur en met de adequate viscositeit komt, wordt het tussen cilinders voorbijgegaan die het de verlangde vorm geven: ronden, vierkanten of rechthoeken. best online casino De nougatine plaatjes kunnen dan als krokant bodem dienen voor alle soorten pralines.