Spark Fundamentals I Cognitive Class Exam Answers:-
Course Name :- Spark Fundamentals I
Module 1 :- Introduction to Spark
Question 1 : What gives Spark its speed advantage for complex applications?
  - Spark can cover a wide range of workloads under one system
 
  - Various libraries provide Spark with additional functionality
 
  - Spark extends the MapReduce model
 
  - Spark makes extensive use of in-memory computations
 
  - All of the above
 
Question 2 : For what purpose would an Engineer use Spark? Select all that apply.
  - Analyzing data to obtain insights
 
  - Programming with Spark’s API
 
  - Transforming data into a usable form for analysis
 
  - Developing a data processing system
 
  - Tuning an application for a business use case
 
Question 3 : Which of the following statements are true of the Resilient Distributed Dataset (RDD)? Select all that apply.
  - There are three types of RDD operations.
 
  - RDDs allow Spark to reconstruct transformations
 
  - RDDs only add a small amount of code due to tight integration
 
  - RDD action operations do not return a value
 
  - RDD is a distributed collection of elements parallelized across the cluster.
 
Module 2 :- Resilient Distributed Dataset and DataFrames
Question 1 : Which of the following methods can be used to create a Resilient Distributed Dataset (RDD)? Select all that apply.
  - Creating a directed acyclic graph (DAG)
 
  - Parallelizing an existing Spark collection
 
  - Referencing a Hadoop-supported dataset
 
  - Using data that resides in Spark
 
  - Transforming an existing RDD to form a new one
 
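For reference, the creation methods listed above can be sketched in a few lines of Scala (the app name and input path below are placeholders, not from the course):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RddCreation").setMaster("local[*]")
val sc = new SparkContext(conf)

// 1. Parallelize an existing collection from the driver program
val fromCollection = sc.parallelize(Seq(1, 2, 3, 4, 5))

// 2. Reference a Hadoop-supported dataset (HDFS, local files, etc.)
val fromFile = sc.textFile("hdfs:///data/input.txt") // placeholder path

// 3. Transform an existing RDD to form a new one
val doubled = fromCollection.map(_ * 2)
```
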
Question 2 : What happens when an action is executed?
  - The driver sends code to be executed on each block
 
  - Executors prepare the data for operation in parallel
 
  - A cache is created for storing partial results in memory
 
  - Data is partitioned into different blocks across the cluster
 
  - All of the above
 
Question 3 : Which of the following statements is true of RDD persistence? Select all that apply.
  - Persistence through caching provides fault tolerance
 
  - Future actions can be performed significantly faster
 
  - Each partition is replicated on two cluster nodes
 
  - RDD persistence always improves space efficiency
 
  - By default, objects that are too big for memory are stored on the disk
 
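A small Scala sketch of RDD persistence, assuming an existing SparkContext `sc` and a placeholder input path:

```scala
import org.apache.spark.storage.StorageLevel

val lines  = sc.textFile("hdfs:///data/app.log") // placeholder path
val errors = lines.filter(_.contains("ERROR"))

// Spill partitions that do not fit in memory to disk
errors.persist(StorageLevel.MEMORY_AND_DISK)

println(errors.count()) // first action computes the RDD and populates the cache
println(errors.count()) // later actions reuse the cached partitions and run faster
```
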
Module 3 :- Spark Application Programming
Question 1 : What is SparkContext?
  - A tool for linking to nodes
 
  - A tool that provides fault tolerance
 
  - A programming language for applications
 
  - The built-in shell for the Spark engine
 
  - An object that represents the connection to a Spark cluster
 
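A minimal Scala sketch of creating the SparkContext (the app name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyApp")   // placeholder application name
  .setMaster("local[2]") // run locally using 2 cores

// SparkContext is the object representing the connection to a Spark
// cluster; every RDD is created through it
val sc = new SparkContext(conf)
val rdd = sc.parallelize(1 to 100)
println(rdd.sum())
sc.stop()
```
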
Question 2 : Which of the following methods can be used to pass functions to Spark? Select all that apply.
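
The Spark programming guide recommends two ways of passing functions, sketched below on a hypothetical RDD[String] named `lines`: anonymous function syntax for short pieces of code, and methods in a global singleton object.

```scala
// 1. Anonymous function syntax, for short pieces of code
val lengths = lines.map(line => line.length)

// 2. A method in a global singleton object (a "static" method)
object TextUtils {
  def clean(line: String): String = line.trim.toLowerCase
}
val cleaned = lines.map(TextUtils.clean)
```
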
Question 3 : Which of the following is a main component of a Spark application’s source code?
Module 4 :- Introduction to the Spark Libraries
Question 1 : Which of the following is NOT an example of a Spark library?
Question 2 : From which of the following sources can Spark Streaming receive data? Select all that apply.
Question 3 : In Spark Streaming, processing begins immediately when an element of the application is executed. True or false?
Module 5 :- Spark Configuration, Monitoring and Tuning
Question 1 : Which of the following is a main component of a Spark cluster? Select all that apply.
Question 2 : What are the main locations for Spark configuration? Select all that apply.
  - The SparkConf object
 
  - The Spark Shell
 
  - Executor Processes
 
  - Environment variables
 
  - Logging properties
 
Question 3 : Which of the following techniques can improve Spark performance? Select all that apply.
  - Scheduler Configuration
 
  - Memory Tuning
 
  - Data Serialization
 
  - Using Broadcast variables
 
  - Using nested structures
 
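As an example of the broadcast-variable technique from the list above, here is a Scala sketch with made-up lookup data, assuming an existing SparkContext `sc`:

```scala
// Broadcast a read-only lookup table once per executor instead of
// shipping a copy with every task
val countryNames   = Map("US" -> "United States", "DE" -> "Germany")
val broadcastNames = sc.broadcast(countryNames)

val users = sc.parallelize(Seq(("alice", "US"), ("bob", "DE")))
val expanded = users.map { case (name, cc) =>
  (name, broadcastNames.value.getOrElse(cc, "unknown"))
}
println(expanded.collect().mkString(", "))
```
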
Spark Fundamentals I Cognitive Class Final Exam Answers:-
Question 1 : Which of the following is a type of Spark RDD operation? Select all that apply.
Question 2 : Spark must be installed and run on top of a Hadoop cluster. True or false?
Question 3 : Which of the following operations will work improperly when using a Combiner?
  - Average
 
  - Maximum
 
  - Minimum
 
  - Count
 
  - All of the above operations will work properly
 
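The reason average is the odd one out: sum, maximum, minimum, and count are associative and commutative, so partial results from a combiner merge correctly, while an average of partial averages is generally wrong. A common workaround, sketched in Scala with made-up data and an existing SparkContext `sc`, is to carry (sum, count) pairs:

```scala
val scores = sc.parallelize(Seq(("math", 80.0), ("math", 100.0), ("cs", 90.0)))

// (sum, count) pairs combine associatively, so they are safe to merge
// map-side; the division happens only at the end
val avgByKey = scores
  .mapValues(v => (v, 1L))
  .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
  .mapValues { case (sum, count) => sum / count }

println(avgByKey.collect().mkString(", "))
```
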
Question 4 : Spark supports which of the following libraries?
Question 5 : Spark supports which of the following programming languages?
  - Scala, Perl, Java
 
  - Scala, Java, C++, Python, Perl
 
  - Scala, Python, Java, R
 
  - Java and Scala
 
  - C++ and Python
 
Question 6 : A transformation is evaluated immediately. True or false?
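
Transformations are lazy; only an action triggers evaluation. A small Scala sketch, assuming an existing SparkContext `sc`:

```scala
// map is a transformation: it only records the computation, nothing runs yet
val squares = sc.parallelize(1 to 1000000).map(x => x.toLong * x)

// count is an action: only now is the whole lineage actually evaluated
println(squares.count())
```
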
Question 7 : Which storage level does the cache() function use?
  - MEMORY_ONLY
 
  - MEMORY_ONLY_SER
 
  - MEMORY_AND_DISK
 
  - MEMORY_AND_DISK_SER
 
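For RDDs, cache() is shorthand for persisting at MEMORY_ONLY; any other level has to be requested through persist(). A short Scala sketch, assuming an existing SparkContext `sc`:

```scala
import org.apache.spark.storage.StorageLevel

val data = sc.parallelize(1 to 1000)
data.cache() // equivalent to data.persist(StorageLevel.MEMORY_ONLY)

// An RDD's storage level can only be set once, so other levels are
// chosen explicitly via persist() on a fresh RDD:
val other = sc.parallelize(1 to 1000)
other.persist(StorageLevel.MEMORY_AND_DISK_SER)
```
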
Question 8 : Which of the following statements does NOT describe accumulators?
  - They can only be added through an associative operation
 
  - Programmers can extend them beyond numeric types
 
  - They can only be read by the driver
 
  - They are read-only
 
  - They implement counters and sums
 
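A Scala sketch of an accumulator (Spark 2.x+ API), with made-up input data and an existing SparkContext `sc`:

```scala
val badRecords = sc.longAccumulator("badRecords")

val parsed = sc.parallelize(Seq("1", "2", "oops", "4")).flatMap { s =>
  try Some(s.toInt)
  catch { case _: NumberFormatException => badRecords.add(1); None }
}

parsed.count()            // tasks add to the accumulator as the action runs
println(badRecords.value) // but only the driver can read the result
```
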
Question 9 : You must explicitly initialize the SparkContext when creating a Spark application. True or false?
Question 10 : The “local” parameter can be used to specify the number of cores to use for the application. True or false?
Question 11 : Spark applications can ONLY be packaged using one, specific build tool. True or false?
Question 12 : Which of the following parameters of the “spark-submit” script determine where the application will run?
Question 13 : Which of the following is NOT supported as a cluster manager?
Question 14 : Spark SQL allows relational queries to be expressed in which of the following?
  - Scala, SQL, and HiveQL
 
  - Scala and HiveQL
 
  - Scala and SQL
 
  - SQL only
 
  - HiveQL only
 
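A minimal Scala sketch, assuming Spark 2.x+ where SparkSession is the SQL entry point (add .enableHiveSupport() on the builder for HiveQL); the app name and data are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SqlExample")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val people = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")
people.createOrReplaceTempView("people")

// The same relational query expressed in SQL and in the Scala DSL:
spark.sql("SELECT name FROM people WHERE age > 30").show()
people.filter($"age" > 30).select("name").show()
```
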
Question 15 : Spark Streaming processes live streaming data in real-time. True or false?
Question 16 : The MLlib library contains which of the following algorithms?
  - Classification
 
  - Regression
 
  - Clustering
 
  - Dimensionality Reduction
 
  - All of the above
 
Question 17 : What is the purpose of the GraphX library?
  - To create a visual representation of the data
 
  - To generate data-parallel models
 
  - To create a visual representation of a directed acyclic graph (DAG)
 
  - To perform graph-parallel computations
 
  - To convert from data-parallel to graph-parallel algorithms
 
Question 18 : Which list describes the correct order of precedence for Spark configuration, from highest to lowest?
  - Flags passed to spark-submit, values in spark-defaults.conf, properties set on SparkConf
 
  - Properties set on SparkConf, values in spark-defaults.conf, flags passed to spark-submit
 
  - Values in spark-defaults.conf, properties set on SparkConf, flags passed to spark-submit
 
  - Properties set on SparkConf, flags passed to spark-submit, values in spark-defaults.conf
 
  - Values in spark-defaults.conf, flags passed to spark-submit, properties set on SparkConf
 
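In practice, a property set directly on SparkConf cannot be overridden from the command line or from spark-defaults.conf. A short Scala sketch (the app name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("PrecedenceDemo")
  .set("spark.executor.memory", "2g") // wins over --executor-memory
                                      // and over spark-defaults.conf

val sc = new SparkContext(conf)
println(sc.getConf.get("spark.executor.memory"))
```
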
Question 19 : Spark monitoring can be performed with external tools. True or false?
Question 20 : Which serialization libraries are supported in Spark? Select all that apply.
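
Spark ships with Java serialization (the default) and Kryo. A short Scala sketch of switching to Kryo via SparkConf (the registered class is a placeholder):

```scala
import org.apache.spark.SparkConf

case class Point(x: Double, y: Double) // placeholder application class

val conf = new SparkConf()
  .setAppName("KryoDemo") // placeholder name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering classes up front gives Kryo the most compact encoding
  .registerKryoClasses(Array(classOf[Point]))
```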