1. What does the term "Big Data" refer to?
A) Data that is unstructured only
B) Data that is stored in relational databases
C) Large and complex data sets that require advanced data processing
D) Data stored in a cloud server
2. In data mining, what does 'classification' mean?
A) Categorizing data into predefined classes
B) Finding the frequency of occurrences
C) Reducing data to essential elements
D) Summing all data values
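To make "predefined classes" concrete, here is a minimal Python sketch of one simple classifier, a nearest-centroid classifier. It is an illustrative choice, not the only classification method. The helper names (`train_centroids`, `classify`) and the customer data are hypothetical. Each class is summarized by the mean of its labeled examples, and a new item gets the label of the closest mean.

```python
from math import dist

def train_centroids(samples):
    """Compute the mean feature vector (centroid) of each predefined class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the predefined class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical labeled data: (annual_spend_k, visits_per_month) -> customer class
training = [
    ((1.0, 2.0), "casual"),
    ((1.5, 1.0), "casual"),
    ((8.0, 9.0), "loyal"),
    ((9.0, 8.5), "loyal"),
]
centroids = train_centroids(training)
print(classify(centroids, (7.5, 8.0)))  # loyal
```

The classes ("casual", "loyal") are fixed before training. That is the difference from clustering, where the groups are discovered from the data.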
3. Which 'V' of Big Data refers to the diverse types and forms of data?
A) Volume
B) Variety
C) Velocity
D) Value
4. What is the purpose of 'clustering' in data mining?
A) To categorize data based on predefined classes
B) To identify trends in data
C) To group similar data items
D) To eliminate outliers
5. Which of the following frameworks is commonly used in Big Data for distributed storage and processing?
A) Apache Hadoop
B) SQL Server
C) MySQL
D) Oracle
6. Which 'V' in Big Data represents the speed at which data is generated and must be processed?
A) Volume
B) Variety
C) Velocity
D) Veracity
7. What is 'data wrangling' in data mining?
A) Analyzing the data set
B) Cleaning and transforming raw data
C) Collecting data from various sources
D) Visualizing data with charts
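The following minimal Python sketch illustrates typical data wrangling steps: trimming whitespace, normalizing capitalization, converting types, dropping unusable rows, and removing duplicates. The function name `wrangle`, the field names, and the specific rules are hypothetical. Real pipelines choose cleaning rules to fit their own data.

```python
def wrangle(raw_rows):
    """Clean and transform raw records into typed, de-duplicated dicts."""
    cleaned, seen = [], set()
    for row in raw_rows:
        # Normalize whitespace and capitalization of the name field.
        name = (row.get("name") or "").strip().title()
        # Convert age to int; skip rows where it is missing or non-numeric.
        try:
            age = int(str(row.get("age", "")).strip())
        except ValueError:
            continue
        if not name:
            continue  # drop rows with no usable name
        key = (name, age)
        if key in seen:
            continue  # drop duplicates that only differed before normalization
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

# Hypothetical raw input with inconsistent formatting and bad values
raw = [
    {"name": "  alice ", "age": "34"},
    {"name": "ALICE", "age": " 34 "},
    {"name": "bob", "age": "n/a"},
    {"name": "", "age": "29"},
    {"name": "carol", "age": 41},
]
print(wrangle(raw))  # [{'name': 'Alice', 'age': 34}, {'name': 'Carol', 'age': 41}]
```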
8. Which data mining technique is commonly used for classification purposes?
A) Decision tree
B) K-means clustering
C) Data wrangling
D) Association rule mining
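This is a minimal sketch of how a decision tree learns a classification rule. It uses a CART-style split criterion (Gini impurity) and a small depth limit. The functions, the `max_depth=2` default, and the weather data are hypothetical illustrations, not a production implementation. At each node, the tree picks the feature/threshold test that leaves the purest child groups.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Find the (feature, threshold) test that minimizes weighted Gini impurity."""
    best = None  # (impurity, feature_index, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def majority(labels):
    """Most common label, used as the prediction at a leaf."""
    return max(set(labels), key=labels.count)

def build_tree(rows, labels, depth=0, max_depth=2):
    """Recursively grow a binary tree; leaves are labels, nodes are tuples."""
    split = best_split(rows, labels) if depth < max_depth and gini(labels) > 0 else None
    if split is None:
        return majority(labels)
    _, f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical data: (temperature_c, humidity_pct) -> activity
rows = [(30, 85), (27, 90), (22, 70), (20, 65), (25, 60), (18, 95)]
labels = ["stay", "stay", "play", "play", "play", "stay"]
tree = build_tree(rows, labels)
print(tree)                     # (1, 70, 'play', 'stay'): split on humidity <= 70
print(predict(tree, (24, 88)))  # stay
```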
9. Which Big Data technology is widely used for distributed data processing and is known for its speed?
A) Apache Hadoop
B) PostgreSQL
C) Apache Spark
D) Tableau
10. What is 'predictive analytics' in data mining?
A) Analyzing data after it is collected
B) Sorting data into categories
C) Cleaning and transforming data
D) Using data to predict future outcomes
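As a minimal illustration of predictive analytics, this sketch fits a straight line to past observations with ordinary least squares and extrapolates one step ahead. Simple linear regression is only one of many predictive models. The function `fit_line` and the monthly sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical history: month index -> units sold
months = [1, 2, 3, 4, 5]
sales = [102, 108, 121, 129, 140]
slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted units for month 6
print(f"slope={slope:.1f}, forecast for month 6={forecast:.1f}")
```

The fitted slope (about 9.7 units per month) summarizes the past trend. The forecast assumes that trend continues, which is the core assumption any such prediction should be checked against.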
11. Which technique in data mining is used to identify data points that significantly deviate from others in a data set?
A) Clustering
B) Outlier detection
C) Classification
D) Association
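One simple outlier detection approach is the z-score rule. A value is flagged when it lies more than a chosen number of standard deviations from the mean. This minimal sketch uses a threshold of 2.0. That threshold is an assumption (3.0 is also common for larger samples), and the function name and sensor readings are hypothetical.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold * std dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one suspicious spike
readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(zscore_outliers(readings))  # [95]
```

Because the outlier itself inflates the mean and standard deviation, the z-score rule can miss outliers in small samples. Robust alternatives based on medians or quartiles avoid this.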
12. Which programming model is commonly associated with processing large data sets in Big Data?
A) Decision Tree
B) K-means
C) MapReduce
D) SQL
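The MapReduce model splits work into a map step, a shuffle (group-by-key) step, and a reduce step. This minimal sketch simulates the classic word-count example in a single Python process. It does not use Hadoop's actual API, and the function names are hypothetical. In a real cluster, the map and reduce tasks run in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Two hypothetical input splits, as if stored on different nodes
splits = ["big data needs big tools", "data tools scale"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'scale': 1}
```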
13. What is 'association rule mining' used for in data mining?
A) Finding relationships between variables
B) Reducing data dimensionality
C) Data preprocessing
D) Data collection
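Association rules are usually scored by support (how often the items occur together) and confidence (how often the right-hand item appears when the left-hand item does). This minimal sketch enumerates single-item rules over item pairs by brute force. It is not the Apriori algorithm, and the thresholds, function names, and shopping baskets are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs on item pairs."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
for lhs, rhs, s, c in rules(baskets):
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
# butter -> bread  support=0.50 confidence=1.00
```

Note that the rule is directional. "bread -> butter" has the same support but only 0.67 confidence, so it does not pass the 0.7 threshold.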
14. Which Big Data technology is primarily used as a data warehouse for managing and querying large datasets?
A) HBase
B) Apache Hive
C) Apache Storm
D) NoSQL
15. What is the primary benefit of using Big Data analytics for organizations?
A) Lowering data storage costs
B) Eliminating data silos
C) Providing high-speed internet access
D) Gaining insights to enhance decision-making
16. Which algorithm is widely used in clustering to group data points based on feature similarity?
A) Decision Tree
B) Naïve Bayes
C) K-means
D) Linear Regression
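K-means alternates two steps. First, each point is assigned to its nearest centroid. Then, each centroid is moved to the mean of its assigned points, and the process repeats until the centroids stop changing. In this minimal sketch, the initial centroids are passed in explicitly for reproducibility. Practical implementations typically use random or k-means++ initialization. The function name and points are hypothetical.

```python
from math import dist

def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: repeat assign-to-nearest and recompute-means steps."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, [(1, 1), (9, 8)])
print(centroids[0])  # (1.5, 1.5)
print(clusters)
```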
17. Which 'V' in Big Data refers to the accuracy, trustworthiness, and reliability of data?
A) Volume
B) Veracity
C) Velocity
D) Variety
18. Which data mining technique is used to identify patterns in data that do not match the expected behavior?
A) Anomaly detection
B) Association rule mining
C) Clustering
D) Classification
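One common way to flag data that does not match expected behavior is the interquartile range (IQR) rule, sometimes called Tukey's fences. Values more than `k` times the IQR below the first quartile or above the third quartile are treated as anomalies. This minimal sketch uses Python's `statistics.quantiles` with its default method. The `k=1.5` multiplier is a conventional choice, and the latency data is hypothetical.

```python
from statistics import quantiles

def iqr_anomalies(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical request latencies in milliseconds
latencies_ms = [120, 118, 125, 122, 119, 121, 480, 117, 123, 15]
print(iqr_anomalies(latencies_ms))  # [480, 15]
```

Quartiles are barely affected by a few extreme values, so this rule tends to stay stable even when the anomalies themselves are large.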
19. Which type of database is preferred in Big Data environments for handling unstructured data?
A) Relational Database
B) SQL Database
C) Data Warehouse
D) NoSQL Database
20. In the Hadoop ecosystem, which component is responsible for the storage of large datasets across distributed servers?
A) MapReduce
B) YARN
C) HDFS
D) Pig
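HDFS stores a file as fixed-size blocks spread across DataNodes, and each block is replicated for fault tolerance. This minimal sketch estimates how a file is laid out. It assumes the common defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). Actual clusters may configure different values, and the function name is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed default block size (dfs.blocksize)
REPLICATION = 3      # assumed default replication factor (dfs.replication)

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, size of last block in MB, raw storage in MB)."""
    if file_size_mb <= 0:
        return 0, 0, 0
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The final block holds only the remaining bytes; it is not padded to full size.
    last_block_mb = file_size_mb - (blocks - 1) * block_size_mb
    # Every block is stored `replication` times on different DataNodes.
    raw_storage_mb = file_size_mb * replication
    return blocks, last_block_mb, raw_storage_mb

print(hdfs_footprint(1000))  # (8, 104, 3000)
```

For example, a 1000 MB file needs 8 blocks: seven full 128 MB blocks plus one 104 MB block. With three replicas, it consumes about 3000 MB of raw cluster storage.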