1. What is the primary characteristic of a classification algorithm in data mining?
A) It groups similar data points together
B) It predicts discrete categories for data
C) It creates continuous outputs
D) It is used for data cleaning
2. Which of the following is a common data visualization technique used in data mining?
A) Predictive modeling
B) Data cleaning
C) Scatter plot
D) Clustering
3. Which technique is used in data mining to reduce the number of features in a dataset?
A) Regression
B) Classification
C) Clustering
D) Dimensionality reduction
4. What is the purpose of data cleaning in data mining?
A) To remove errors and inconsistencies in the data
B) To analyze the patterns in the data
C) To make the data fit into the model
D) To normalize the data
5. Which technology is commonly used for real-time big data processing?
A) Hadoop MapReduce
B) Apache Hive
C) Apache Spark
D) Apache Pig
6. What is the main use of predictive analytics in Big Data?
A) To visualize patterns
B) To forecast future trends and behaviors
C) To clean the data
D) To reduce dimensionality
7. Which data mining technique is used for classification based on attributes or features?
A) Clustering
B) Association rules
C) Regression
D) Decision trees
8. What does the 'velocity' characteristic of Big Data refer to?
A) The volume of data
B) The variety of data
C) The speed at which data is generated and processed
D) The quality of data
9. Which type of learning is applied when the data does not have predefined labels or outcomes?
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Semi-supervised learning
10. Which data mining technique is used to find relationships between variables?
A) Association rule mining
B) Regression analysis
C) Classification
D) Clustering
11. Which algorithm is commonly used for unsupervised clustering in data mining?
A) Linear Regression
B) Decision Trees
C) K-means clustering
D) Naive Bayes
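As a rough illustration of how K-means groups unlabeled data, here is a minimal one-dimensional sketch. The function name `kmeans_1d`, the sample points, and the starting centroids are all made up for this example; it is not a reference implementation.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Toy K-means on 1-D data: assign points, then recompute centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabeled data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are supplied anywhere; the groups emerge only from distances between points, which is what makes the method unsupervised.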
12. What does 'variety' refer to in the context of big data?
A) Different types of data
B) The volume of data
C) The speed of data generation
D) The quality of data
13. Which of the following Big Data technologies is designed for processing large datasets efficiently?
A) SQL databases
B) Relational databases
C) Microsoft Excel
D) Hadoop and Apache Spark
14. In data mining, what does a decision tree represent?
A) A sequence of numeric predictions
B) A flowchart representing decisions based on attributes
C) A matrix for classification problems
D) A method for clustering
15. Which of the following industries commonly applies big data analytics?
A) Only technology
B) Only finance
C) Healthcare, finance, retail, and transportation
D) None of the above
16. What does the term 'overfitting' refer to in data mining?
A) A model that fits the training data too closely and performs poorly on new data
B) A model that ignores training data
C) A model that performs equally well on both training and test data
D) A model that simplifies the data too much
17. Which type of database is commonly used for big data applications?
A) Relational database
B) Flat-file database
C) Hierarchical database
D) NoSQL database
18. What is the purpose of a confusion matrix in data mining?
A) To visualize data relationships
B) To measure model bias
C) To evaluate classification model performance
D) To perform dimensionality reduction
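To make the idea of a confusion matrix concrete, the sketch below tallies actual versus predicted labels for a small, invented spam-filter example. The helper `confusion_matrix` and the label lists are assumptions for illustration only.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical ground truth and model predictions.
actual    = ["spam", "spam", "ham", "ham", "ham",  "spam"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam"]

matrix = confusion_matrix(actual, predicted, ["spam", "ham"])
print(matrix)  # [[2, 1], [1, 2]]

# Correct predictions sit on the diagonal, so accuracy is their share.
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(actual)
```

Here the off-diagonal cells show one spam message missed and one legitimate message wrongly flagged, detail that a single accuracy figure would hide.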
19. What is the purpose of feature scaling in data mining?
A) To reduce dataset size
B) To normalize the data for better performance in machine learning
C) To identify outliers in the data
D) To increase the dataset size
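One common form of feature scaling is min-max normalization, sketched below under the assumption of two invented features (`incomes` and `ages`) on very different scales. The helper name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features measured in very different units.
incomes = [20000, 50000, 80000]
ages = [20, 35, 50]

scaled_incomes = min_max_scale(incomes)
scaled_ages = min_max_scale(ages)
print(scaled_incomes)  # [0.0, 0.5, 1.0]
print(scaled_ages)     # [0.0, 0.5, 1.0]
```

After scaling, both features span the same range, so a distance-based or gradient-based model is no longer dominated by the feature with the larger raw numbers.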
20. What is the primary goal of text mining?
A) To derive meaningful patterns from text data
B) To summarize numerical data
C) To eliminate stop words from text
D) To convert text into numerical data