Unit 6 - Practice Quiz

INT428 60 Questions

1 What is the primary function of a tool like Tableau?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Easy
A. To train complex neural networks from scratch
B. To create interactive data visualizations and dashboards
C. To store large amounts of unstructured data
D. To write and compile computer code

2 Which of the following is a classic example of structured data?

Working with structured and unstructured data Easy
A. Audio recordings from a call center
B. A folder of images from a security camera
C. A collection of customer review emails
D. A table of customer information in a SQL database

3 Which of these data types is considered unstructured?

Working with structured and unstructured data Easy
A. An Excel spreadsheet with employee IDs
B. A CSV file with sales figures
C. A database of student grades
D. Video files

4 What is the main purpose of a data pipeline?

Data pipelines and automation Easy
A. To exclusively visualize data
B. To move data from a source to a destination, often with transformations
C. To create AI models
D. To store data backups

5 What is a major advantage of using cloud services like AWS or Azure for training AI models?

AI Model Environments & Lifecycle Basics: Cloud services Easy
A. It does not require an internet connection
B. It guarantees the model will be 100% accurate
C. Access to powerful computing resources on-demand
D. It is always free to use

6 What does edge deployment for an AI model mean?

AI Model Environments & Lifecycle Basics: Edge deployment Easy
A. Running the model on a central cloud server
B. Running the model directly on a local device like a smartphone or sensor
C. Storing the model on a cutting-edge hard drive
D. Training the model on multiple computers simultaneously

7 What does the term MLOps primarily refer to?

Introduction to MLOps and lifecycle management Easy
A. A brand of computer hardware for AI
B. A programming language for statistics
C. A new type of machine learning algorithm
D. A set of practices for collaboration and communication between data scientists and IT professionals

8 In machine learning, what is overfitting?

Error identification Easy
A. When a model is too simple to capture the underlying data patterns
B. When the dataset is too small to use
C. When a model performs very well on training data but poorly on new data
D. When a model performs poorly on both training and new data

9 What is the primary goal of AI process automation?

AI process automation Easy
A. To create art and music using AI
B. To use AI to perform repetitive tasks previously done by humans
C. To replace all human jobs with robots
D. To analyze stock market trends exclusively

10 What is often considered the first step in troubleshooting a problem with an AI model?

Troubleshooting Easy
A. Adding more data to the training set
B. Immediately deleting the model and starting over
C. Identifying and understanding the specific problem or error
D. Changing the model's algorithm

11 When using a tool like ChatGPT's Advanced Data Analysis, what kind of input do you typically provide to start an analysis?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Easy
A. Complex Python code
B. A connection to a live-streaming database
C. A natural language prompt describing the task and a data file
D. A pre-trained neural network

12 What does automation in a data pipeline help to reduce?

Data pipelines and automation Easy
A. The amount of data being processed
B. The need for manual intervention and human error
C. The complexity of the data
D. The number of data sources

13 Which of these applications is a good candidate for edge deployment?

AI Model Environments & Lifecycle Basics: Edge deployment Easy
A. Analyzing a decade of a company's financial records
B. A massive climate change simulation model
C. A real-time object detection feature on a smartphone camera
D. Training a large language model like GPT-4

14 Which stage of the AI model lifecycle involves putting a trained model into a live environment to make predictions?

Introduction to MLOps and lifecycle management Easy
A. Model training
B. Feature engineering
C. Deployment
D. Data collection

15 The term for using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server, is called:

AI Model Environments & Lifecycle Basics: Cloud services Easy
A. Local Hosting
B. Cloud Computing
C. Personal Computing
D. Edge Computing

16 What is a syntax error in a computer program?

Error identification Easy
A. An error that occurs only when the program is out of memory
B. An error where the program runs but produces incorrect results
C. An error caused by a user providing invalid input
D. An error in the code that violates the rules of the programming language

17 In software, what does debugging refer to?

Troubleshooting Easy
A. The process of deploying the application to a server
B. The process of writing new features for an application
C. The process of designing the user interface
D. The process of finding and fixing errors or 'bugs' in code

18 What is the key characteristic of structured data?

Working with structured and unstructured data Easy
A. It is always stored in PDF files
B. It has a predefined format and a fixed schema
C. It has no internal structure
D. It can only be text

19 After an AI model is deployed, what is a critical MLOps practice to ensure it continues to perform well?

Introduction to MLOps and lifecycle management Easy
A. Deleting the training data
B. Hiding the model's predictions from users
C. Monitoring and maintenance
D. Never updating the model

20 A business wants to automatically categorize incoming customer support emails into 'Urgent', 'Billing Question', or 'General Inquiry'. This is an example of:

AI process automation Easy
A. Data visualization
B. Edge deployment
C. AI process automation
D. Hardware troubleshooting

21 A business analyst has a 500MB CSV file of sales data and wants to quickly explore potential correlations, generate summary statistics, and create a few initial plots without writing any code. Which tool would be most efficient for this initial exploratory data analysis task?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Medium
A. Tableau by connecting to the data source and manually dragging and dropping fields to create worksheets.
B. Importing the data into a SQL database and writing complex queries.
C. ChatGPT Advanced Data Analysis by uploading the file and using natural language prompts.
D. Writing a custom Python script using the Pandas and Matplotlib libraries.

22 An AI system is designed to analyze customer support tickets. Each ticket contains the customer's name (text), a priority level (low, medium, high), the date of submission (timestamp), and a free-text description of the problem. How should this data be categorized?

Working with structured and unstructured data Medium
A. Entirely structured data because it's all stored in a database.
B. Primarily time-series data due to the submission date.
C. Entirely unstructured data because it contains free-text.
D. A mix of structured (name, priority, date) and unstructured (problem description) data.

23 In an automated data pipeline for a machine learning model, what is the primary purpose of the 'Data Validation' stage that typically follows data ingestion?

Data pipelines and automation Medium
A. To train the machine learning model on the new data.
B. To store the raw data in a data lake or warehouse.
C. To convert raw data into features for the model (e.g., normalization).
D. To check if the incoming data meets certain quality and schema expectations before processing.
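
Note: a validation stage of the kind this question refers to can be sketched in a few lines of Python. The schema, the column names, and the non-negative-amount rule below are hypothetical and chosen only for illustration; real pipelines typically use a dedicated validation library rather than hand-written checks.

```python
# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"customer_id": int, "amount": float, "submitted_at": str}


def validate_row(row):
    """Return a list of problems found in one ingested row (empty if valid)."""
    problems = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            problems.append(f"wrong type for {column}: {type(row[column]).__name__}")
    # Example quality rule (an assumption): amounts must be non-negative.
    if isinstance(row.get("amount"), float) and row["amount"] < 0:
        problems.append("amount is negative")
    return problems


good = {"customer_id": 7, "amount": 19.99, "submitted_at": "2024-01-05"}
bad = {"customer_id": "7", "amount": -5.0}
print(validate_row(good))
print(validate_row(bad))
```

Rows that return a non-empty list would be rejected or quarantined before any downstream processing runs.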

24 A hospital is developing an AI tool to assist surgeons by providing real-time analysis of a video feed from a laparoscopic camera during an operation. The system must have minimal latency (delay) to be effective. What is the most appropriate deployment environment for this AI model?

AI Model Environments & Lifecycle Basics: Cloud services, Edge deployment Medium
A. Batch processing on a local server after the surgery is complete.
B. Cloud deployment on a high-performance server in a remote data center.
C. Hybrid deployment where data is sent to the cloud for processing and results are sent back.
D. Edge deployment on the surgical equipment itself.

25 A machine learning model is trained to predict customer churn. During evaluation, the model's accuracy on the training dataset is 98%, but its accuracy on a new, unseen test dataset is only 60%. This significant performance gap is a classic sign of:

Error identification, Troubleshooting Medium
A. Overfitting
B. Class imbalance
C. Data leakage
D. Underfitting

26 What is the primary role of a 'feature store' in an MLOps framework?

Introduction to MLOps and lifecycle management Medium
A. To log the performance metrics of models in production.
B. To store the final, trained machine learning models.
C. To provide a centralized repository for storing, retrieving, and managing curated features for model training and serving.
D. To orchestrate the entire data pipeline from ingestion to deployment.

27 A company wants to automate the process of categorizing incoming customer support emails into 'Billing', 'Technical Issue', or 'General Inquiry' before they are assigned to an agent. Which AI technology is best suited for this task?

AI process automation Medium
A. Anomaly detection for finding outliers.
B. Robotic Process Automation (RPA) for mimicking UI clicks.
C. Computer Vision for image recognition.
D. Natural Language Processing (NLP) for text classification.

28 When preparing unstructured text data, such as movie reviews, for a sentiment analysis model, a common preprocessing step is 'vectorization'. What does this process accomplish?

Working with structured and unstructured data Medium
A. It corrects all spelling and grammar mistakes in the text.
B. It stores the text in a highly compressed format to save space.
C. It converts the text into a numerical representation (vectors) that a machine learning model can understand.
D. It summarizes the entire text into a single sentence.
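
Note: one of the simplest forms of vectorization is a bag-of-words count. The sketch below is a minimal pure-Python illustration with made-up reviews; it assumes whitespace tokenization and is not how any particular library implements it.

```python
def build_vocabulary(texts):
    """Map each distinct lowercase word to a column index, in sorted order."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    vector = [0] * len(vocabulary)
    for word in text.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


reviews = ["great movie great cast", "boring movie"]  # illustrative data
vocab = build_vocabulary(reviews)
vectors = [vectorize(review, vocab) for review in reviews]
print(vocab)
print(vectors)
```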

29 A large e-commerce company trains its product recommendation model weekly on terabytes of new user interaction data. Why is a cloud environment better suited for this task than an on-premise server?

AI Model Environments & Lifecycle Basics: Cloud services, Edge deployment Medium
A. Cloud services offer elastic scalability, allowing the company to provision powerful computing resources (like many GPUs/TPUs) for the training period and then scale them down to save costs.
B. On-premise servers are incapable of handling terabytes of data.
C. Cloud services guarantee lower latency for model inference for all users globally.
D. Cloud environments are inherently more secure than any on-premise solution.

30 You are tasked with creating a highly interactive, public-facing dashboard that allows users to filter data by region, date range, and product category. The dashboard must be embeddable in a website and handle live data connections. Which tool is designed for this specific purpose?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Medium
A. A Jupyter Notebook with static plots
B. ChatGPT Advanced Data Analysis
C. Tableau
D. Microsoft Excel

31 Within the MLOps lifecycle, what is the primary purpose of 'model monitoring' after a model has been deployed?

Introduction to MLOps and lifecycle management Medium
A. To detect performance degradation, data drift, or concept drift in the production environment.
B. To A/B test different versions of the model's user interface.
C. To continuously retrain the model with new data every few seconds.
D. To keep a version-controlled history of the model's source code.

32 An AI model for predicting house prices is found to have high bias. What is the most likely symptom of this problem?

Error identification, Troubleshooting Medium
A. The model performs perfectly on the training data but fails miserably on the test data.
B. The model takes an excessively long time to train.
C. The model's predictions fluctuate wildly with small changes in the input data.
D. The model performs poorly on both the training data and the test data, consistently making large errors.

33 In the context of data pipelines, what is the concept of 'idempotency'?

Data pipelines and automation Medium
A. Running the pipeline multiple times with the same input will always produce the same output, without causing unintended side effects.
B. The pipeline runs on a predefined schedule, such as once every 24 hours.
C. The pipeline can process both structured and unstructured data simultaneously.
D. The pipeline automatically scales its resources based on the volume of data.

34 Which of the following scenarios is a better fit for traditional Robotic Process Automation (RPA) rather than a more complex AI-based automation solution?

AI process automation Medium
A. Reading a handwritten doctor's note and summarizing the key points.
B. Forecasting next quarter's sales based on historical data and market trends.
C. Copying data from a specific cell in an Excel sheet and pasting it into a fixed field in a web-based form.
D. Determining the overall sentiment (positive/negative) of a customer's email.

35 A data science team has two versions of a fraud detection model. They want to test which one performs better on live traffic without fully replacing the old model. They decide to route 10% of user requests to the new model and 90% to the old one. This deployment strategy is known as:

Introduction to MLOps and lifecycle management Medium
A. Shadow Deployment
B. Canary Deployment
C. Blue-Green Deployment
D. A/B Testing

36 An AI model is being built to extract information from scanned PDF invoices. What is a primary challenge that distinguishes this task from analyzing a simple text file?

Working with structured and unstructured data Medium
A. The model must understand the spatial layout and structure (e.g., tables, key-value pairs) of the document, not just the raw text.
B. The text in a PDF is always perfectly clean and requires no preprocessing.
C. PDF files cannot be read by programming languages.
D. PDF invoices contain only structured data, which is difficult to parse.

37 During the exploratory data analysis phase, you discover that the 'price' column in your dataset, which should be numerical, contains values like '$1,200.50' and '950 USD'. If you try to feed this data directly into a regression model, what type of error will most likely occur?

Error identification, Troubleshooting Medium
A. An overfitting error due to the high variance in price.
B. A logical error where the model produces negative price predictions.
C. A data leakage error from the currency symbols.
D. A data type error, as the model expects a numeric type but receives a string.
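
Note: the two price strings quoted in this question can be cleaned before modelling. The helper below is a minimal sketch; `parse_price` is a hypothetical name, and the code assumes non-negative prices that use `.` as the decimal separator and `,` as a thousands separator.

```python
import re


def parse_price(raw):
    """Strip currency symbols, codes, and thousands separators, then convert to float."""
    cleaned = re.sub(r"[^0-9.]", "", raw)
    if not cleaned:
        raise ValueError(f"no numeric content in {raw!r}")
    return float(cleaned)


raw_prices = ["$1,200.50", "950 USD"]
prices = [parse_price(value) for value in raw_prices]
print(prices)
```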

38 A key component of a robust, automated ML training pipeline is data and model versioning. Why is it crucial to version not just the code, but also the data used for training?

Data pipelines and automation Medium
A. To automatically encrypt the dataset for security.
B. To speed up the data loading process during training.
C. To reduce the storage space required for the dataset.
D. To ensure reproducibility, allowing you to recreate a specific model by using the exact same code and data it was trained on.

39 What is a major drawback of edge deployment compared to cloud deployment for AI models?

AI Model Environments & Lifecycle Basics: Cloud services, Edge deployment Medium
A. Higher network latency and dependence on internet connectivity.
B. Higher costs associated with paying for on-demand cloud computing resources.
C. Difficulty in scaling to serve millions of users simultaneously.
D. Limited computational power, memory, and energy on edge devices, making it difficult to run large, complex models.

40 A user provides Tableau with a dataset containing 'State', 'City', and 'Sales' data. Tableau automatically recognizes that 'State' and 'City' are geographical data types and suggests plotting them on a map. This feature is an example of:

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Medium
A. AI-powered predictive modeling.
B. Automated data type inference and semantic recognition.
C. Natural Language Processing (NLP) of the column headers.
D. A manually programmed rule for all columns named 'State' or 'City'.

41 An MLOps team is managing a credit risk model for a bank. They detect that the model's predictions are systematically drifting for a specific demographic group, indicating potential fairness issues. The model is retrained automatically every month on new data. What is the most robust MLOps strategy to address this specific type of concept drift?

Introduction to MLOps and lifecycle management Hard
A. Roll back to the previous model version and halt the automatic retraining pipeline until the data distribution stabilizes.
B. Increase the retraining frequency to weekly to adapt to the new data distribution more quickly.
C. Trigger an alert for manual review by the data science team whenever the model's accuracy drops below a predefined threshold.
D. Implement stratified retraining batches that ensure consistent demographic representation in every training cycle and add a fairness constraint to the model's loss function.

42 A company is deploying a real-time object detection model on a fleet of 10,000 battery-powered drones. The key constraints are inference latency (< 50ms) and power consumption. The original model is a large TensorFlow FP32 model. Which model optimization strategy represents the most sophisticated and effective approach for this specific edge scenario?

AI Model Environments & Lifecycle Basics: Edge deployment Hard
A. Implement Quantization-Aware Training (QAT) to retrain the model, simulating INT8 quantization during training, and then deploy the resulting model.
B. Convert the model to TensorFlow Lite (FP32) and deploy it, relying on the hardware's GPU for acceleration.
C. Use post-training dynamic range quantization to convert weights to INT8, as it requires no representative dataset and is simple to implement.
D. Prune the model by 50% to reduce its size and then use post-training static quantization with a representative dataset of images from the drones.
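
Note: the INT8 quantization mentioned in these options maps floating-point values to 8-bit integers through a scale factor. The sketch below shows generic symmetric per-tensor quantization in plain Python with made-up weights; it is an illustration of the arithmetic only, not TensorFlow's implementation, and it assumes at least one weight is non-zero.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


weights = [-1.27, 0.0, 0.64, 1.27]  # illustrative values
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, scale, max_error)
```

The rounding error per weight is bounded by half the scale, which is the accuracy cost that quantization-aware approaches try to compensate for during training.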

43 A data engineer is designing a data pipeline that processes financial transactions. The pipeline has a critical step that aggregates transactions and writes the summary to a database. If the pipeline fails after this step and is re-run, it must not create duplicate summaries or incorrect aggregates. Which property is essential for this specific step?

Data pipelines and automation Hard
A. Scalability
B. Observability
C. Idempotency
D. Latency
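
Note: the re-run requirement in this scenario can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module, and the table names and sample transactions are hypothetical. The summary write overwrites rows keyed on the day instead of appending, so running it twice leaves the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (day TEXT, amount REAL)")
conn.execute("CREATE TABLE daily_summary (day TEXT PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-01", 50.0), ("2024-01-02", 20.0)],
)


def write_summary(connection):
    """Recompute totals from source rows and overwrite any existing summary row."""
    connection.execute(
        "INSERT OR REPLACE INTO daily_summary (day, total) "
        "SELECT day, SUM(amount) FROM transactions GROUP BY day"
    )
    connection.commit()


write_summary(conn)
write_summary(conn)  # simulated re-run after a failure
rows = conn.execute("SELECT day, total FROM daily_summary ORDER BY day").fetchall()
print(rows)
```

A plain `INSERT` in place of `INSERT OR REPLACE` would either fail on the primary key or, without the key, duplicate every summary row on the second run.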

44 During the training of a Generative Adversarial Network (GAN), the generator's loss drops to near zero while the discriminator's loss remains high and erratic. The generated images are all very similar and lack diversity. This phenomenon is best described as:

Error identification, Troubleshooting Hard
A. Exploding Gradients
B. Overfitting
C. Mode Collapse
D. Vanishing Gradients

45 An analyst uses ChatGPT's Advanced Data Analysis to analyze a sales dataset. They ask it to "Identify the top 3 product categories by profit margin and visualize the result." The AI generates a bar chart showing 3 categories. However, when the analyst manually calculates the profit margin using the formula Profit / Sales, they find a different set of top 3 categories. What is the most likely cause of this discrepancy originating from the AI's process?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Hard
A. The AI's Python execution environment had a floating-point precision error that miscalculated the division for certain categories.
B. The dataset contained null or zero values in the 'Sales' column for some rows, and the AI's default data cleaning step dropped these rows, skewing the calculation.
C. The AI is non-deterministic and hallucinated the results without performing the actual calculation.
D. The AI misinterpreted "profit margin" and calculated "total profit" instead, as this is a more common and simpler metric.

46 A team is building a recommendation engine for an e-commerce site. They have structured data (user purchase history, product ratings) and unstructured data (text from product reviews). To create a hybrid model, they generate embeddings from the review text using a transformer model. What is the most significant challenge when combining these text-based embeddings with the structured user/item features?

Working with structured and unstructured data Hard
A. It is impossible to concatenate feature vectors of different data types (numerical and text-based) into a single input for a machine learning model.
B. Structured data cannot be normalized to the same scale as text embeddings, leading to training instability.
C. The high dimensionality of text embeddings (e.g., 768 dimensions for BERT) can dominate the lower-dimensional structured features, making the model insensitive to purchase history or ratings.
D. Transformer models for text embeddings are too slow for real-time recommendation systems and cannot be combined with structured data.

47 What is the primary motivation for using a dedicated Feature Store in a mature MLOps organization with multiple data science teams?

Introduction to MLOps and lifecycle management Hard
A. To prevent training-serving skew by ensuring the exact same feature engineering logic is used during both model training and real-time inference.
B. To provide a centralized location for data scientists to visualize and explore raw data before feature engineering.
C. To serve as a version control system for model weights and artifacts, similar to Git.
D. To automate the process of hyperparameter tuning for all models in the organization.

48 A startup is training a large language model on a custom dataset. They need to perform distributed training across multiple GPUs to reduce training time. They are considering different cloud services. Which of the following describes the most complex challenge they will face that is specific to distributed training in the cloud?

AI Model Environments & Lifecycle Basics: Cloud services Hard
A. Provisioning a single virtual machine with a sufficiently powerful GPU to handle the model's memory requirements.
B. Managing inter-node communication bandwidth and latency, which can become a bottleneck and diminish the returns of adding more nodes.
C. Installing the correct version of deep learning frameworks like PyTorch or TensorFlow on the cloud instances.
D. Setting up a secure network connection (VPC) to protect the training data from unauthorized access.

49 A company wants to automate its invoice processing. The process involves receiving invoices as PDFs via email, extracting fields like invoice number, date, and total amount, and then entering this data into an ERP system. The invoices come from hundreds of different vendors, each with a unique template. Why would a traditional OCR + template-based RPA solution be inferior to an AI-powered Intelligent Document Processing (IDP) solution for this task?

AI process automation Hard
A. A traditional RPA solution cannot interact with web-based ERP systems, whereas an IDP solution has native API connectors.
B. RPA bots are not capable of performing conditional logic (if/then statements), which is required to validate the extracted invoice data.
C. An IDP solution uses natural language understanding and computer vision to identify fields contextually, making it robust to variations in invoice templates without needing a new template for each vendor.
D. Standard OCR cannot read text from PDF documents, requiring an AI-based solution to digitize the text first.

50 In the context of a streaming data pipeline using a technology like Apache Kafka, what is the primary challenge associated with ensuring 'exactly-once' processing semantics?

Data pipelines and automation Hard
A. Ensuring that messages produced to Kafka are correctly serialized and deserialized by all consumers in the pipeline.
B. Encrypting the data in transit between Kafka brokers and consumers to meet security compliance requirements.
C. Coordinating distributed transactions between the streaming processor (e.g., Flink, Spark) and the output data sink (e.g., a database) to handle both processing failures and network failures without data duplication or loss.
D. Achieving high enough throughput to process messages as they arrive without creating a backlog in the Kafka topics.

51 A classification model exhibits high accuracy (98%) on a test set, but a confusion matrix reveals it performs very poorly on the minority class (e.g., high false negatives for a 'fraud' class). When plotted, the ROC curve shows an AUC of 0.95. Why is the high AUC score misleading in this scenario?

Error identification, Troubleshooting Hard
A. ROC AUC can look deceptively strong under class imbalance; it is calculated by integrating over all possible classification thresholds, and with a large number of easy-to-classify true negatives the false positive rate stays low, so the curve is pulled up and to the left, masking poor performance on the small positive class.
B. A high AUC score only indicates that the model's predictions are well-calibrated, not that it has good discriminative power between classes.
C. The test set was contaminated with data from the training set, artificially inflating all performance metrics including AUC.
D. The AUC calculation is mathematically incorrect when the number of positive samples is less than 10% of the total dataset.
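
Note: the gap between overall accuracy and minority-class performance in this scenario is easy to reproduce with arithmetic. The confusion-matrix counts below are hypothetical, picked so that accuracy comes out at the 98% quoted in the question.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of actual positives that the model catches."""
    return tp / (tp + fn)


# Hypothetical test set: 10,000 transactions, of which 200 are fraud.
tp, fn = 20, 180      # fraud cases caught vs. missed
tn, fp = 9780, 20     # legitimate cases passed vs. wrongly flagged

overall_accuracy = accuracy(tp, fp, tn, fn)
fraud_recall = recall(tp, fn)
print(overall_accuracy, fraud_recall)
```

With these counts the model is right on 98% of rows while catching only one fraud case in ten.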

52 You are creating a dashboard in Tableau to analyze customer churn. You want to display the churn rate for different customer segments, but also allow users to see how the churn rate would change if a hypothetical marketing intervention reduced churn by 15% for a user-selected segment. Which combination of Tableau features would be most effective for creating this interactive, what-if analysis?

Data analysis and visualization using AI tools (ChatGPT Advanced Data Analysis, Tableau) Hard
A. A Quick Filter for segment selection and an AI-powered Forecasting model to project the future churn rate.
B. A Level of Detail (LOD) expression to fix the churn rate at the segment level and a data blend from a separate spreadsheet containing the reduction percentages.
C. A Parameter for segment selection, a Parameter for the churn reduction percentage, and a Calculated Field that uses these parameters to compute the hypothetical churn rate.
D. A Story to create a sequence of dashboards, with one showing the actual churn and the next showing the manually calculated hypothetical churn.
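
Note: the what-if arithmetic behind this scenario is shown below in Python rather than in Tableau's own expression language. The segment names and churn rates are made up, and the sketch assumes the 15% figure is a relative reduction applied only to the user-selected segment.

```python
def hypothetical_churn(churn_by_segment, selected_segment, reduction_pct):
    """Apply a relative churn reduction to one segment and leave the others unchanged."""
    factor = 1 - reduction_pct / 100
    return {
        segment: rate * factor if segment == selected_segment else rate
        for segment, rate in churn_by_segment.items()
    }


actual = {"Enterprise": 0.10, "SMB": 0.20}  # illustrative churn rates
scenario = hypothetical_churn(actual, selected_segment="SMB", reduction_pct=15)
print(scenario)
```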

53 A key challenge in federated learning, where models are trained on decentralized edge devices (e.g., mobile phones) without data leaving the device, is the 'non-IID' (non-independent and identically distributed) nature of the data. What is the most severe consequence of this non-IID data distribution?

AI Model Environments & Lifecycle Basics: Edge deployment Hard
A. The communication cost of sending model updates from the edge devices to the central server becomes prohibitively expensive.
B. The edge devices may not have enough computational power to train the local model effectively.
C. It becomes impossible to ensure data privacy as the model updates inherently leak information about the local data.
D. The global model, aggregated from the local models, can diverge or converge to a poor-performing minimum because the weight updates from different devices pull the model in conflicting directions.

54 In a CI/CD/CT (Continuous Training) pipeline for MLOps, what is the most appropriate trigger for automatically initiating a full model retraining job?

Introduction to MLOps and lifecycle management Hard
A. A software engineer commits a change to the model's inference API code in the Git repository.
B. A statistical monitoring tool detects significant 'data drift', where the statistical properties of the live input data have diverged from the training data distribution.
C. The model's predictive accuracy on the live inference data drops by more than 20% from its initial baseline.
D. A fixed schedule, such as the first day of every month, to ensure the model is always up-to-date.

55 You need to process a dataset of one million 100-page PDF documents to extract specific clauses for a legal AI system. The goal is to perform this task efficiently. Which of the following approaches represents the most scalable and computationally efficient architecture?

Working with structured and unstructured data Hard
A. A serverless architecture where each PDF upload triggers a cloud function (e.g., AWS Lambda). The function performs OCR and clause extraction for that single document.
B. Manually loading the PDFs into a specialized document analysis desktop application and using its built-in tools to extract the clauses, saving the results to a CSV file.
C. A single, powerful server with multiple CPU cores that iterates through each PDF, using a multithreaded Python script to process documents in parallel.
D. A distributed processing pipeline using a framework like Apache Spark, where each worker node processes a subset of PDFs. Each worker uses an OCR library to extract text and a pre-trained transformer model (running on a GPU if available on the worker) for clause identification.

56 An organization is using Apache Airflow to orchestrate its daily ETL pipelines. A critical DAG (Directed Acyclic Graph) has a task that depends on a file arriving from an external partner in an S3 bucket. The file can arrive at any time between 2 AM and 5 AM. Which Airflow component is the most appropriate and efficient for handling this specific dependency?

Data pipelines and automation Hard
A. Write a Python function with a while True: loop and a time.sleep(60) call inside a PythonOperator to check for the file's existence.
B. Use a Sensor, specifically the S3KeySensor, which will periodically check for the existence of the file and only succeed when the file is found, allowing downstream tasks to run.
C. Use a TriggerDagRunOperator in a separate 'poller' DAG that runs every minute to check for the file and then trigger the main DAG.
D. Run the DAG on a fixed schedule at 5:05 AM and assume the file has arrived. If it hasn't, the task will fail and the on-call engineer will be paged to re-run it manually.

57 A financial services company is deploying a fraud detection model using a serverless inference endpoint on a major cloud provider (e.g., AWS SageMaker Serverless Inference, Google Vertex AI). What is the primary trade-off they must consider when choosing a serverless endpoint over a traditional, provisioned endpoint (a dedicated, always-on VM)?

AI Model Environments & Lifecycle Basics: Cloud services Hard
A. They trade higher security and network isolation for the convenience of a publicly accessible API endpoint.
B. They trade the ability to use custom Docker containers for a simplified, no-code deployment process.
C. They trade the support for deep learning models for exclusive support for traditional machine learning models like logistic regression.
D. They trade lower cost for infrequent traffic and automatic scaling for potentially higher 'cold start' latency on the first request after a period of inactivity.

58 A hospital wants to use AI to automate the preliminary reading of chest X-rays to flag urgent cases for radiologists. The AI model has a 95% accuracy in identifying a specific condition. However, the legal and ethical implications of a misdiagnosis are severe. Which AI process automation design pattern is most appropriate for this high-stakes scenario?

AI process automation Hard
A. An 'A/B testing' pattern, where 50% of X-rays are processed by the AI and 50% by radiologists to compare performance over time.
B. A 'Robotic Process Automation (RPA)' pattern, where a bot simply moves the X-ray files from one folder to another based on the AI model's output score.
C. A fully automated 'straight-through-processing' pattern, where the AI's positive predictions are immediately sent to the emergency department to save time.
D. A 'Human-in-the-loop' pattern, where the AI flags potential cases, but every single prediction (positive or negative) is reviewed and confirmed by a certified radiologist before any action is taken.

59 When deploying a new version of a customer-facing recommendation model, an MLOps team decides to use a 'Canary Release' strategy instead of a simple A/B test. What is the primary advantage of a Canary Release in this context?

Introduction to MLOps and lifecycle management Hard
A. It allows for a gradual rollout of the new model to a small subset of users (e.g., 1%), minimizing the potential negative impact (the 'blast radius') if the new model has unforeseen issues, while monitoring its performance closely before a full rollout.
B. It allows for a statistically rigorous comparison of the new model against the old model by randomly assigning users to two equally sized groups, ensuring the results are not biased.
C. It automatically rolls back the deployment if the new model's inference latency exceeds a predefined threshold, prioritizing system stability over model accuracy.
D. It ensures that the new model is only served to internal employees and beta testers before being released to the general public.
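
Note: the traffic split behind a canary release can be sketched with deterministic hash-based routing, so that each user consistently sees the same model version. The function name, the user ID format, and the 10% share below are hypothetical; real deployments usually configure the split in the serving platform or load balancer instead.

```python
import hashlib


def route(user_id, canary_percent):
    """Assign a user to the 'new' or 'old' model based on a stable hash bucket (0-99)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "new" if bucket < canary_percent else "old"


users = [f"user-{i}" for i in range(10000)]  # illustrative user IDs
share = sum(route(user, 10) == "new" for user in users) / len(users)
print(f"share routed to the new model: {share:.3f}")
```

Raising `canary_percent` step by step widens the rollout, and setting it back to 0 sends all traffic to the old model again.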

60 A neural network model for time-series forecasting is consistently underperforming, with predictions that seem to lag behind the actual data by one time step. The model architecture is a standard LSTM network. What is the most probable cause of this specific 'lagging' behavior?

Error identification, Troubleshooting Hard
A. The model is suffering from vanishing gradients due to the long sequences, preventing it from learning long-term dependencies.
B. The time-series data was not properly made stationary before being fed into the model, and the model is simply learning to predict the last observed value ($y_t$) as the forecast for the next step ($y_{t+1}$).
C. The learning rate is too low, causing the model to converge very slowly and fail to capture the dynamic patterns in the data.
D. Data leakage, where the model was inadvertently trained to predict the next time step's value by using that same value as a feature (e.g., predicting $y_{t+1}$ using a feature set that includes $y_{t+1}$).
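
Note: a common diagnostic for one-step lag is to compare the model's error against a naive persistence forecast, which simply repeats the last observed value. The series below is made up for illustration; in practice the model's error on the same targets would be computed alongside it.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(series):
    """Forecast for step t+1 is the observed value at step t."""
    return series[:-1]


series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]  # illustrative observations
targets = series[1:]
naive = persistence_forecast(series)
naive_mae = mean_absolute_error(targets, naive)
print(naive_mae)
```

If a trained model's error on the same targets is no better than `naive_mae`, it is effectively behaving like the persistence forecast.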