In the ever-evolving landscape of cloud-native applications, the need for scalable and resilient architectures has never been more critical. Traditional monolithic systems often fall short when it comes to handling large volumes of data and complex workflows. Enter Dapr: Distributed Application Runtime, a powerful framework that helps you build a highly scalable and resilient microservices architecture. In this blog post, we'll explore how you can leverage Dapr to distribute your search ingest pipeline, enhancing scalability, resiliency, and maintainability.
Traditional monolithic architectures often struggle with the demands of processing large volumes of documents for search indexing. Challenges include scalability, resiliency, and efficient orchestration of the various services involved in extracting, processing, and enriching document content. All these services have their own rate limits, which need to be managed carefully with proper back-off and retry strategies optimized for that specific service. Enter Dapr, with its microservices-oriented approach, offering a compelling solution to these challenges.
Aspect | Monolithic Approach | Dapr-based Solution |
---|---|---|
Scalability | ❌ Limited; scaling the entire application can be inefficient | ✅ High; individual components can be scaled independently |
Resiliency | ❌ Retry policies and error handling can be complex | ✅ Improved; easier to manage with Dapr's built-in mechanisms |
Kubernetes Integration | ❌ May require additional configuration | ✅ Kubernetes-native; designed to work seamlessly with k8s |
Monitoring | ❌ Custom setup required for metrics and tracing | ✅ Built-in support for Application Insights and other monitoring tools |
Componentization | ❌ All logic is within a single application | ✅ Logic is distributed across multiple Dapr applications |
Complexity | ✅ Single application to manage | ❌ Multiple applications increase management complexity |
Asynchronous Processing | ❌ Can be challenging to implement and track | ✅ Native support for async operations, but tracking can be complex |
Overhead | ✅ Potentially lower as there's a single runtime | ❌ Additional overhead due to Dapr sidecars and messaging/statestore components |
Dapr shows significant improvements, adding just a bit of complexity and overhead
Dapr facilitates the development of distributed applications by providing building blocks that abstract away the complexities of direct communication between microservices, state management, and resource binding. By leveraging Dapr in a document processing pipeline, each stage of the process becomes a separate Dapr application, capable of independently scaling and recovering from failures.
Consider a workflow designed to ingest, process, and index documents for search. The process involves several stages, from extracting text from documents to enriching the content with metadata, generating embeddings, and ultimately indexing the documents for search. With Dapr, each of these stages can be implemented as a separate microservice, communicating through Dapr's pub/sub messaging and utilizing shared state management and resource bindings.
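As an illustration of the pub/sub building block, here is a minimal sketch (not taken from the repository) of how one stage could publish a section to the next stage using the Dapr Python SDK; the pub/sub component name and topic name are placeholders:

import json
from dapr.clients import DaprClient

def publish_section(section: dict):
    # Publish one document section; the Dapr sidecar routes it to the
    # configured pub/sub component (e.g. Azure Service Bus)
    with DaprClient() as client:
        client.publish_event(
            pubsub_name="pubsub",              # name of the Dapr pub/sub component
            topic_name="generate-embeddings",  # placeholder topic for the next stage
            data=json.dumps(section),
            data_content_type="application/json",
        )

publish_section({"id": "doc-1-section-0", "content": "..."})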
To summarize the flow, the workflow typically includes the following stages:
- Batcher is triggered by an HTTP request, retrieving every single document in the given blob path and adding these to a queue.
- ProcessDocument is subscribed to this queue, pulling the raw document from blob and extracting its content using Form Recognizer/Document Intelligence, splitting it up into multiple manageable sections. Each section is added to 3 enrichment queues (GenerateEmbeddings, GenerateKeyphrases, GenerateSummary).
- Each enrichment service processes its queue and signals EnrichmentComplete when it is done with a section.
- Once all enrichments for a section are in, DocumentCompleted is triggered to notify the section is finished.
- Once all sections of a document are finished, BatchCompleted is triggered to notify the Document is fully processed.
- Once BatchCompleted has been triggered for all Documents that needed processing in the pipeline, the Azure Search Indexer is started, pulling all sections from Blob to populate the search index.

This GitHub repository can serve as inspiration for implementing this flow, including scripts to deploy the infrastructure, local development and deployment scripts for deploying these services to Kubernetes.
Adopting Dapr for your search ingest pipeline can be a game-changer. It offers significant advantages in scalability, resiliency, and maintainability, making it a strategic investment for future-proofing your applications. While it introduces some complexity and overhead, the benefits of a microservices-oriented architecture, particularly in a Kubernetes environment, far outweigh these trade-offs.
Splitting each service into its own Dapr container provides several key advantages: independent scaling per stage, isolated retry and rate-limit handling for each external service, and per-component monitoring of the pipeline.
For more details and to access the deployment scripts, visit the search-ingest-dapr GitHub repository.
I often get the question: how do you manage your blog? Do you pay anything for hosting your blog? Do you have some sort of markdown interpreter? How about Search Engine Optimizations, can your blog posts be found online using Google? The short respective answers: completely managed through a self-hosted Ghost instance, completely free, with a rich editor and out-of-the-box search engine optimization.
So, how does it work? Let's start with a list of requirements:

- Docker, to run Ghost and MySQL locally
- ghost-static-page-generator for creating static page content
- A GitHub repository with GitHub Pages enabled, to host the static content

After you've set up Docker, run Ghost locally with the following docker-compose.yml, which also sets up a local MySQL container for storing your blog posts:
version: '3'
services:
ghost:
container_name: ghost
image: ghost:latest
restart: always
ports:
- 2368:2368
volumes:
- ./content:/var/lib/ghost/content
environment:
url: http://localhost:2368
database__client: mysql
database__connection__host: db
database__connection__user: root
database__connection__password: superS3cretp@ssw0rd
database__connection__database: ghost
db:
container_name: ghost_mysql
image: mysql:8.4
command: mysqld --mysql-native-password=ON --authentication_policy=mysql_native_password
restart: always
environment:
MYSQL_ROOT_PASSWORD: superS3cretp@ssw0rd
volumes:
- ./data:/var/lib/mysql
Note: remember to change your database password to a strong unique password
With this file in-place, run the container with:
docker-compose up -d
Once the initial run is complete, your Ghost instance will be available at http://localhost:2368.
With Ghost running locally, you can convert your blog to static HTML with gssg. This creates static HTML and resources for all the endpoints of your blog, which you can then host on a website. Static sites benefit from faster load times and enhanced security since they don't rely on server-side scripting, making your blog not only quick to access but also less vulnerable to web attacks.
To install gssg, make sure Node.js is installed on your local machine, then install gssg globally using:
npm install -g gssg
With gssg installed, you can generate the static site with:
gssg --domain http://localhost:2368 --url https://blog.example.com --dest ./static
With the following parameters:

- --domain: the domain and port hosting your local Ghost instance
- --url: the external URL where your blog will be hosted (e.g., https://blog.example.com)
- --dest: the local path of your static website content

To make your blog accessible to the world, push your static HTML content to a GitHub repository configured with GitHub Pages. GitHub Pages offers a robust, free hosting solution that integrates seamlessly with your GitHub workflow. For most personal and project blogs, it provides a perfect balance between ease of use and functionality.
Follow the instructions below to set up your own GitHub repository with GitHub Pages:

- Create a new repository named your-github-username.github.io. For example, mine is https://github.com/bart-jansen/bart-jansen.github.io
- Push the generated static content from ./static to this repository.

To personalize your blog's URL, you can configure a custom domain by adding a CNAME record. A CNAME record (Canonical Name) redirects visitors from your default GitHub domain to a domain name of your choosing. For instance, instead of your-github-username.github.io, visitors could reach your blog via www.yourblogname.com.
Follow the instructions in the link above. For my blog I configured the custom domain bart.je, which will push a CNAME file to the root of your repository containing bart.je.
The Azure Search Index can be populated in two different ways. You can either directly push data into the index via the REST API/SDK (left image), or leverage the built-in Azure Search Indexer, which pulls data from a chosen DataSource and adds it to a specified index (right image).
Choosing between the pull and push models depends on the requirements you have. Do you care most about how fast your data gets indexed, or is security your top priority? Maybe you need the flexibility to work with many different data sources. Each model has its own set of pros and cons. To help you decide, we've put together a comparison table below that breaks down the differences between the pull and push models across several important areas, including their ability to enrich data with AI, security features, indexing performance, and more.
Aspect | Pull Model | Push Model | Notes |
---|---|---|---|
Enrichment | ✅ Supports AI enrichment through built-in skillsets. | ❌ Does not support AI enrichment capabilities via skillsets. | Manual enrichment before indexing also possible |
Security | ✅ Makes use of Azure AI Search's built-in security. | ❌ All security measures must be implemented by the user. | |
Performance | ❌ Inferior indexing performance compared to push model (currently ~3 times slower)* | ✅ More performant indexing, allows async upload of up to 1,000 documents per batch. | Indexing performance is key for the overall ingestion duration. |
Monitoring | ✅ Azure AI Search provides built-in monitoring capabilities. | ❌ User needs to monitor the index status manually. | |
Flexibility | ❌ Limited to supported data sources; source needs to be compatible with Azure AI Search Indexer. | ✅ More flexible; can push data from any source. | Azure Blob, Data Lake, SQL, Table Storage and CosmosDB are supported at time of writing. |
Reindexing | ✅ Easily able to reindex with a simple trigger, if data stays in place in DataSource. | ❌ Need to cache documents and manually push each document again | Re-indexing a lot easier with pull model. |
Tooling | ✅ Indexer functionality is exposed in the Azure portal. | ❌ No tool support for pushing data via the portal. |
The comparison between the pull and push models shows that the pull model excels in areas like AI enrichment, built-in security, monitoring, and ease of reindexing, thanks to its integration with Azure AI Search's capabilities. However, it falls short in indexing performance and flexibility, being slower and limited to certain data sources.
Feeding data into your search index, whether through push or pull, significantly impacts the overall time taken for ingestion. As seen from the comparison table above, using the push approach can make the process up to three times faster. These results come from the standard configuration, when using a single Azure Search Indexer and a single thread pushing batches of documents to the search index.
Using a dataset with 10,000 documents, we evaluated the performance of ingesting data with the pull and push models under various configurations.
We'll have a look at the impact of each individual configuration first, and then combine configurations for further optimizations. To ensure consistency in testing conditions and to mitigate the effects of throughput and latency, all data ingestions were conducted within the same Azure Region, originating from an Azure-hosted machine. The results presented are the average of five test runs.
Configuration | Indexer/Pull (mm:ss) | Push (mm:ss) |
---|---|---|
S1 1 partition | 3:10 | 1:05 |
S2 1 partition | 2:51 | 0:57 |
S1 4 partitions | 3:22 | 1:10 |
As shown in the results above, the push method is almost three times faster than using an indexer. Interestingly, the introduction of additional partitions and upgrading to an S2 search instance had a negligible effect on performance in these tests. These results suggest that both the addition of more partitions and upgrading to an S2 search instance primarily enhance capacity and query performance, rather than directly improving the rate of data ingestion.
To speed things up, we also investigated parallel indexing. For the pull model we can use multiple indexers that write to the same index, and with the push model we can asynchronously push multiple batches simultaneously. Here, we also vary the degree of parallelism to see how that affects the results.
Configuration | Indexer/Pull (mm:ss) | Push (mm:ss) |
---|---|---|
S1 1 parallel | 3:10 | 1:05 |
S1 10 parallel | 0:20 | 0:59 |
S1 20 parallel | 0:11 | 0:49 |
S1 40 parallel | 0:06 | 0:34 |
The performance of the indexer scaled almost linearly with parallelization, showing significant improvements. Specifically, using 10 indexers was approximately 9.5 times faster, 20 indexers around 17.3 times faster, and 40 indexers about 31.2 times faster compared to a single indexer. Although asynchronously pushing batches also enhanced performance, the improvement was roughly 2 times better when using 40 parallel threads for pushing. Increasing parallel threads further had a negative impact and decreased performance for both push and pull.
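To give an idea of what pushing batches in parallel could look like, here is a minimal sketch using a thread pool; the service name, index name, key and batch preparation are placeholders, and in practice you would add retry handling for throttled requests:

from concurrent.futures import ThreadPoolExecutor
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<service-name>.search.windows.net",
    index_name="<index-name>",
    credential=AzureKeyCredential("<api-key>"),
)

def push_batch(batch):
    # Each batch holds up to 1,000 documents
    return search_client.upload_documents(documents=batch)

def push_all(batches, parallelism=40):
    # Push all prepared batches concurrently from a single process
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(push_batch, batches))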
Choosing between the push and pull models for populating the Azure Search Index should be based on project-specific requirements, including indexing speed, security, source flexibility, AI enrichment potential and the need for re-indexing over time. While the pull model integrates closely with Azure AI Search's advanced features, offering built-in enrichment and easy re-indexing, it lags behind the push model in terms of speed and flexibility.
Using standard configurations, the push model outperforms pull and is three times faster. However, you can set up multiple indexers to linearly improve ingestion speed for each indexer added, minding the maximum limit of indexers. This does require a bit of orchestration, where you need to split your data over multiple data sources (or folders) and track the progress of multiple indexers.
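On the pull side, that orchestration could look roughly like the sketch below, assuming the data has already been split over several pre-created data sources; all names are placeholders:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import SearchIndexer

indexer_client = SearchIndexerClient(
    endpoint="https://<service-name>.search.windows.net",
    credential=AzureKeyCredential("<api-key>"),
)

# One indexer per pre-split data source, all writing into the same index
for i in range(10):
    indexer = SearchIndexer(
        name=f"docs-indexer-{i}",
        data_source_name=f"docs-datasource-{i}",
        target_index_name="<index-name>",
    )
    indexer_client.create_or_update_indexer(indexer)
    indexer_client.run_indexer(indexer.name)

# Track the progress of each indexer
for i in range(10):
    status = indexer_client.get_indexer_status(f"docs-indexer-{i}")
    print(f"docs-indexer-{i}:", status.last_result.status if status.last_result else "pending")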
This blog post shows various code snippets on how to achieve these different steps using Python. The first bit is to set up and populate the Search Index, and the last sections show how to query the search index and enrich the data using GPT.
Please keep in mind that these snippets are a great starting point, but have been kept as small as possible to fit the format of this blog. Comprehensive processing such as chunking, text splitting, enrichment, embedding, semantic configuration and semantic reranking has consciously been kept out of the scope of this blog post, but is required for creating an effective search index.
The first step is to create an Azure AI Search Index. This can be done through the Azure Portal, the REST API or with the Python SDK:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    SemanticSettings, SemanticConfiguration, PrioritizedFields, SemanticField,
    VectorSearch, VectorSearchAlgorithmConfiguration, HnswParameters,
)

SEARCH_SERVICE = "your-azure-search-resource"
SEARCH_INDEX = "your-search-index"
SEARCH_KEY = "your-secret-azure-search-key"
SEARCH_CREDS = AzureKeyCredential(SEARCH_KEY)
SEARCH_CLIENT = SearchIndexerClient(endpoint=f"https://{SEARCH_SERVICE}.search.windows.net/", credential=SEARCH_CREDS)
def create_index():
    client = SearchIndexClient(endpoint=f"https://{SEARCH_SERVICE}.search.windows.net/", credential=SEARCH_CREDS)
# Define the index
index_definition = SearchIndex(
name=SEARCH_INDEX,
fields=[
SearchField(name="id", type=SearchFieldDataType.String, key=True),
SearchField(name="content", type=SearchFieldDataType.String, filterable=True, sortable=True),
SearchField(name="sourcefile", type=SearchFieldDataType.String, filterable=True, facetable=True),
SearchField(
name="embedding",
type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
hidden=False,
searchable=True,
filterable=False,
sortable=False,
facetable=False,
vector_search_dimensions=1536,
vector_search_configuration="default",
)
],
semantic_settings=SemanticSettings(
configurations=[
SemanticConfiguration(
name='default',
prioritized_fields=PrioritizedFields(
title_field=None, prioritized_content_fields=[SemanticField(field_name='content')]
)
)
]
),
vector_search=VectorSearch(
algorithm_configurations=[
VectorSearchAlgorithmConfiguration(
name="default",
kind="hnsw",
hnsw_parameters=HnswParameters(metric="cosine")
)
]
)
)
# Create the index
client.create_index(index=index_definition)
This sets up an Azure AI Search Index with these fields:

- id: ID of the document
- content: plain text content of your document
- sourcefile: PDF file used, including page number of this document
- embedding: vectorized embedding of your plain text content

Since we're using a vector embedding field we configure vector_search, and we also set up a default semantic_configuration and define which fields to use for our (non-vector) content (content in our case).
With the index in place, we need to process the PDFs: split them into pages, extract the text (including tables) with Form Recognizer, split the text into chunks with an appropriate chunk and overlap size, and enrich the content.
First, we split the PDFs into single page documents:
import io
from PyPDF2 import PdfFileReader, PdfFileWriter
def split_pdf_to_pages(pdf_path):
"""
Splits a PDF file into individual pages and returns a list of byte streams,
each representing a single page.
"""
pages = []
with open(pdf_path, 'rb') as file:
reader = PdfFileReader(file)
for i in range(reader.getNumPages()):
writer = PdfFileWriter()
writer.addPage(reader.getPage(i))
page_stream = io.BytesIO()
writer.write(page_stream)
page_stream.seek(0)
pages.append(page_stream)
return pages
# Example usage
pdf_path = 'path/to/your/pdf/file.pdf'
pages = split_pdf_to_pages(pdf_path)
We then use the contents of these single-page documents with Azure Form Recognizer (also known as Azure Document Intelligence) to extract the text from the document:

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

FORM_RECOGNIZER_SERVICE = "your-fr-resource"
FORM_RECOGNIZER_KEY = "SECRET_FR_KEY"
FORM_RECOGNIZER_CREDS = AzureKeyCredential(FORM_RECOGNIZER_KEY)
def get_document_text_from_content(blob_content):
offset = 0
page_map = []
form_recognizer_client = DocumentAnalysisClient(
endpoint=f"https://{FORM_RECOGNIZER_SERVICE}.cognitiveservices.azure.com/",
credential=FORM_RECOGNIZER_CREDS,
headers={"x-ms-useragent": "azure-search-sample/1.0.0"}
)
poller = form_recognizer_client.begin_analyze_document("prebuilt-layout", document=blob_content)
form_recognizer_results = poller.result()
    for page_num, page in enumerate(form_recognizer_results.pages):
        # DocumentPage objects don't expose the page text directly; slice it out
        # of the full result content using the page's span
        span = page.spans[0]
        page_text = form_recognizer_results.content[span.offset : span.offset + span.length]
        page_map.append((page_num, offset, page_text))
        offset += len(page_text)
return page_map
With the page contents extracted from the PDF, per page, we can split the text into chunks. An easy way to do this is:
def split_text(page_map, max_section_length):
"""
Splits the text from page_map into sections of a specified maximum length.
:param page_map: List of tuples containing page text.
:param max_section_length: Maximum length of each text section.
:return: Generator yielding text sections.
"""
all_text = "".join(p[2] for p in page_map) # Concatenate all text
start = 0
length = len(all_text)
while start < length:
end = min(start + max_section_length, length)
section_text = all_text[start:end]
yield section_text
start = end
# Example usage
max_section_length = 1000 # For example, 1000 characters per section
sections = split_text(page_map, max_section_length)
for section in sections:
print(section) # Process each section as needed
Note: the snippet above is a very simplistic way of splitting text. In production you'd want to take into account sentence_endings, overlap, word_breaks, tables, cross-page sections, etc.
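As a slightly more realistic, but still simplified, sketch of what such splitting could look like, the variant below prefers to break on sentence endings and overlaps consecutive sections; the boundary characters and sizes are just example values:

SENTENCE_ENDINGS = {".", "!", "?"}

def split_text_with_overlap(all_text, max_section_length=1000, overlap=100):
    """Yield sections of roughly max_section_length characters, preferring to
    break on a sentence ending and overlapping consecutive sections."""
    start = 0
    length = len(all_text)
    while start < length:
        end = min(start + max_section_length, length)
        if end < length:
            # Walk back to the closest sentence ending, if one is reasonably near
            boundary = end
            while boundary > start + max_section_length // 2 and all_text[boundary - 1] not in SENTENCE_ENDINGS:
                boundary -= 1
            if boundary > start + max_section_length // 2:
                end = boundary
        yield all_text[start:end]
        if end >= length:
            break
        start = end - overlap  # overlap the next section with the previous one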
We can use the page_map from get_document_text_from_content to call the split_text function and set up the sections for our index:
def create_sections(filename, page_map, max_section_length=1000):
    for i, content in enumerate(split_text(page_map, max_section_length)):
section = {
"id": f"{filename}-page-{i}",
"content": content,
"sourcefile": filename
}
section["embedding"] = compute_embedding(content)
yield section
We can generate embeddings using Azure OpenAI's text-embedding-ada-002 model:
import openai

# Configurations
OPENAI_SERVICE = "your-azure-openai-resource"
OPENAI_DEPLOYMENT = "embedding"
OPENAI_KEY = "your-secret-openai-key"
# OpenAI setup
openai.api_type = "azure"
openai.api_key = OPENAI_KEY
openai.api_base = f"https://{OPENAI_SERVICE}.openai.azure.com"
openai.api_version = "2022-12-01"
def compute_embedding(text):
return openai.Embedding.create(engine=OPENAI_DEPLOYMENT, input=text)["data"][0]["embedding"]
With the computed sections from create_sections we can batch-upload them (in batches of 1,000 documents) into our Search Index:
from azure.search.documents import SearchClient

def index_sections(filename, sections):
"""
Indexes sections from a file into a search index.
:param filename: The name of the file being indexed.
:param sections: The sections of text to index.
"""
search_client = SearchClient(endpoint=f"https://{SEARCH_SERVICE}.search.windows.net/",
index_name=SEARCH_INDEX,
credential=SEARCH_CREDS)
batch = []
for i, section in enumerate(sections, 1):
batch.append(section)
if i % 1000 == 0:
search_client.upload_documents(documents=batch)
batch = []
if batch:
search_client.upload_documents(documents=batch)
# filename and sections from previous steps
index_sections(filename, sections)
The result of this last ingestion step is a fully populated Search Index, ready to be consumed.
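Putting the pieces together, the ingestion steps above can be chained roughly as follows; this is a sketch that reuses the functions defined earlier and assumes a 1,000-character section length:

pdf_path = "path/to/your/pdf/file.pdf"
filename = "file.pdf"

page_map = []
for page_num, page_stream in enumerate(split_pdf_to_pages(pdf_path)):
    # Analyze each single-page document; keep our own page numbering since
    # Form Recognizer only sees one page at a time
    for _, offset, page_text in get_document_text_from_content(page_stream.getvalue()):
        page_map.append((page_num, offset, page_text))

# Chunk the extracted text, compute embeddings and upload to the index
sections = create_sections(filename, page_map, max_section_length=1000)
index_sections(filename, sections)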
Now we can query our Search Index endpoint, using:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
def search_index(query, endpoint, index_name, api_key):
"""
Searches the indexed data in Azure Search.
:param query: The search query string.
:param endpoint: Azure Search service endpoint URL.
:param index_name: Name of the Azure Search index.
:param api_key: Azure Search API key.
:return: Search results.
"""
credential = AzureKeyCredential(api_key)
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)
results = search_client.search(query)
return [result for result in results]
# Example usage
endpoint = 'https://[service-name].search.windows.net' # Replace with your service endpoint
index_name = 'your-index-name' # Replace with your index name
api_key = 'your-api-key' # Replace with your API key
search_query = 'example search text'
search_results = search_index(search_query, endpoint, index_name, api_key)
for result in search_results:
print(result) # Process each search result as needed
After retrieval, we can enrich the search results with GPT-4:
import openai
def enrich_with_gpt(result, openai_api_key):
"""
Enriches the search result with additional information generated by OpenAI GPT-4.
:param result: The search result item to enrich.
:param openai_api_key: OpenAI API key.
:return: Enriched information.
"""
openai.api_key = openai_api_key
# Construct a prompt based on the result for GPT-4
prompt = f"Based on the following search result: {result}, generate additional insights."
    # Call GPT-4 to generate additional information. GPT-4 is a chat model, so we
    # use the chat completions API (on Azure this requires an api_version that
    # supports it, e.g. 2023-05-15); "gpt4-32k" is the deployment name.
    response = openai.ChatCompletion.create(
        engine="gpt4-32k",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=150
    )
    return response.choices[0].message.content.strip()
# Example usage
openai_api_key = 'your-openai-api-key' # Replace with your OpenAI API key
enriched_results = []
for result in search_results:
enriched_info = enrich_with_gpt(result, openai_api_key)
enriched_results.append((result, enriched_info))
for result, enriched_info in enriched_results:
print("Original Result:", result)
print("Enriched Information:", enriched_info)
print("-----")
As mentioned at the beginning of this blog post, these code snippets can serve as the basis of your RAG ingestion & consumption pipeline. But to increase the effectiveness of your RAG implementation, there are a lot of improvements that can be made.
In the rapidly evolving world of data analytics and AI, new frameworks are popping up every day to enhance the way we interact with and understand our data. One of the most exciting developments is the integration of tailor-made Large Language Models (LLM) into business processes. These models, like Azure OpenAI's GPT-4, are transforming how companies get insights from their data. In this post, we'll dive into how you can leverage Retrieval-Augmented Generation (RAG) using Azure OpenAI and Azure Cognitive Search to create a CoPilot experience for your data.
Retrieval-Augmented Generation is an architecture that combines the best of two worlds: the knowledge and natural language understanding of LLMs and the precision of an effective search index. While there are other alternatives available, RAG stands out for its ease of use and effectiveness.
As shown in the diagram above, RAG works by first retrieving relevant information from, e.g., a search index, and then using that specific knowledge from the data sources to generate more informed and accurate responses using GPT. This makes it particularly valuable for businesses looking to extract insights from their existing data, while having the ability to form a proper natural-language response.
Implementing RAG involves two main components: setting up an ingestion pipeline that indexes all your business-specific data, and setting up a RAG consumption pipeline that retrieves the most relevant indexed content for a user's question and passes it to the LLM to formulate an answer.
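To make the consumption pipeline concrete, here is a minimal sketch that retrieves the top matching chunks from the index and asks GPT to answer using only those chunks; resource names, deployment names and keys are placeholders:

import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

openai.api_type = "azure"
openai.api_base = "https://<your-openai-resource>.openai.azure.com"
openai.api_version = "2023-05-15"
openai.api_key = "<your-openai-key>"

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-search-key>"),
)

def answer(question, top=3):
    # 1. Retrieve the most relevant chunks from the search index
    results = search_client.search(question, top=top)
    context = "\n\n".join(doc["content"] for doc in results)
    # 2. Ask GPT to answer grounded in those chunks only
    response = openai.ChatCompletion.create(
        engine="<your-gpt-deployment>",
        messages=[
            {"role": "system", "content": "Answer using only the provided sources."},
            {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What does our travel policy say about rental cars?"))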
An essential part of implementing RAG is testing and optimizing the performance of your Azure Cognitive Search Index. You can use an LLM like GPT-4 to generate question-answer pairs relevant to your data. This approach allows you to test how well your index performs, giving you insights into the effectiveness of your data indexing strategies, chunking, overlap size, and data enrichment processes. A simple approach using synthetic QA generation is described here.
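A rough sketch of what synthetic question-answer generation and a simple retrieval check could look like, reusing the clients from the sketch above (the deployment name is a placeholder):

def generate_qa_pairs(chunk, n=3):
    # Ask GPT to produce question/answer pairs grounded in one indexed chunk
    prompt = (
        f"Generate {n} question and answer pairs about the following text. "
        f"Return each pair on its own line as 'Q: ... A: ...'.\n\n{chunk}"
    )
    response = openai.ChatCompletion.create(
        engine="<your-gpt-deployment>",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.splitlines()

def recall_at_k(question, expected_chunk_id, k=3):
    # Does the source chunk come back in the top-k results for its own question?
    results = search_client.search(question, top=k)
    return any(doc["id"] == expected_chunk_id for doc in results)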
Adding RAG to your data pipeline using Azure OpenAI and Azure Cognitive Search is not just about keeping up with the latest tech trends. It's about unlocking new levels of understanding and insights from your data, tailored specifically to your company's data. With the steps outlined above, you're well on your way to implementing a powerful CoPilot for your data.
It’s 2023, everyone is playing around with ChatGPT on OpenAI and a new and improved version of GPT is being released every couple of months with lots of new improvements which make you even more effective.
The problem with these new models is that they’re only available for customers with OpenAI plus subscriptions, setting you back $20 each month (excluding tax). Instead of using OpenAI’s version, you can also set up your own gpt-4 instance on Azure OpenAI.
Luckily there are lots of open source solutions that have you covered. All very much inspired by ChatGPT’s interface, showing a prompt history, word per word output and nice formatting.
A personal favorite of mine is chatbot-ui. I run this as a docker container on a raspberry pi in my local network, accessible only within my local network or with a proper VPN connection:
docker run \
-e OPENAI_API_KEY=YOURKEY \
-e AZURE_DEPLOYMENT_ID=YOURDEPLOYMENTNAME \
-e OPENAI_API_HOST=https://YOURENDPOINT.openai.azure.com \
-e OPENAI_API_TYPE=azure \
-e DEFAULT_MODEL=gpt-4-32k \
-p 3000:3000 \
bartmsft/chatbot-ui:main
*don't ever run UIs like this, with your OpenAI GPT keys, on the public internet
Simply replace OPENAI_API_KEY, AZURE_DEPLOYMENT_ID and OPENAI_API_HOST with respectively your primary key, deployment name and Azure OpenAI endpoint. Your UI will be available on http://localhost:3000.
Update May 2024: As of earlier this year, the original maintainer of chatbot-ui moved to a new interface, adding more features but also lots of extra dependencies. Therefore I'm using a personal fork of the old interface, published on Docker Hub under bartmsft.
Before we dive in, let's quickly define what a managed identity is. In Azure, a managed identity is a service principal that's automatically managed by Azure. It provides an identity for applications to use when connecting to resources that support Azure Active Directory (Azure AD) authentication.
While Role-Based Access Control (RBAC) is a common practice in many Azure services to manage access, when it comes to Databricks, there are specific reasons to opt for Databricks' own access control management:
In summary, while RBAC is a valuable tool for many Azure services, when working within Databricks, leveraging its custom access control mechanisms can offer more precise, flexible, and streamlined management of permissions and access.
In order to programmatically set up these permissions, we can leverage the SCIM API to register a managed identity as a service principal and assign it to a group. These groups can have specific entitlements for specific workspaces. One way to do this is shown in the diagram below:
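For illustration, here is a rough sketch of what that registration could look like with plain REST calls from Python; the endpoint paths and payload shapes are based on the Databricks SCIM API and, together with the admin token, should be checked against the current documentation:

import requests

DATABRICKS_HOST = "https://<databricks-instance>"
ADMIN_TOKEN = "<admin-pat-or-aad-token>"      # token of a workspace admin
APP_ID = "<managed-identity-application-id>"  # Application (client) ID of the managed identity
GROUP_ID = "<existing-databricks-group-id>"

headers = {
    "Authorization": f"Bearer {ADMIN_TOKEN}",
    "Content-Type": "application/scim+json",
}

# 1. Register the managed identity as a service principal in the workspace
sp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals",
    headers=headers,
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "applicationId": APP_ID,
        "displayName": "appsvc-id",
    },
).json()

# 2. Add the service principal to a group that carries the required entitlements
requests.patch(
    f"{DATABRICKS_HOST}/api/2.0/preview/scim/v2/Groups/{GROUP_ID}",
    headers=headers,
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "add", "value": {"members": [{"value": sp["id"]}]}}],
    },
)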
To set up a managed identity for your existing Web App, enable the system-assigned managed identity in the Azure portal (under the Web App's Identity blade) and copy the Application ID of the service principal it creates.
After creating the managed identity, you'll need to assign it to your Databricks workspace:
1. In the Azure portal, go to your Databricks workspace.
2. In the Admin settings (right top), go to the "Service principals" section.
3. Click on Add Service Principal, add the Application ID you copied earlier and tick "Allow workspace access" to ensure the Service Principal has sufficient privileges to run notebooks.
Now that we have a managed identity set up and assigned to Databricks, we can use it to run Databricks notebooks from a web application. Here's a sample code snippet in Python that shows how to do this:
import requests
from azure.identity import DefaultAzureCredential

# Get a token for the managed identity, scoped to the Azure Databricks resource
credential = DefaultAzureCredential()
token = credential.get_token('2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default')

# Call the Databricks REST API directly with the AAD token
databricks_url = 'https://<databricks-instance>'
headers = {'Authorization': f'Bearer {token.token}'}

# Create a new job that runs the notebook (the cluster id is a placeholder)
job = requests.post(f'{databricks_url}/api/2.0/jobs/create', headers=headers, json={
    'name': 'notebook-job',
    'existing_cluster_id': '<cluster-id>',
    'notebook_task': {'notebook_path': '/path/to/notebook'}
}).json()

# Start the job
requests.post(f'{databricks_url}/api/2.0/jobs/run-now', headers=headers, json={'job_id': job['job_id']})
With this setup, you can securely run Databricks notebooks from your web application using a managed identity. The same applies for other services that support managed identity (e.g. Azure Functions, Azure Container Instances, AKS, etc) and the same also applies for User Assigned Managed Identities instead of System Assigned Managed Identities.
Instead of configuring the managed identity manually, we can leverage the azurerm and databricks Terraform providers to automate the creation of the managed identity and its registration in the Databricks workspace. The databricks provider uses the SCIM API under the hood, and here's an example of how to tie everything together in Terraform:
provider "azurerm" {
features {}
}
data "azurerm_databricks_workspace" "workspace" {
name = "your-databricks-workspace-name"
resource_group_name = "your-databricks-rg-name"
}
provider "databricks" {
host = data.azurerm_databricks_workspace.workspace.workspace_url
azure_workspace_resource_id = data.azurerm_databricks_workspace.workspace.id
}
resource "azurerm_resource_group" "example" {
name = "example-resources"
location = "West Europe"
}
resource "azurerm_service_plan" "example" {
name = "example"
resource_group_name = azurerm_resource_group.example.name
location = azurerm_resource_group.example.location
os_type = "Linux"
sku_name = "P1v2"
}
resource "azurerm_linux_web_app" "example" {
name = "example"
resource_group_name = azurerm_resource_group.example.name
location = azurerm_service_plan.example.location
service_plan_id = azurerm_service_plan.example.id
site_config {}
identity {
type = "SystemAssigned"
}
}
/*
The code below adds the Managed Identity service principal to the databricks
admin group using the Databricks Terraform Provider.
This is required to allow the WebApp to access the Databricks REST API.
*/
data "databricks_group" "admin" {
display_name = "admin" // existing admin group in databricks workspace
}
resource "databricks_service_principal" "sp" {
application_id = azurerm_linux_web_app.example.identity[0].client_id
display_name = "appsvc-id"
}
resource "databricks_group_member" "admin_group" {
group_id = data.databricks_group.admin.id
member_id = databricks_service_principal.sp.id
}
Managed identities offer a secure and simple way to authenticate to services like Databricks. By using a managed identity, you can eliminate the need to store sensitive credentials in your code and simplify your authentication process. Assigning the managed identity to Databricks allows us to have granular control over what the identity (i.e. the Web App) can access, and this will allow you to e.g. trigger Databricks workloads from other resources in Azure.
Azure Kubernetes Service (AKS) comes with a Cluster Autoscaler (CA) that can automatically add nodes to the node pool based on the load of the cluster (based on CPU/memory usage). KEDA is active on the pod-level and uses Horizontal Pod Autoscaling (HPA) to dynamically add additional pods based on the configured scaler. CA and KEDA therefore go hand-in-hand when managing dynamic workloads on an AKS cluster since they scale on different dimensions, based on different rules, as shown below:
An overview of KEDA that scales an App based on the Topic Queue size of Azure Service Bus is shown in this diagram:
The app is deployed together with a KEDA-backed ScaledObject. This ScaledObject supports minReplicaCount and maxReplicaCount, which define the range of concurrent replicas that can exist for the app. Furthermore, a scale trigger object is defined inside the ScaledObject that defines the scaling criteria and conditions for scaling up and down.
Although optional, the diagram above also uses Pod Identity. Similar to how secrets are fetched from Azure Key Vault inside containers, Pod Identity is used with KEDA to directly subscribe to, e.g., an Azure Service Bus Topic and scale the pods without passing any connection strings, by specifying its AzureIdentityBinding.
The KEDA Helm Chart needs to be installed on the AKS cluster and configured to use AzureIdentityBinding to access resources in Azure. This Azure AD Identity needs to have sufficient RBAC permissions to directly access the required resources in Azure.
The ScaledObject is defined as follows and is deployed along with the application deployment specified under scaleTargetRef. This needs to match the deployment name and must be deployed in the same Kubernetes namespace.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: msg-processor-scaler
spec:
scaleTargetRef:
name: msg-processor # must be in the same namespace as the ScaledObject
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: azure-servicebus
metadata:
namespace: SERVICE_BUS_NAMESPACE
topicName: SERVICE_BUS_TOPIC
subscriptionName: SERVICE_BUS_TOPIC_SUBSCRIPTION
messageCount: "5"
authenticationRef:
name: trigger-auth-service-bus-msgs
It defines the type of scale trigger we would like to use and the scaling criteria specified under the metadata object. Lots of different KEDA scalers are available out of the box and details can be found by going through the KEDA documentation.
In the example above, we use an azure-servicebus scaler and would like to scale out if there are more than 5 unprocessed messages on the topic subscription SERVICE_BUS_TOPIC_SUBSCRIPTION on the SERVICE_BUS_TOPIC topic in the SERVICE_BUS_NAMESPACE Service Bus namespace. Scaling will go up to a maximum of 10 concurrent replicas, defined via maxReplicaCount, and there will always be a 1 pod minimum as defined by minReplicaCount.
Since we are using Pod Identity, we also specify the authenticationRef for the ScaledObject to trigger-auth-service-bus-msgs. This is a TriggerAuthentication resource that defines how KEDA should authenticate to get the metrics.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: trigger-auth-service-bus-msgs
spec:
podIdentity:
provider: azure
In this case, we are telling KEDA to use Azure as a Pod Identity provider which uses Azure AD Pod Identity. Alternatively, a full connection string can also be used without specifying a TriggerAuthentication resource.
By using a TriggerAuthentication you can easily re-use this authentication resource, but it also allows you to separate the permissions for KEDA and other resources inside your kubernetes cluster by binding them to different Azure AD Identities with different RBAC permissions.
The example above shows how to configure KEDA for autoscaling using Azure Service Bus Topics, but lots of other scalers are supported out of the box and more information can be found in KEDA Documentation - Scalers.
Note: when adding additional triggers, also ensure that Pod Identity can read from these resources by adding the corresponding RBAC permissions.
If you have deployed a fully configured AKS cluster to Azure and you are also using the accompanying Umbrella Helm chart for easy deployment of your apps then you're in luck, because it also allows you to easily add KEDA support for the applications you deploy to the AKS cluster.
Most of the documentation to get started is available on GitHub, but here's a sample on how you would deploy the application described above using this Helm Chart.
helm-app:
app:
name: app-service-name
container:
image: hello-world:latest
port: 80
keda:
enabled: true
name: app-service-name-keda
authRefName: auth-trigger-app-service-name
scaleTargetRef: app-service-name
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: azure-servicebus
metadata:
topicName: sbtopic-app-example-service
subscriptionName: sbsub-app-example-service
namespace: servicebus-app-example-ns
messageCount: 5
Using this Helm Chart, you can easily deploy your Service, Deployment, Scaled Object, AuthenticationTriggers and optionally all other resources (e.g. ingress, secretstore) your deployment requires.
Setting up an Azure Kubernetes Service (AKS) using Terraform is fairly easy. Setting up a full-fledged AKS cluster that can read images from Azure Container Registry (ACR), fetch secrets from Azure Key Vault using Pod Identity while all traffic is routed via an AKS managed Application Gateway is much harder.
To save others from all the trouble I encountered while creating this one-click-deployment, I've published a GitHub repository that serves as a boilerplate for the scenario described above, and fully deploys and configures your Azure Kubernetes Service in the cloud using a single terraform deployment.
This blog post goes into the details of the different resources that are deployed, why these specific resources are chosen and how they tie into each other. A future blog post will build upon this and explain how Helm can be used to automate your container deployments to the AKS cluster.
Azure Kubernetes Service (AKS) makes it simple to deploy a managed Kubernetes cluster in Azure. Kubernetes is an open-source container orchestration platform that automates many of the manual processes involved in deploying, managing, and scaling containerized applications. Having this cluster is ideal when you want to run multiple containerized services and don't want to worry about managing and scaling them.
Azure Container Registry (ACR) is a managed Docker registry service, and it allows you to store and manage images for all types of container deployments. Every service can be pushed to its own repository in Azure Container Registry and every codebase change in a specific service can trigger a pipeline that pushes a new version for that container to ACR with a unique tag.
AKS and ACR integration is set up during the deployment of the AKS cluster with Terraform. This allows the AKS cluster to interact with ACR, using an Azure Active Directory service principal. The Terraform deployment automatically configures RBAC permissions for the ACR resources with an appropriate ACRPull role for the service principal.
With this integration in place, AKS pods can fetch any of the Docker images that are pushed to ACR, even though ACR is set up as a private Docker registry. Don't forget to add the azurecr.io prefix for the container and specify a tag. It is best practice to not use the :latest tag, since this image always points to the latest image pushed to your repository and might introduce unwanted changes. Always pin the container to a specific version and update that version in your yaml file when you want to upgrade.
A simple example for a pod that's running a container from the youracrname.azurecr.io container registry, the test-container repository with tag 20210301, is shown below:
---
apiVersion: v1
kind: Pod
metadata:
name: test-container
spec:
containers:
- name: test-container
image: youracrname.azurecr.io/test-container:20210301
AAD Pod Identity enables Kubernetes applications to access cloud resources securely with Azure Active Directory. It's best practice to not use fixed credentials within pods or container images, as they are at risk of exposure or abuse. Instead, we're using pod identities to request access using Azure AD.
When a pod needs access to other Azure services, such as Cosmos DB, Key Vault, or Blob Storage, the pod needs access credentials. You don't manually define credentials for pods, instead they request an access token in real time, and can use it to only access their assigned services that are defined for that identity.
Pod Identity is fully configured on the AKS cluster when the Terraform script is deployed, and pods inside the AKS cluster can use the preconfigured pod identity by specifying the corresponding aadpodidbinding pod label.
Once the identity binding is deployed, any pod in the AKS cluster can use it by matching the pod label as follows:
apiVersion: v1
kind: Pod
metadata:
name: demo
labels:
aadpodidbinding: $IDENTITY_NAME
spec:
containers:
- name: demo
image: mcr.microsoft.com/oss/azure/aad-pod-identity/demo:latest
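Inside such a pod, application code can then request tokens at runtime without any stored secrets. A minimal sketch using azure-identity, with a placeholder storage account, could look like this:

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# With aad-pod-identity bound via the pod label, DefaultAzureCredential picks up
# the pod's identity; no connection strings or keys are mounted into the pod
credential = DefaultAzureCredential()
blob_service = BlobServiceClient(
    account_url="https://<storageaccount>.blob.core.windows.net",
    credential=credential,
)

for container in blob_service.list_containers():
    print(container.name)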
Some of the services in the AKS cluster connect to external services. The connection strings and other secret values that are needed by the pods are stored in Azure Key Vault. By storing these variables in Key Vault, we ensure that these secrets are not versioned in the git repository as code, and not accessible to anyone that has access to the AKS cluster.
To securely mount these connection strings, pod identity is used to mount these secrets in the pods and make them available to the container as environment variables. The flow for a pod fetching these variables is shown in the diagram below:
Luckily, there's no need to worry about all these different data flows, and we can just deploy the provided Azure Key Vault Provider for Secrets Store CSI Driver on the AKS cluster. This provider leverages pod identity on the cluster and provides a SecretProviderClass with all the secrets that you want to fetch from Azure Key Vault.
A basic example for setting up a SecretStore from KeyVault kvname, getting secret secret1 and exposing it as a SecretStore named kv-secrets:
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
name: kv-secrets
spec:
provider: azure
parameters:
    usePodIdentity: "true" # set to "true" to enable Pod Identity
keyvaultName: "kvname" # the name of the KeyVault
objects: |
array:
- |
objectName: secret1 # name of the secret in KeyVault
objectType: secret
tenantId: "" # tenant ID of the KeyVault
To use the secret secret1 from kv-secrets in a pod and make it available in the nginx container as the environment variable SECRET_ENV:
spec:
containers:
- image: nginx
name: nginx
env:
- name: SECRET_ENV
valueFrom:
secretKeyRef:
name: kv-secrets
key: secret1
All traffic that accesses the AKS cluster is routed via an Azure Application Gateway. The Application Gateway acts as a Load Balancer and routes the incoming traffic to the corresponding services in AKS.
Specifically, Application Gateway Ingress Controller (AGIC) is used. This Ingress Controller is deployed on the AKS Cluster on its own pod. AGIC monitors the Kubernetes cluster it is hosted on and continuously updates an Application Gateway, so that selected services are exposed to the Internet on the specified URL paths & ports straight from the Ingress rules defined in AKS. This flow is shown in the diagram below:
As shown in the overview diagram, the client reaches a Public IP endpoint before the request is forwarded to the Application Gateway. This Public IP is deployed as part of the Terraform deployment on an Azure-based FQDN (e.g. your-app.westeurope.cloudapp.azure.com).
That's a lot of detail on the inner workings of these resources. Fortunately, all of these configurations happen for you and all you need to do is set up the tfvars variables for your environment. Keep an eye out for the upcoming blog post on setting up Helm to automate your container deployments.
What are you waiting for? Clone the repo and start deploying your fully configured AKS cluster.
Last week, we teamed up with 20 different developers for a hackathon to create a chat bot for the Eurosonic Noorderslag festival. Eurosonic Noorderslag is an annual festival for new and upcoming artists with over 40,000 attendees. The purpose of our bot is to provide the attendees with near real-time answers to everything related to the festival. Together with these developers we built a fully functioning bot using the Azure Bot Framework.
In this blog I'll briefly introduce the technology we have used for the bot, and I'll go over our approach for getting the chat bot ready in two days without any prior experience with creating bots.
The foundation of our bot is the Azure Bot Framework. This framework comes with several advantages:
New bots can be either created via botframework.com or in the Azure Portal. When you create a bot an easy walkthrough is provided to pick a programming language (C# or Node.JS), set up a basic template for your desired functionality, connect to the appropriate chat platforms and provide connectors to enabled external providers (e.g. LUIS).
The biggest reasons why bots are so powerful today is because of the newly trained language interpretation systems behind the bots. Microsoft's flavor for this is known as Language Understanding Intelligent Service (LUIS) and it serves as the interpretation system between the text input and the data processing used to provide an output that makes sense. The different capabilities are explained in the remainder of this section and illustrated in the figure below:
Whenever LUIS is queried with a sentence, it first tries to determine its intent. For example, if you have a GetFood intent to return food places, you want this functionality to trigger with a variety of sentences, like 'I am hungry', 'I want food', 'I am starving' etc. Completely different sentences as you can see, but because of the built-in language interpretation, LUIS knows there's a relation between starving/hungry/want food and LUIS will trigger the GetFood intent for all three sentences.
Another building block which LUIS provides are entities. Whenever a sentence is put through LUIS, it can extract certain keywords in these sentences and pass them to the BotFramework as separate entities. Think of the earlier example to get food. When a user for example asks 'I want pizza' or 'I am looking for a restaurant that serves orange juice', LUIS can be trained to detect food entities and extract relatively pizza and orange juice from these two sentences.
Lastly, phrase lists in LUIS are ideal to link certain keywords to each other. Different food entities, as described in the previous section, can be easily linked together with phrase lists so you don't have to train LUIS to detect every type of food. Instead, you can train LUIS to recognize e.g. pizza, and add a comma-separated food phrase list with all the different types of food you wish to detect.
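To make this concrete, here is a rough sketch of querying a LUIS app over its classic v2 prediction REST endpoint from Python; the region, app ID and key are placeholders, and the exact endpoint shape depends on the LUIS version you use:

import requests

LUIS_ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/luis/v2.0/apps"
LUIS_APP_ID = "your-luis-app-id"
LUIS_KEY = "your-luis-key"

def interpret(utterance):
    response = requests.get(
        f"{LUIS_ENDPOINT}/{LUIS_APP_ID}",
        params={"subscription-key": LUIS_KEY, "q": utterance},
    )
    result = response.json()
    # topScoringIntent holds the intent LUIS matched (e.g. GetFood),
    # entities holds the extracted keywords (e.g. "orange juice")
    return result["topScoringIntent"]["intent"], result["entities"]

print(interpret("I am looking for a restaurant that serves orange juice"))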
Since we only had two days to finish the bot, we didn't organize a traditional hackathon where each team competes with each other for the best idea. We decided to collaborate with each other and to split the group up into different teams for different features. We also had an additional team which went out on the street to ask the festival attendees what functionalities they would be looking for when talking with the festival bot.
Even though every team worked on different functionalities, every functionality got merged into the same GitHub repository. After 30 hours, +300 commits and a lot of last minute bug fixing we managed to create our chat bot: Sonic.
To experience the bot yourself, you can try out the bot on Facebook Messenger. Or have a look at the video below showing most of its functionalities:
Before joining Microsoft a little over three months ago, I was not familiar with the Azure Cloud whatsoever and I replaced my Windows machine with an Apple MacBook several years ago without ever looking back. I have been a software engineer for the last 10 years, but when working in the cloud I have always developed in Amazon's AWS Cloud while working at various startups.
When I signed up to become a Technical Evangelist at Microsoft I knew this was about to change. I spent the last month and a half onboarding to Microsoft's Windows OS and Azure Cloud. Five years ago, I wouldn't have even considered joining MSFT, but due to Microsoft's new vision, focus on Cloud productivity and innovation I'm proud to call myself a Technical Evangelist at Microsoft.
In this blog I will be sharing my Azure onboarding experiences while trying to stay unbiased ;)
I think my ongoing journey can be described in a simple excitement-over-time graph shown below which I will try to explain in further detail in the remainder of this blog:
When I first opened the Azure Portal, the first thing I noticed is the extensive amount of options, items and resources I could pick from which felt really overwhelming.
AWS focuses on getting everything done via the Command Line Interface (CLI), whereas Azure tends to put more focus on its UI-based Web Portal. Even though the latter is nice for less experienced developers, it takes some getting used to.
Luckily Azure fully functions via the Command Line Interface as well (azure-cli), but does not seem to promote it in the way that Amazon does. Every tutorial/webinar/blog seems to use the Portal-approach and guide you through their code with various Portal screenshots. This is why it actually took me a couple of days before I found out there was a CLI :)
Yes, Azure has an equivalent for pretty much every AWS service. It just takes some getting used to since they both use different naming conventions. A useful chart that helped me out a lot can be found here. AWS' EC2 and S3 can be found in Azure under respectively Virtual Machines and Azure Storage. And even more specific functionality, e.g. serverless computing where AWS uses Lambda, can be found in Azure under Function Apps.
The biggest concern I had when I joined Microsoft was that I had no .NET/C# experience whatsoever. A language and environment that have been going hand-in-hand with the Microsoft ecosystem. Fortunately, this does not hold me back in my productivity and programming capabilities at all in these days at Microsoft. The time where you could only run your code in sandboxed Windows Machines supporting solely ASP.NET with SQL server is over. Microsoft embraces every kind of programming and honestly does not care whether you are running a .NET application with SQL or a NodeJS application with MongoDB as a backend.
Historically, the path that Microsoft took is rather surprising. Especially when we look at a quote of Microsoft's former CEO Steve Ballmer in 2001:
"Linux is a cancer that attaches itself in an intellectual property sense to everything it touches"
Times sure have changed and even Ballmer seems to agree. At the time, Microsoft was fighting against the open source Linux community but as of lately actually embraces the whole OSS scene. Being the largest open source contributor on GitHub, SQL server running on Linux, Ubuntu running on Windows 10 and the list goes on. And it does result in some crazy setups which were unthinkable a couple of years ago:
Of course there are still scenarios where you wish you grew up in the MSFT ecosystem. One of the interesting companies that Microsoft acquired is Xamarin. Xamarin allows you to build cross platform mobile applications written in one programming language (instead of 3 different ones). Unfortunately for me, this language is C# ;)
One thing that I was used to on Amazon's AWS cloud was setting up Virtual Machines to host my web services. Even though AWS fully supports PaaS services, I have never experimented with this. With Azure, the focus truly lies on PaaS and even SaaS and it's currently trying to win ground for IaaS solutions. That said, a frictionless migration is possible for all your VMs. Furthermore, Azure offers a lot of PaaS services, such as the App Service which automagically maintains, scales up/down and provides lots of insight for your web services.
Another great example of easy-to-use technology, are Microsoft Cognitive Services. This is an attempt to democratize the otherwise extremely complex AI and Machine Learning possibilities. Cognitive Services exposes simple APIs which developers can leverage to use extremely well-trained models to interpret & analyze images, audio and video. The nice thing about this, is that it does not require a hardcore developer to use these services. Anyone with little programming experience can leverage these models.
One of the things that bothered me a lot in the beginning is that the Azure Portal tends to be slow at times. I however later found out that this is because I am using an internal Microsoft subscriptions which is known to be less fast. Not just the interface felt slow, but I also feel that deploying new instances in Azure takes a bit longer than necessary imho. Deploying a cluster of Azure Container Services easily takes up to 20mins which I don't fully understand because all they need to do is copy over a bunch of images, right?
Another thing which I noticed is that Azure does suffer from occasional outages. Even though these outages usually only affect specific regions, I believe this isn't something that should still happen anno 2016. With all that said, I do certainly think that the pros outweigh these two cons. With the PaaS/SaaS solutions, the openness in which you can use any language/toolset and the large amount of capabilities that Azure brings, I can definitely say I'm hooked.
One big aspect which I do not consider in this blog is the money aspect of it all, which can be a really decisive argument for choosing a Cloud to build on.
Even more important is the actual cloud performance you are getting. Even though certain performance indicators are advertised, I have not analyzed whether both AWS and Azure live up to these standards.