Category Archives: Tech Reviews

Amazon Q Developer review: Code completions, code chat, and AWS skills

Posted on 24 June, 2024

This post was originally published on this site

When I reviewed Amazon CodeWhisperer, Google Bard, and GitHub Copilot in June of 2023, CodeWhisperer could generate code in an IDE and did security reviews, but it lacked a chat window and code explanations. The current version of CodeWhisperer is now called Amazon Q Developer, and it does have a chat window that can explain code, and several other features that may be relevant to you, especially if you do a lot of development using AWS.

Amazon Q Developer currently runs in Visual Studio Code, Visual Studio, JetBrains IDEs, the Amazon Console, and the macOS command line. Q Developer also offers asynchronous agents, programming language translations, and Java code transformations/upgrades. In addition to generating, completing, and discussing code, Q Developer can write unit tests, optimize code, scan for vulnerabilities, and suggest remediations. It supports conversations in English, and code in the Python, Java, JavaScript, TypeScript, C#, Go, Rust, PHP, Ruby, Kotlin, C, C++, shell scripting, SQL, and Scala programming languages.

You can chat with Amazon Q Developer about AWS capabilities, and ask it to review your resources, analyze your bill, or architect solutions. It knows about AWS well-architected patterns, documentation, and solution implementation.

According to Amazon, Amazon Q Developer is “powered by Amazon Bedrock” and trained on “high-quality AWS content.” Since Bedrock supports many foundation models, it’s not clear from the web statement which one was used for Amazon Q Developer. I asked, and got this answer from an AWS spokesperson: “Amazon Q uses multiple models to execute its tasks and uses logic to route tasks to the model that is the best fit for the job.”

Amazon Q Developer has a reference tracker that detects whether a code suggestion might be similar to publicly available code. The reference tracker can label these with a repository URL and project license information, or optionally filter them out.

Amazon Q Developer directly competes with GitHub Copilot, JetBrains AI, and Tabnine, and indirectly competes with a number of large language models (LLMs) and small language models (SLMs) that know about code, such as Code Llama, StarCoder, Bard, OpenAI Codex, and Mistral Codestral. GitHub Copilot can converse in dozens of natural languages, as opposed to Amazon Q Developer’s one, and supports a number of extensions from programming, cloud, and database vendors, as opposed to Amazon Q Developer’s AWS-only ties.

Installing Amazon Q Developer

Given the multiple environments in which Amazon Q Developer can run, it’s not a surprise that there are multiple installers. The only tricky bit is signing and authentication.

Installing Q Developer in Visual Studio Code

You can install Amazon Q Developer from the Visual Studio Code Marketplace, or from the Extensions sidebar in Visual Studio Code. You can get to that sidebar from the Extensions icon at the far left, by pressing Shift-Command-X, or by choosing Extensions: Install Extensions from the command palette. Type “Amazon Q” to find it. Once you’ve installed the extension, you’ll need to authenticate to AWS as discussed below.

Amazon Q Developer in Visual Studio Code includes a chat window (at the left) as well as code generation. The chat window is showing Amazon Q Developer’s capabilities.

Installing Q Developer in JetBrains IDEs

Like Visual Studio Code, JetBrains has a marketplace for IDE plugins, where Amazon Q Developer is available. You’ll need to restart the IDE after downloading and installing the plugin. Then you’ll need to authenticate to AWS as discussed below. Note that the Amazon Q Developer plugin disables local inline JetBrains full-line code completion.

Amazon Q Developer in IntelliJ IDEA, and other JetBrains IDEs, has a chat window on the right as well as code completion. The chat window is showing Amazon Q Developer’s capabilities.

Installing Q Developer in the AWS Toolkit for Visual Studio

For Visual Studio, Amazon Q Developer is part of the AWS Toolkit, which you can find in the Visual Studio Marketplace. Again, once you’ve installed the toolkit you’ll need to authenticate to AWS as discussed below.

Signing and authenticating Amazon Q Developer

The authentication process is confusing because there are several options and several steps that bounce between your IDE and web browser. You used to have to repeat this process frequently, but the product manager assures me that re-authentication should now only be necessary every three months.

Installing Q Developer for command line

Amazon Q Developer for the command line is currently for macOS only, although a Linux version is on the roadmap and documented as a remote target. The macOS installation is basically a download of a DMG file, followed by running the disk image, dragging the Q file to the applications directory, and running that Q app to install the CLI q program and a menu bar icon that can bring up settings and the web user guide. You’ll also need to authenticate to AWS, which will log you in.

On macOS, the command-line program q supports multiple shell programs and multiple terminal programs. Here I’m using iTerm2 and the z shell. The q translate command constructs shell commands for you, and the q chat command opens an AI assistant.

Amazon Q Developer in the AWS Console

If you are running as an IAM user rather than a root user, you’ll have to add IAM permissions to use Amazon Q Developer. Once you have permission, AWS should display an icon at the right of the screen that brings up the Amazon Q Developer interface.

The Amazon Q Developer window at the right, running in the AWS Console, can chat with you about using AWS and can generate architectures and code for AWS applications.

Evaluating Amazon Q Developer

According to AWS, “Amazon Q Developer Agent achieved the highest scores of 13.4% on the SWE-Bench Leaderboard and 20.5% on the SWE-Bench Leaderboard (Lite), a data set that benchmarks coding capabilities. Amazon Q security scanning capabilities outperform all publicly benchmarkable tools on detection across the most popular programming languages.”

Both of the quoted numbers are reflected on the SWE-Bench site, but there are two issues. Neither number has as yet been verified by SWE-Bench, and the Amazon Q Developer ranking on the Lite Leaderboard has dropped to #3. In addition, if there’s a supporting document on the web for Amazon’s security scanning claim, it has evaded my searches.

SWE-Bench, from Princeton, is “an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories.” The scores reflect the solution rates. The Lite data set is a subset of 300 GitHub issues.

Let’s explore how Amazon Q Developer behaves on the various tasks it supports in some of the 15 programming languages it supports. This is not a formal benchmark, but rather an attempt to get a feel for how well it works. Bear in mind that Amazon Q Developer is context sensitive and tries to use the persona that it thinks best fits the environment where you ask it for help.

Predictive inline code generation with Amazon Q Developer

I tried a softball question for predictive code generation and used one of Amazon’s inline suggestion examples. The Python prompt supplied was # Function to upload a file to an S3 bucket. Pressing Option-C as instructed got me the code below the prompt in the screenshot below, after an illegal character that I had to delete. I had to type import at the top to prompt Amazon Q to generate the imports for logging, boto3, and ClientError.

I also used Q Chat to tell me how to resolve the imports; it suggested a pip command, but on my system that fixed the wrong Python environment (v 3.11). I had to do a little sleuthing in the Frameworks directory tree to remind myself to use pip3 to target my current Python v 3.12 environment. I felt like singing “Daisy, Daisy” to Dave and complaining that my mind was going.

Inline code generation and chat with Amazon Q Developer. All the code below the # TODO comment was generated by Amazon Q Developer, although it took multiple steps.

I also tried Amazon’s two other built-in inline suggestion examples. The example to complete an array of fake users in Python mostly worked; I had to add the closing ] myself. The example to generate unit tests failed when I pressed Option-C: It generated illegal characters instead of function calls. (I’m starting to suspect an issue with Option-C in VS Code on macOS. It may or may not have anything to do with Amazon Q Developer.)

When I restarted VS Code, tried again, and this time pressed Return on the line below the comment, it worked fine, generating the test_sum function below.

# Write a test case for the above function.
def test_sum():
    """
    Unit test for the sum function.
    """
    assert sum(1, 2) == 3
    assert sum(-1, 2) == 1
    assert sum(0, 0) == 0
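The function under test isn't reproduced in the text. As a minimal sketch, assuming it was a simple two-argument sum (a hypothetical stand-in, and one that shadows Python's built-in sum), the generated test runs like this:

```python
def sum(a, b):
    """Hypothetical function under test; note that it shadows the built-in sum."""
    return a + b


def test_sum():
    """
    Unit test for the sum function.
    """
    assert sum(1, 2) == 3
    assert sum(-1, 2) == 1
    assert sum(0, 0) == 0


# Calling the test directly raises AssertionError if any check fails.
test_sum()
```

Without a two-argument definition like this one, the assertions would call the built-in sum, which expects an iterable and would raise a TypeError.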

AWS shows examples of completion with Amazon Q Developer in up to half a dozen programming languages in its documentation. The examples, like the Python ones we’ve discussed, are either very simple, e.g. add two numbers, or relate to common AWS operations supported by APIs, such as uploading files to an S3 bucket.

Natural language to code generation with Amazon Q Developer

Since I now believed that Amazon Q Developer can generate Python, especially for its own test examples, I tried something a little different. As shown in the screenshot below, I created a file called quicksort.cpp, then typed an initial comment:

//function to sort a vector of generics in memory using the quicksort algorithm

Amazon Q Developer kept trying to autocomplete this comment, and in some cases the implementation as well, for different problems. Nevertheless it was easy to keep typing my specification while Amazon Q Developer erased what it had generated, and Amazon Q Developer eventually generated a nearly correct implementation.

Quicksort is a well-known algorithm. Both the C and C++ libraries have implementations of it, but they don’t use generics. Instead, you need to write type-specific comparison functions to pass to qsort. That’s historic, as the libraries were implemented before generics were added to the languages.

I eventually got Amazon Q Developer to generate the main routine to test the implementation. It initially generated documentation for the function instead, but when I rejected that and tried again it generated the main function with a test case.

Unsurprisingly, the generated code didn’t even compile the first time. I saw that Amazon Q Developer had left out the required #include <iostream>, but I let VS Code correct that error without sending any code to Amazon Q Developer or entering the #include myself.

It still didn’t compile. The errors were in the recursive calls to sortVector(), which were written in a style that tried to be too clever. I highlighted and sent one of the error messages to Amazon Q Developer for a fix, and it solved a different problem. I tried again, giving Amazon Q Developer more context and asking for a fix; this time it recognized the actual problem and generated correct code.

This experience was a lot like pair programming with an intern or a junior developer who hadn’t learned much C++. An experienced C/C++ programmer might have asked to recast the problem to use the qsort library function, on grounds of using the language library. I would have justified my specification to use generics on stylistic grounds as well as possible runtime efficiency grounds.

Another consideration here is that there’s a well-known worst case for quicksort, which takes the maximum time to run when the vector to be sorted is already in order. For this implementation, there’s a simple fix to be made by randomizing the partition point (see Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching). If you use the library function you just have to live with the inefficiency.
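The randomized partition point can be sketched in a few lines. This is my own illustration in Python, not the C++ that Amazon Q Developer generated; the function name and the choice to return a new list rather than sort in place are assumptions made for brevity.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def quicksort(items: List[T]) -> List[T]:
    """Return a new sorted list using quicksort with a randomly chosen pivot."""
    if len(items) <= 1:
        return list(items)
    # Picking the pivot at random means already-sorted input is no longer
    # a guaranteed worst case, as it is when the first element is the pivot.
    pivot = items[random.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


if __name__ == "__main__":
    print(quicksort([5, 3, 8, 1, 3]))
    print(quicksort(["pear", "apple", "fig"]))
```

Because the elements only need to support comparison, the same function sorts numbers or strings, which is roughly what the "vector of generics" specification asked for.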

Amazon Q Developer code generation from natural language to C++. I asked for a well-known sorting algorithm, quicksort, and complicated the problem slightly by specifying that the function operate on a vector of generics. It took several fixes, but got there eventually.

Code references from Amazon Q Developer

So far, none of my experiments with Amazon Q Developer have generated code references, which are associated with recommendations that are similar to training data. I do see a code reference log in Visual Studio Code, but it currently just says “Don’t want suggestions that include code with references? Uncheck this option in Amazon Q: Settings.”

Vulnerability detection with Amazon Q Developer 

By default, Q Developer scans your open code files for vulnerabilities in the background, and generates squiggly underlines when it finds them. From there you can bring up explanations of the vulns and often invoke automatic fixes for them. You can also ask Q to scan your whole project for vulnerabilities and generate a report. Scans look for security issues such as resource leaks, SQL injection, and cross-site scripting; secrets such as hardcoded passwords, database connection strings, and usernames; misconfiguration, compliance, and security issues in infrastructure as code files; and deviations from quality and efficiency best practices.

Q Chat in Amazon Q Developer

You’ve already seen how you can use Q Chat in an IDE to explain and fix code. It can also optimize code and write unit tests. You can go back to the first screenshot in this review to see Q Chat’s summary of what it can and can’t do, or use the /help command yourself once you have Q Chat set up in your IDE. On the whole, having Q Chat in Amazon Q Developer improves the product considerably over last year’s CodeWhisperer.

Customization in Amazon Q Developer 

If you set up Amazon Q Developer at the Pro level, you can customize its code generation of Python, Java, JavaScript, and TypeScript by giving it access to your code base. The code base can be in an S3 bucket or in a repository on GitHub, GitLab, or Bitbucket.

Running a customization generates a fine-tuned model that your users can choose to use for their code suggestions. They’ll still be able to use the default base model, but companies have reported that using customized code generation increases developer productivity even more than using the base model.

Developer agents in Amazon Q Developer 

Developer agents are long-running Amazon Q Developer processes. The one agent I’ve seen so far is for code transformation, specifically transforming Java 8 or Java 11 Maven projects to Java 17. There are a bunch of specific requirements your Java project needs to meet for a successful transformation, but the transformation agent worked well in AWS’s internal tests. While I have seen it demonstrated, I haven’t run it myself.

Amazon Q Developer in command line

Amazon Q Developer for the CLI currently (v 1.2.0) works in macOS; supports the bash, zsh, and fish shells; runs in the iTerm2, macOS Terminal, Hyper, Alacritty, Kitty, and wezTerm terminal emulators; runs in the VS Code terminal and JetBrains terminals (except Fleet); and supports some 500 of the most popular CLIs such as git, aws, docker, npm, and yarn. You can extend the CLI to remote macOS systems with q integrations install ssh. You can also extend it to 64-bit versions of recent distributions of Fedora, Ubuntu, and Amazon Linux 2023. (That one’s not simple, but it’s documented.)

Amazon Q Developer CLI performs three major services. It can autocomplete your commands as you type, it can translate natural language specifications to CLI commands (q translate), and it can chat with you about how to perform tasks from the command line (q chat).

For example, I often have trouble remembering all the steps it takes to rebase a Git repository, which is something you might want to do if you and a colleague are working on the same code (careful!) on different branches (whew!). I asked q chat, “How can I rebase a git repo?”

It gave me the response in the first screenshot below. To get brushed up on how the action works, I asked the follow-up question, “What does rebasing really mean?” It gave me the response in the second screenshot below. Finally, to clarify the reasons why I would rebase my feature branch versus merging it with an updated branch, I asked, “Why rebase a repo instead of merging branches?” It gave me the response in the third screenshot below.

The simple answer to the question I meant to ask is item 2, which talks about the common case where the main branch is changing while you work on a feature. The real, overarching answer is at the end: “The decision to rebase or merge often comes down to personal preference and the specific needs of your project and team. It’s a good idea to discuss your team’s Git workflow and agree on when to use each approach.”

In the first screenshot above, I asked q chat, “How can I rebase a git repo?” In the second screenshot, I asked “What does rebasing really mean?” In the third, I asked “Why rebase a repo instead of merging branches?”

Amazon Q Developer in AWS Console

As you saw earlier in this review, a small Q icon at the upper right of the AWS Management Console window brings up a right-hand column where Amazon Q Developer invites you to “Ask me anything about AWS.” Similarly, a large Q icon at the bottom right of an AWS documentation page brings up that same AMAaA column as a modeless floating window.

Recommended for experienced programmers

Overall, I like Amazon Q Developer. It seems to be able to handle the use cases for which it was trained, and generate whole functions in common programming languages with only a few fixes. It can be useful for completing lines of code, doc strings, and if/for/while/try code blocks as you type. It’s also nice that it scans for vulnerabilities and can help you fix code problems.

On the other hand, Q Developer can’t generate full functions for some use cases; it then reverts to line-by-line suggestions. Also, there seems to be a bug associated with the use of Option-C to trigger code generation. I hope that will be fixed fairly soon, but the workaround is to press Return a lot.

According to Amazon, a 33% acceptance rate is par for the course for AI code generators. By acceptance rate, they mean the percentage of generated code that is used by the programmer. They claim a higher rate than that, even for their base model without customization. They also claim over 50% boosts in programmer productivity, although how they measure programmer productivity isn’t clear to me.

Their claim is that customizing the Amazon Q Developer model to “the way we do things here” from the company’s code base offers an additional boost in acceptance rate and programmer productivity. Note that code bases need to be cleaned up before using them for training. You don’t want the model learning bad, obsolete, or unsafe coding habits.

I can believe a hefty productivity boost for experienced developers from using Amazon Q Developer. However, I can’t in good conscience recommend that programming novices use any AI code generator until they have developed their own internal sense for how code should be written, validated, and tested. One of the ways that LLMs go off the rails is to start generating BS, also called hallucinating. If you can’t spot that, you shouldn’t rely on their output.

How does Amazon Q Developer compare to GitHub Copilot, JetBrains AI, and Tabnine? Stay tuned. I need to reexamine GitHub Copilot, which seems to get updates on a monthly basis, and take a good look at JetBrains AI and Tabnine before I can do that comparison properly. I’d bet good money, however, that they’ll all have changed in some significant way by the time I get through my full round of reviews.

Cost: Free with limited monthly access to advanced features; Pro tier $19/month.

Platform: Amazon Web Services. Supports Visual Studio Code, Visual Studio, JetBrains IDEs, the Amazon Console, and the macOS command line. Supports recent 64-bit Fedora, Ubuntu, and Amazon Linux 2023 as remote targets from macOS ssh.

Pros

  1. Works fairly well, especially for popular languages and AWS applications
  2. Basic version is free
  3. Supports Python, Java, JavaScript, TypeScript, C#, Go, Rust, PHP, Ruby, Kotlin, C, C++, shell scripting, SQL, and Scala programming languages
  4. Can chat as well as generate code

Cons

  1. Only converses in English
  2. No Windows CLI support

LlamaIndex review: Easy context-augmented LLM applications

Posted on 17 June, 2024

“Turn your enterprise data into production-ready LLM applications,” blares the LlamaIndex home page in 60-point type. OK, then. The subhead for that is “LlamaIndex is the leading data framework for building LLM applications.” I’m not so sure that it’s the leading data framework, but I’d certainly agree that it’s a leading data framework for building with large language models, along with LangChain and Semantic Kernel, about which more later.

LlamaIndex currently offers two open source frameworks and a cloud. One framework is in Python; the other is in TypeScript. LlamaCloud (currently in private preview) offers storage, retrieval, links to data sources via LlamaHub, and a paid proprietary parsing service for complex documents, LlamaParse, which is also available as a stand-alone service.

LlamaIndex boasts strengths in loading data, storing and indexing your data, querying by orchestrating LLM workflows, and evaluating the performance of your LLM application. LlamaIndex integrates with over 40 vector stores, over 40 LLMs, and over 160 data sources. The LlamaIndex Python repository has over 30K stars.

Typical LlamaIndex applications perform Q&A, structured extraction, chat, or semantic search, and/or serve as agents. They may use retrieval-augmented generation (RAG) to ground LLMs with specific sources, often sources that weren’t included in the models’ original training.

LlamaIndex competes with LangChain, Semantic Kernel, and Haystack. Not all of these have exactly the same scope and capabilities, but as far as popularity goes, LangChain’s Python repository has over 80K stars, almost three times that of LlamaIndex (over 30K stars), while the much newer Semantic Kernel has over 18K stars, a little over half that of LlamaIndex, and Haystack’s repo has over 13K stars.

Repository age is relevant because stars accumulate over time; that’s also why I qualify the numbers with “over.” Stars on GitHub repos are loosely correlated with historical popularity.

LlamaIndex, LangChain, and Haystack all boast a number of major companies as users, some of whom use more than one of these frameworks. Semantic Kernel is from Microsoft, which doesn’t usually bother publicizing its users except for case studies.

The LlamaIndex framework helps you to connect data, embeddings, LLMs, vector databases, and evaluations into applications. These are used for Q&A, structured extraction, chat, semantic search, and agents.

LlamaIndex features

At a high level, LlamaIndex is designed to help you build context-augmented LLM applications, which basically means that you combine your own data with a large language model. Examples of context-augmented LLM applications include question-answering chatbots, document understanding and extraction, and autonomous agents.

The tools that LlamaIndex provides perform data loading, data indexing and storage, querying your data with LLMs, and evaluating the performance of your LLM applications:

  • Data connectors ingest your existing data from their native source and format.
  • Data indexes, also called embeddings, structure your data in intermediate representations.
  • Engines provide natural language access to your data. These include query engines for question answering, and chat engines for multi-message conversations about your data.
  • Agents are LLM-powered knowledge workers augmented by software tools.
  • Observability/Evaluation integrations enable you to experiment, evaluate, and monitor your app.

Context augmentation

LLMs have been trained on large bodies of text, but not necessarily text about your domain. There are three major ways to perform context augmentation and add information about your domain: supplying documents, doing RAG, and fine-tuning the model.

The simplest context augmentation method is to supply documents to the model along with your query, and for that you might not need LlamaIndex. Supplying documents works fine unless the total size of the documents is larger than the context window of the model you’re using, which was a common issue until recently. Now there are LLMs with million-token context windows, which allow you to avoid going on to the next steps for many tasks. If you plan to perform many queries against a million-token corpus, you’ll want to cache the documents, but that’s a subject for another time.

Retrieval-augmented generation combines context with LLMs at inference time, typically with a vector database. RAG procedures often use embedding to limit the length and improve the relevance of the retrieved context, which both gets around context window limits and increases the probability that the model will see the information it needs to answer your question.

Essentially, an embedding function takes a word or phrase and maps it to a vector of floating point numbers; these are typically stored in a database that supports a vector search index. The retrieval step then uses a semantic similarity search, often using the cosine of the angle between the query’s embedding and the stored vectors, to find “nearby” information to use in the augmented prompt.
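To make the retrieval step concrete, here is a minimal sketch of a cosine similarity search in plain Python. The three-dimensional vectors are made up for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and the dictionary stands in for a vector database; none of this is LlamaIndex's own API.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up "embeddings" standing in for the output of an embedding model.
store = {
    "Refunds are issued within 30 days": [0.9, 0.1, 0.0],
    "The office is closed on holidays": [0.0, 0.2, 0.9],
    "Returns require a receipt": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "How do I get my money back?"

for score, text in top_k(query, store):
    print(f"{score:.3f}  {text}")
```

The texts returned by top_k are what would be pasted into the augmented prompt alongside the user's question.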

Fine-tuning LLMs is a supervised learning process that involves adjusting the model’s parameters to a specific task. It’s done by training the model on a smaller, task-specific or domain-specific data set that’s labeled with examples relevant to the target task. Fine-tuning often takes hours or days using many server-level GPUs and requires hundreds or thousands of tagged exemplars.

Installing LlamaIndex

You can install the Python version of LlamaIndex three ways: from the source code in the GitHub repository, using the llama-index starter install, or using llama-index-core plus selected integrations. The starter installation would look like this:

pip install llama-index

This pulls in OpenAI LLMs and embeddings in addition to the LlamaIndex core. You’ll need to supply your OpenAI API key before you can run examples that use it. The LlamaIndex starter example is quite straightforward, essentially five lines of code after a couple of simple setup steps. There are many more examples in the repo, with documentation.

Doing the custom installation might look something like this:

pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface

That installs an interface to Ollama and Hugging Face embeddings. There’s a local starter example that goes with this installation. No matter which way you start, you can always add more interface modules with pip.

If you prefer to write your code in JavaScript or TypeScript, use LlamaIndex.TS. One advantage of the TypeScript version is that you can run the examples online on StackBlitz without any local setup. You’ll still need to supply an OpenAI API key.

LlamaCloud and LlamaParse

LlamaCloud is a cloud service that allows you to upload, parse, and index documents and search them using LlamaIndex. It’s in a private alpha stage, and I was unable to get access to it. LlamaParse is a component of LlamaCloud that allows you to parse PDFs into structured data. It’s available via a REST API, a Python package, and a web UI. It is currently in a public beta. You can sign up to use LlamaParse for a small usage-based fee after the first 7K pages a week. The example given comparing LlamaParse and PyPDF for the Apple 10K filing is impressive, but I didn’t test this myself.

LlamaHub

LlamaHub gives you access to a large collection of integrations for LlamaIndex. These include agents, callbacks, data loaders, embeddings, and about 17 other categories. In general, the integrations are in the LlamaIndex repository, PyPI, and NPM, and can be loaded with pip install or npm install.

create-llama CLI

create-llama is a command-line tool that generates LlamaIndex applications. It’s a fast way to get started with LlamaIndex. The generated application has a Next.js powered front end and a choice of three back ends.

RAG CLI

RAG CLI is a command-line tool for chatting with an LLM about files you have saved locally on your computer. This is only one of many use cases for LlamaIndex, but it’s quite common.

LlamaIndex components

The LlamaIndex Component Guides give you specific help for the various parts of LlamaIndex. The first screenshot below shows the component guide menu. The second shows the component guide for prompts, scrolled to a section about customizing prompts.

The LlamaIndex component guides document the different pieces that make up the framework. There are quite a few components.

We’re looking at the usage patterns for prompts. This particular example shows how to customize a Q&A prompt to answer in the style of a Shakespeare play. This is a zero-shot prompt, since it doesn’t provide any exemplars.
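For readers without the screenshot, the idea of a customized zero-shot Q&A prompt can be sketched with ordinary Python string formatting. This is an illustration only: the template wording and the placeholder names context_str and query_str are my assumptions, and the sketch does not use LlamaIndex's prompt classes.

```python
# A zero-shot template: it gives instructions and context, but no worked examples.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query in the style of a Shakespeare play.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_prompt(context_str: str, query_str: str) -> str:
    """Fill the template with retrieved context and the user's question."""
    return QA_TEMPLATE.format(context_str=context_str, query_str=query_str)


if __name__ == "__main__":
    print(build_prompt(
        context_str="The library opens at 9 a.m. on weekdays.",
        query_str="When does the library open?",
    ))
```

Adding one or more example question-and-answer pairs to the template would turn it into a few-shot prompt.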

Learning LlamaIndex

Once you’ve read, understood, and run the starter example in your preferred programming language (Python or TypeScript), I suggest that you read, understand, and try as many of the other examples as look interesting. The screenshot below shows the result of generating a file called essay by running essay.ts and then asking questions about it using chatEngine.ts. This is an example of using RAG for Q&A.

The chatEngine.ts program uses the ContextChatEngine, Document, Settings, and VectorStoreIndex components of LlamaIndex. When I looked at the source code, I saw that it relied on the OpenAI gpt-3.5-turbo-16k model; that may change over time. The VectorStoreIndex module seemed to be using the open-source, Rust-based Qdrant vector database, if I was reading the documentation correctly.

After setting up the terminal environment with my OpenAI key, I ran essay.ts to generate an essay file and chatEngine.ts to field queries about the essay.

Bringing context to LLMs

As you’ve seen, LlamaIndex is fairly easy to use to create LLM applications. I was able to test it against OpenAI LLMs and a file data source for a RAG Q&A application with no issues. As a reminder, LlamaIndex integrates with over 40 vector stores, over 40 LLMs, and over 160 data sources; it works for several use cases, including Q&A, structured extraction, chat, semantic search, and agents.

I’d suggest evaluating LlamaIndex along with LangChain, Semantic Kernel, and Haystack. It’s likely that one or more of them will meet your needs. I can’t recommend one over the others in a general way, as different applications have different requirements.

Pros

  1. Helps to create LLM applications for Q&A, structured extraction, chat, semantic search, and agents
  2. Supports Python and TypeScript
  3. Frameworks are free and open source
  4. Lots of examples and integrations

Cons

  1. Cloud is limited to private preview
  2. Marketing is slightly overblown

Cost

Open source: free. LlamaParse import service: 7K pages per week free, then $3 per 1000 pages.

Platform

Python and TypeScript, plus cloud SaaS (currently in private preview).

DuckDB: The tiny but powerful analytics database

Posted by on 15 May, 2024

This post was originally published on this site

Most people assume that analytical databases, or OLAPs, are big, powerful beasts—and they are correct. Systems like Snowflake, Redshift, or Postgres involve a lot of setup and maintenance, even in their cloud-hosted incarnations. But what if all you want is “just enough” analytics for a dataset on your desktop? In that case, DuckDB is worth exploring.

Columnar data analytics on your laptop

DuckDB is a tiny but powerful analytics database engine—a single, self-contained executable, which can run standalone or as a loadable library inside a host process. There’s very little you need to set up or maintain with DuckDB. In this way, it is more like SQLite than the bigger analytical databases in its class.

DuckDB is designed for column-oriented data querying. It ingests data from sources like CSV, JSON, and Apache Parquet, and enables fast querying using familiar SQL syntax. DuckDB supports libraries for all the major programming languages, so you can work with it programmatically using the language of your choice. Or you can use DuckDB’s command-line interface, either on its own or as part of a shell pipeline.

Loading data into DuckDB

When you work with data in DuckDB, there are two modes you can use for that data. Persistent mode writes the data to disk so it can handle workloads bigger than system memory. This approach comes at the cost of some speed. In-memory mode keeps the data set entirely in memory, which is faster but retains nothing once the program ends. (SQLite can be used the same way.)
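
The difference between the two modes can be seen with Python’s standard-library sqlite3 module, used here only as a stand-in because, as noted above, SQLite offers the same choice. This is a minimal sketch rather than DuckDB code; the events table and the temporary file name are made up for illustration.

```python
import os
import sqlite3
import tempfile


def insert_then_reopen(db_path):
    """Insert one row, close the connection, reopen it, and count the rows."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER)")
    con.execute("INSERT INTO events VALUES (1)")
    con.commit()
    con.close()

    # A second connection stands in for "the next run of the program".
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER)")
    (count,) = con.execute("SELECT COUNT(*) FROM events").fetchone()
    con.close()
    return count


# In-memory mode: every connection starts from an empty database.
in_memory_count = insert_then_reopen(":memory:")

# Persistent mode: the row survives because it was written to a file.
with tempfile.TemporaryDirectory() as tmp:
    on_disk_count = insert_then_reopen(os.path.join(tmp, "events.db"))

print(in_memory_count, on_disk_count)  # 0 1
```

The in-memory database forgets its row as soon as the connection closes, while the file-backed one still has it on reopening.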

DuckDB can ingest data from a variety of formats. CSV, JSON, and Apache Parquet files are three of the most common. With CSV and JSON, DuckDB by default attempts to figure out the columns and data types on its own, but you can override that process as needed—for instance, to specify a format for a date column.

Other databases, like MySQL or Postgres, can also be used as data sources. You’ll need to load a DuckDB extension (more on this later) and provide a connection string to the database server; DuckDB doesn’t read the files for those databases directly. With SQLite, though, you connect to the SQLite database file as though it were just another data file.

To load data into DuckDB from an external source, you can use an SQL string, passed directly into DuckDB:


SELECT * FROM read_csv('data.csv');

You can also use methods in the DuckDB interface library for a given language. With the Python library for DuckDB, ingesting looks like this:


import duckdb
duckdb.read_csv("data.csv")

You can also query certain file formats directly, like Parquet:


SELECT * FROM 'test.parquet';

You can also issue file queries to create a persistent data view, which is usable as a table for multiple queries:


CREATE VIEW test_data AS SELECT * FROM read_parquet('test.parquet');

DuckDB has optimizations for working with Parquet files, so that it reads only what it needs from the file.

Other interfaces like ADBC and ODBC can also be used. ODBC serves as a connector for data visualization tools like Tableau.

Data imported into DuckDB can also be re-exported in many common formats: CSV, JSON, Parquet, Microsoft Excel, and others. This makes DuckDB useful as a data-conversion tool in a processing pipeline.

Querying data in DuckDB

Once you’ve loaded data into DuckDB, you can query it using SQL expressions. The format for such expressions is no different from regular SQL queries:


SELECT * FROM users WHERE ID>1000 ORDER BY Name DESC LIMIT 5;

If you’re using a client API to query DuckDB, you can pass SQL strings through the API, or you can use the client’s relational API to build up queries programmatically. In Python, reading from a JSON file and querying it might look like this:


import duckdb
# Read the JSON file into a relation, then build the same query step by step
users = duckdb.read_json("users.json")
users.select("*").filter("ID>1000").order("Name DESC").limit(5)

If you use Python, you can use the PySpark API to query DuckDB directly, although DuckDB’s implementation of PySpark doesn’t yet support the full feature set.

DuckDB’s dialect of SQL closely follows most common SQL dialects, although it comes with a few additions for the sake of analytics. For instance, placing the SAMPLE clause in a query lets you run a query using only a subset of the data in a table. The resulting query runs faster, but it may be less accurate. DuckDB also supports the PIVOT keyword (for creating pivot tables), window functions and QUALIFY clauses to filter them, and many other analytics functions in its SQL dialect.
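
To see what QUALIFY saves you, here is a sketch of the same "top row per group" query written without it. It uses Python’s standard-library sqlite3 module as a stand-in, since SQLite has window functions but no QUALIFY clause and the filter has to move to an outer query. The sales table and its rows are invented for the example, and the DuckDB form appears only as a comment.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "ann", 300), ("east", "bob", 500),
     ("west", "cy", 200), ("west", "di", 100)],
)

# In DuckDB, QUALIFY can filter on the window function directly:
#   SELECT region, rep, amount FROM sales
#   QUALIFY row_number() OVER (PARTITION BY region ORDER BY amount DESC) = 1;
# Without QUALIFY, the window result is computed in a subquery and filtered outside.
top_per_region = con.execute(
    """
    SELECT region, rep, amount
    FROM (
        SELECT region, rep, amount,
               row_number() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
        FROM sales
    )
    WHERE rn = 1
    ORDER BY region
    """
).fetchall()
con.close()

print(top_per_region)  # [('east', 'bob', 500), ('west', 'cy', 200)]
```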

DuckDB extensions

DuckDB isn’t limited to the data formats and behaviors baked into it. Its extension API makes it possible to write third-party add-ons for DuckDB to support new data formats or other behaviors.

Some of the functionality included with DuckDB is implemented through first-party add-ons, like support for Parquet files. Others, like MySQL or Postgres connectivity, or vector similarity search, are also maintained by DuckDB’s team but provided separately.

DBOS: A better way to build applications?

Posted by on 29 April, 2024

This post was originally published on this site

At the end of March 2024, Mike Stonebraker announced in a blog post the release of DBOS Cloud, “a transactional serverless computing platform, made possible by a revolutionary new operating system, DBOS, that implements OS services on top of a distributed database.” That sounds odd, to put it mildly, but it makes more sense when you read the origin story:

“The idea for DBOS (DataBase oriented Operating System) originated 3 years ago with my realization that the state an operating system must maintain (files, processes, threads, messages, etc.) has increased in size by about 6 orders of magnitude since I began using Unix on a PDP-11/40 in 1973. As such, storing OS state is a database problem. Also, Linux is legacy code at the present time and is having difficulty making forward progress. For example there is no multi-node version of Linux, requiring people to run an orchestrator such as Kubernetes. When I heard a talk by Matei Zaharia in which he said Databricks could not use traditional OS scheduling technology at the scale they were running and had turned to a DBMS solution instead, it was clear that it was time to move the DBMS into the kernel and build a new operating system.”

If you don’t know Stonebraker, he’s been a database-focused computer scientist (and professor) since the early 1970s, when he and his UC Berkeley colleagues Eugene Wong and Larry Rowe founded Ingres. Ingres later inspired Sybase, which was eventually the basis for Microsoft SQL Server. After selling Ingres to Computer Associates, Stonebraker and Rowe started researching Postgres, which later became PostgreSQL and also evolved into Illustra, which was purchased by Informix.

I heard Stonebraker talk about Postgres at a DBMS conference in 1980. What I got out of that talk, aside from an image of “jungle drums” calling for SQL, was the idea that you could add support for complex data types to the database by implementing new index types, extending the query language, and adding support for that to the query parser and optimizer. The example he used was geospatial information, and he explained one kind of index structure that would make 2D geometric database queries go very fast. (This facility eventually became PostGIS. The R-tree currently used by default in PostGIS GiST indexes wasn’t invented until 1984, so Mike was probably talking about the older quadtree index.)

Skipping ahead 44 years, it should surprise precisely nobody in the database field that DBOS uses a distributed version of PostgreSQL as its kernel database layer.

dbos 01 IDG

The DBOS system diagram makes it clear that a database is part of the OS kernel. The distributed database relies on a minimal kernel, but sits under the OS services instead of running in the application layer as a normal database would.

DBOS features

DBOS Transact, an open-source TypeScript framework, supports Postgres-compatible transactions, reliable workflow orchestration, HTTP serving using GET and POST, communication with external services and third-party APIs, idempotent requests using UUID keys, authentication and authorization, Kafka integration with exactly-once semantics, unit testing, and self-hosting. DBOS Cloud, a transactional serverless platform for deploying DBOS Transact applications, supports serverless app deployment, time-travel debugging, cloud database management, and observability.

Let’s highlight some major areas of interest.

DBOS Transact

The code shown in the screenshot below demonstrates transactions, as well as HTTP serving using GET. It’s worthwhile to read the code closely. It’s only 18 lines, not counting blank lines.

The first import (line 1) brings in the DBOS SDK classes that we’ll need. The second import (line 2) brings in the Knex.js SQL query builder, which handles sending the parameterized query to the Postgres database and returning the resulting rows. The database table schema is defined in lines 4 through 8; the only columns are a name string and a greet_count integer.

There is only one method in the Hello class, helloTransaction. It is wrapped in @GetApi and @Transaction decorators. The first causes the method to be served in response to an HTTP GET request on the path /greeting/ followed by the username parameter you want to pass in; the second wraps the database call in a transaction, so that two instances can’t update the database simultaneously.

The database query string (line 16) uses PostgreSQL syntax to try to insert a row into the database for the supplied name and an initial count of 1. If the row already exists, then the ON CONFLICT clause runs an update operation that increments the count in the database.

Line 17 uses Knex.js to send the SQL query to the DBOS system database and retrieves the result. Line 18 pulls the count out of the first row of results and returns the greeting string to the calling program.
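
The insert-or-increment pattern described above can be tried without a DBOS or Postgres installation. The following is a minimal sketch in Python using the standard-library sqlite3 module, which accepts a similar ON CONFLICT clause. It is not the generated DBOS code: the table name hello, the greet function, and the greeting wording are assumptions made for illustration.

```python
import sqlite3


def greet(con, name):
    """Insert the name with a count of 1, or increment the count if it exists."""
    with con:  # one transaction: commits on success, rolls back on error
        con.execute(
            "INSERT INTO hello (name, greet_count) VALUES (?, 1) "
            "ON CONFLICT (name) DO UPDATE SET greet_count = greet_count + 1",
            (name,),
        )
        (count,) = con.execute(
            "SELECT greet_count FROM hello WHERE name = ?", (name,)
        ).fetchone()
    return f"Hello, {name}! You have been greeted {count} times."


con = sqlite3.connect(":memory:")
# The conflict target must be a unique column, so name is the primary key.
con.execute("CREATE TABLE hello (name TEXT PRIMARY KEY, greet_count INTEGER)")

first = greet(con, "Mike")
second = greet(con, "Mike")
print(first)
print(second)
```

The second call hits the conflict on name and takes the update branch, so the count goes from 1 to 2 without the caller checking whether the row exists.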

The use of SQL and a database for what feels like it should be a core in-memory system API, such as a Linux atomic counter or a Windows interlocked variable, seems deeply weird. Nevertheless, it works.

dbos 02 IDG

This TypeScript code for a Hello class is generated when you perform a DBOS create operation. As you can see, it relies on the @GetApi and @Transaction decorators to serve the function from HTTP GET requests and run the function as a database transaction.

DBOS Time Travel Debugger

When you run an application in DBOS Cloud, it records every step and change it makes (the workflow) in the database. You can debug that workflow using Visual Studio Code and the DBOS Time Travel Debugger extension. The time-travel debugger allows you to debug your DBOS application against the database as it existed at the time the selected workflow originally executed.

dbos 03IDG

To perform time-travel debugging, you first start with a CodeLens to list saved trace workflows. Once you choose the one you want, you can debug it using Visual Studio Code with a plugin, or from the command line.

dbos 04IDG

Time-travel debugging with a saved workflow looks very much like ordinary debugging in Visual Studio Code. The code being debugged is the same Hello class you saw earlier. 

DBOS Quickstart

The DBOS Quickstart tutorial requires Node.js 20 or later and a PostgreSQL database you can connect to, either locally, in a Docker container, or remotely. I already had Node.js v20.9.0 installed on my M1 MacBook, but I upgraded it to v20.12.1 from the Node.js website.

I didn’t have PostgreSQL installed, so I downloaded and ran the interactive installer for v16.2 from EnterpriseDB. This installer creates a full-blown macOS server and applications. If I had used Homebrew instead, it would have created command-line applications, and if I had used Postgres.app, I would have gotten a menu-bar app.

The Quickstart proper starts by creating a DBOS app directory using Node.js.

martinheller@Martins-M1-MBP ~ % npx -y @dbos-inc/create@latest -n myapp
Merged .gitignore files saved to myapp/.gitignore
added 590 packages, and audited 591 packages in 25s
found 0 vulnerabilities
added 1 package, and audited 592 packages in 1s
found 0 vulnerabilities
added 129 packages, and audited 721 packages in 5s
found 0 vulnerabilities
Application initialized successfully!

Then you configure the app to use your Postgres server and export your Postgres password into an environment variable.

martinheller@Martins-M1-MBP ~ % cd myapp
martinheller@Martins-M1-MBP myapp % npx dbos configure
? What is the hostname of your Postgres server? localhost
? What is the port of your Postgres server? 5432
? What is your Postgres username? postgres
martinheller@Martins-M1-MBP myapp % export PGPASSWORD=*********

After that, you create a “Hello” database using Node.js and Knex.js.

martinheller@Martins-M1-MBP myapp % npx dbos migrate
2024-04-09 15:01:42 [info]: Starting migration: creating database hello if it does not exist
2024-04-09 15:01:42 [info]: Database hello does not exist, creating...
2024-04-09 15:01:42 [info]: Executing migration command: npx knex migrate:latest
2024-04-09 15:01:43 [info]: Batch 1 run: 1 migrations
2024-04-09 15:01:43 [info]: Creating DBOS tables and system database.
2024-04-09 15:01:43 [info]: Migration successful!

With that complete, you build and run the DBOS app locally.

martinheller@Martins-M1-MBP myapp % npm run build
npx dbos start

> myapp@0.0.1 build
> tsc

2024-04-09 15:02:30 [info]: Workflow executor initialized
2024-04-09 15:02:30 [info]: HTTP endpoints supported:
2024-04-09 15:02:30 [info]:     GET   :  /greeting/:user
2024-04-09 15:02:30 [info]: DBOS Server is running at http://localhost:3000
2024-04-09 15:02:30 [info]: DBOS Admin Server is running at http://localhost:3001
^C

At this point, you can browse to http://localhost:3000 to test the application. That done, you register for the DBOS Cloud and provision your own database there.

martinheller@Martins-M1-MBP myapp % npx dbos-cloud register -u meheller
2024-04-09 15:11:35 [info]: Welcome to DBOS Cloud!
2024-04-09 15:11:35 [info]: Before creating an account, please tell us a bit about yourself!
Enter First/Given Name: Martin
Enter Last/Family Name: Heller
Enter Company: self
2024-04-09 15:12:06 [info]: Please authenticate with DBOS Cloud!
Login URL: https://login.dbos.dev/activate?user_code=QWKW-TXTB
2024-04-09 15:12:12 [info]: Waiting for login...
2024-04-09 15:12:17 [info]: Waiting for login...
2024-04-09 15:12:22 [info]: Waiting for login...
2024-04-09 15:12:27 [info]: Waiting for login...
2024-04-09 15:12:32 [info]: Waiting for login...
2024-04-09 15:12:38 [info]: Waiting for login...
2024-04-09 15:12:44 [info]: meheller successfully registered!
martinheller@Martins-M1-MBP myapp % npx dbos-cloud db provision iw_db -U meheller
Database Password: ********
2024-04-09 15:19:22 [info]: Successfully started provisioning database: iw_db
2024-04-09 15:19:28 [info]: {"PostgresInstanceName":"iw_db","HostName":"userdb-51fcc211-6ed3-4450-a90e-0f864fc1066c.cvc4gmaa6qm9.us-east-1.rds.amazonaws.com","Status":"available","Port":5432,"DatabaseUsername":"meheller","AdminUsername":"meheller"}
2024-04-09 15:19:28 [info]: Database successfully provisioned!

Finally, you can register and deploy your app in the DBOS Cloud.

martinheller@Martins-M1-MBP myapp % npx dbos-cloud app register -d iw_db
2024-04-09 15:20:09 [info]: Loaded application name from package.json: myapp
2024-04-09 15:20:09 [info]: Registering application: myapp
2024-04-09 15:20:11 [info]: myapp ID: d8806829-c5b8-4df0-8b5a-2d1bf87c3322
2024-04-09 15:20:11 [info]: Successfully registered myapp!
martinheller@Martins-M1-MBP myapp % npx dbos-cloud app deploy
2024-04-09 15:20:35 [info]: Loaded application name from package.json: myapp
2024-04-09 15:20:35 [info]: Submitting deploy request for myapp
2024-04-09 15:21:09 [info]: Submitted deploy request for myapp. Assigned version: 1712676035
2024-04-09 15:21:13 [info]: Waiting for myapp with version 1712676035 to be available
2024-04-09 15:21:21 [info]: Successfully deployed myapp!
2024-04-09 15:21:21 [info]: Access your application at https://meheller-myapp.cloud.dbos.dev/
dbos 05IDG

The “Hello” application running in the DBOS Cloud counts every greeting. It uses the code you saw earlier.

DBOS applications

The “Hello” application does illustrate some of the core features of DBOS Transact and the DBOS Cloud, but it’s so basic that it’s barely a toy. The Programming Quickstart adds a few more details, and it’s worth your time to go through it. You’ll learn how to use communicator functions to access third-party services (email, in this example) as well as how to compose reliable workflows. You’ll literally interrupt the workflow and restart it without re-sending the email: DBOS workflows always run to completion and each of their operations executes once and only once. That’s possible because DBOS persists the output of each step in your database.
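
The idea of persisting each step’s output can be illustrated with a toy in Python. This is a sketch of the general technique only, not how DBOS Transact is implemented: the DurableWorkflow class, the step_outputs table, and the step names are all hypothetical, and SQLite stands in for Postgres. It also ignores a crash that lands between running a step and recording its output, which a real system has to handle.

```python
import json
import sqlite3


class DurableWorkflow:
    """Toy illustration: record each step's output so a rerun skips finished steps."""

    def __init__(self, con, workflow_id):
        self.con = con
        self.workflow_id = workflow_id
        con.execute(
            "CREATE TABLE IF NOT EXISTS step_outputs ("
            "workflow_id TEXT, step TEXT, output TEXT, "
            "PRIMARY KEY (workflow_id, step))"
        )

    def run_step(self, step, func):
        row = self.con.execute(
            "SELECT output FROM step_outputs WHERE workflow_id = ? AND step = ?",
            (self.workflow_id, step),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # step already ran: reuse the recorded output
        result = func()
        with self.con:  # persist the output before moving on to the next step
            self.con.execute(
                "INSERT INTO step_outputs VALUES (?, ?, ?)",
                (self.workflow_id, step, json.dumps(result)),
            )
        return result


sent_emails = []


def send_email():
    sent_emails.append("welcome")
    return "sent"


con = sqlite3.connect(":memory:")

# First attempt: the email goes out, then the process is "interrupted".
attempt = DurableWorkflow(con, "wf-1")
attempt.run_step("send_email", send_email)

# Restart with the same workflow ID: the email step is skipped, the rest runs.
retry = DurableWorkflow(con, "wf-1")
status = retry.run_step("send_email", send_email)
final = retry.run_step("record_signup", lambda: "done")

print(sent_emails, status, final)  # ['welcome'] sent done
```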

Once you’ve understood the programming Quickstart, you’ll be ready to try out the two DBOS demo applications, which do rise to the level of being toys. Both demos use Next.js for their front ends, and both use DBOS workflows, transactions, and communicators.

The first demo, E-Commerce, is a web shopping and payment processing system. It’s worthwhile reading the Under the Covers section of the README in the demo’s repository to understand how it works and how you might want to upgrade it to, for example, use a real-world payment provider.

The second demo, YKY Social, simulates a simple social network, and uses TypeORM rather than Knex.js for its database code. It also uses Amazon S3 for profile photos. If you’re serious about using DBOS yourself, you should work through both demo applications.

A tantalizing glimpse

I have to say that DBOS and DBOS Cloud look very interesting. Reliable execution and time-travel debugging, for example, are quite desirable. On the other hand, I wouldn’t want to build a real application on DBOS or DBOS Cloud at this point. I have lots of questions, starting with “How does it scale in practice?” and probably ending with “How much will it cost at X scale?”

I mentioned earlier that DBOS code looks weird but works. I would imagine that any programming shop considering writing an application on it would be discouraged or even repelled by the “it looks weird” part, as developers tend to be set in their ways until what they are doing no longer works.

I also have to point out that the current implementation of DBOS is very far from the system diagram you saw near the beginning of this review. Where’s the minimal kernel? DBOS currently runs on macOS, Linux, and Windows. None of those are minimal kernels. DBOS Cloud currently runs on AWS. Again, not a minimal kernel.

So, overall, DBOS is a tantalizing glimpse of something that may eventually turn out to be cool. It’s new and shiny, and it comes from smart people, but it will be a while before it could possibly become a mainstream system.

Cost: Free with usage limits; paid plans require you to contact sales.

Platform: macOS, Linux, Windows, AWS.

Google Vertex AI Studio puts the promise in generative AI

Posted by on 26 March, 2024

This post was originally published on this site

Vertex AI Studio is an online environment for building AI apps, featuring Gemini, Google’s own multimodal generative AI model that can work with text, code, audio, images, and video. In addition to Gemini, Vertex AI provides access to more than 40 proprietary models and more than 60 open source models in its Model Garden, for example the proprietary PaLM 2, Imagen, and Codey models from Google Research, open source models like Llama 2 from Meta, and Claude 2 and Claude 3 from Anthropic. Vertex AI also offers pre-trained APIs for speech, natural language, translation, and vision.

Vertex AI supports prompt engineering, hyper-parameter tuning, retrieval-augmented generation (RAG), and model tuning. You can tune foundation models with your own data, using tuning options such as adapter tuning and reinforcement learning from human feedback (RLHF), or perform style and subject tuning for image generation.

Vertex AI Extensions connect models to real-world data and real-time actions. Vertex AI allows you to work with models both in the Google Cloud console and via APIs in Python, Node.js, Java, and Go.

Competitive products include Amazon Bedrock, Azure AI Studio, LangChain/LangSmith, LlamaIndex, Poe, and the ChatGPT GPT Builder. The technical levels, scope, and programming language support of these products vary.

Vertex AI Studio

Vertex AI Studio is a Google Cloud console tool for building and testing generative AI models. It allows you to design and test prompts and customize foundation models to meet your application’s needs.

“Foundation model” is another term for the generative AI models found in Vertex AI. Calling them foundation models emphasizes the fact that they can be customized with your data for the specialized purposes of your application. They can generate text, chat, image, code, video, multimodal data, and embeddings.

Embeddings are vector representations of other data, for example text. Search engines often use vector embeddings, a cosine metric, and a nearest-neighbor algorithm to find text that is relevant (similar) to a query string.
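
As a rough illustration of that search step, here is a brute-force nearest-neighbor lookup using cosine similarity in plain Python. The three-dimensional vectors and the document titles are made up; real embeddings come from a model and have hundreds of dimensions, and production systems typically use an approximate index rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, corpus):
    """Return the key whose vector is most similar to the query vector."""
    return max(corpus, key=lambda key: cosine_similarity(query, corpus[key]))


# Made-up "embeddings" for three documents.
corpus = {
    "cat care tips": [0.9, 0.1, 0.0],
    "dog training": [0.7, 0.3, 0.1],
    "tax filing": [0.0, 0.2, 0.9],
}

best = nearest([0.1, 0.1, 0.8], corpus)
print(best)  # tax filing
```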

The proprietary Google generative AI models available in Vertex AI include:

  • Gemini API: Advanced reasoning, multi-turn chat, code generation, and multimodal prompts.
  • PaLM API: Natural language tasks, text embeddings, and multi-turn chat.
  • Codey APIs: Code generation, code completion, and code chat.
  • Imagen API: Image generation, image editing, and visual captioning.
  • MedLM: Medical question answering and summarization (private GA).

Vertex AI Studio allows you to test models using prompt samples. The prompt galleries are organized by the type of model (multimodal, text, vision, or speech) and the task being demonstrated, for example “summarize key insights from a financial report table” (text) or “read the text from this handwritten note image” (multimodal).

Vertex AI also helps you to design and save your own prompts. The types of prompt are broken down by purpose, for example text generation versus code generation and single-shot versus chat. Iterating on your prompts is a surprisingly powerful way of customizing a model to produce the output you want, as we’ll discuss below.

When prompt engineering isn’t enough to coax a model into producing the desired output, and you have a training data set in a suitable format, you can take the next step and tune a foundation model in one of several ways: supervised tuning, RLHF tuning, or distillation. Again, we’ll discuss this in more detail later on in this review.

The Vertex AI Studio speech tool can convert speech to text and text to speech. For text to speech you can choose your preferred voice and control its speed. For speech to text, Vertex AI Studio uses the Chirp model, but has length and file format limits. You can circumvent those by using the Cloud Speech-to-Text Console instead.

vertex ai studio 01 IDG

Google Vertex AI Studio overview console, emphasizing Google’s newest proprietary generative AI models. Note the use of Google Gemini for multimodal AI, PaLM 2 or Gemini for language AI, Imagen for vision (image generation and infill), and the Universal Speech Model for speech recognition and synthesis.

vertex ai studio 03IDG

Multimodal generative AI demonstration from Vertex AI. The model, Gemini Pro Vision, is able to read the message from the image despite the elaborate calligraphy.

Generative AI workflow

As you can see in the diagram below, Google Vertex AI’s generative AI workflow is a bit more complicated than simply throwing a prompt over the wall and getting a response back. Google’s responsible AI and safety filter applies both to the input and output, shielding the model from malicious prompts and the user from malicious responses.

The foundation model that processes the query can be pre-trained or tuned. Model tuning, if desired, can be performed using several methods, all of which are out-of-band for the query/response workflow and quite time-consuming.

If grounding is required, it’s applied here. The diagram shows the grounding service after the model in the flow; that’s not exactly how RAG works, as I explained in January. Out-of-band, you build your vector database. In-band, you generate an embedding vector for the query, use it to perform a similarity search against the vector database, and finally you include what you’ve retrieved from the vector database as an augmentation to the original query and pass it to the model.
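
The order of operations described above can be sketched in a few lines of Python. Everything here is a stand-in: the embed function is a bag-of-words counter rather than a real embedding model, the "vector database" is a list, and the documents are invented. The point is only the sequence: index out-of-band, then embed the query, retrieve, and augment the prompt before it reaches the model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Out-of-band: build the "vector database" from your documents.
documents = [
    "DuckDB is an in-process analytics database.",
    "Vertex AI Search stores documents for grounding.",
    "Asian pears cost four dollars each at the market.",
]
index = [(doc, embed(doc)) for doc in documents]


def augment(query, top_k=1):
    """In-band: embed the query, retrieve similar text, prepend it to the prompt."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}"


prompt = augment("How much do Asian pears cost?")
print(prompt)
```

The augmented prompt, not the bare question, is what would be passed to the model.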

At this point, the model generates answers, possibly based on multiple documents. The workflow allows for the inclusion of citations before sending the response back to the user through the safety filter.

vertex ai studio 02IDG

The generative AI workflow typically starts with prompting by the user. On the back end, the prompt passes through a safety filter to pre-trained or tuned foundation models, optionally using a grounding service for RAG. After a citation check, the reply passes back through the safety filter and to the user.

Grounding and Vertex AI Search

As you might expect from the way RAG works, Vertex AI requires you to take a few steps to enable RAG. First, you need to “onboard to Vertex AI Search and Conversation,” a matter of a few clicks and a few minutes of waiting. Then you need to create an AI Search data store, which can be accomplished by crawling websites, importing data from a BigQuery table, importing data from a Cloud Storage bucket (PDF, HTML, TXT, JSONL, CSV, DOCX, or PPTX formats), or by calling an API.

Finally, you need to set up a prompt with a model that supports RAG (currently only text-bison and chat-bison, both PaLM 2 language models) and configure it to use your AI Search and Conversation data store. If you are using the Vertex AI console, this setup is in the advanced section of the prompt parameters, as shown in the first screenshot below. If you are using the Vertex AI API, this setup is in the groundingConfig section of the parameters:

{
  "instances": [
    { "prompt": "PROMPT"}
  ],
  "parameters": {
    "temperature": TEMPERATURE,
    "maxOutputTokens": MAX_OUTPUT_TOKENS,
    "topP": TOP_P,
    "topK": TOP_K,
    "groundingConfig": {
      "sources": [
          {
              "type": "VERTEX_AI_SEARCH",
              "vertexAiSearchDatastore": "VERTEX_AI_SEARCH_DATA_STORE"
          }
      ]
    }
  }
}
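
If you are assembling that request body in code, a small helper keeps the structure in one place. This is a sketch that only builds and serializes the JSON shown above; it does not call the API. The function name and the default parameter values are assumptions for illustration, and the data store string is left as the same placeholder used above.

```python
import json


def grounded_request(prompt, datastore, temperature=0.2, max_output_tokens=256,
                     top_p=0.95, top_k=40):
    """Build the grounded request body as a Python dict (defaults are made up)."""
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
            "topP": top_p,
            "topK": top_k,
            "groundingConfig": {
                "sources": [
                    {
                        "type": "VERTEX_AI_SEARCH",
                        "vertexAiSearchDatastore": datastore,
                    }
                ]
            },
        },
    }


body = grounded_request(
    "Shall I compare thee to a summer's day?",
    "VERTEX_AI_SEARCH_DATA_STORE",  # placeholder: substitute your data store path
)
payload = json.dumps(body, indent=2)
print(payload)
```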
vertex ai studio 04 IDG

If you’re constructing a prompt for a model that supports grounding, the Enable Grounding toggle at the right, under Advanced, will be enabled, and you can click it, as I have here. Clicking on Customize brings up another right-hand panel where you can select Vertex AI Search from the drop-down list and fill in the path to the Vertex AI data store.

Note that grounding or RAG may or may not be needed, depending on how and when the model was trained.

vertex ai studio 05 IDG

It’s usually worth checking to see whether you need grounding for any given prompt/model pair. I thought I might need to add the poems section of the Poetry.org site to get a good completion for “Shall I compare thee to a summer’s day?” But as you can see above, the text-bison model already knew the sonnet from four sources it could (and did) cite.

Gemini, Imagen, Chirp, Codey, and PaLM 2

Google’s proprietary models offer some of the added value of the Vertex AI site. Gemini was unique in being a multimodal model (as well as a text and code generation model) as recently as a few weeks before I wrote this. Then OpenAI GPT-4 incorporated DALL-E, which allowed it to generate text or images. Currently, Gemini can generate text from images and videos, but GPT-4/DALL-E can’t.

Gemini versions currently offered on Vertex AI include Gemini Pro, a language model with “the best performing Gemini model with features for a wide range of tasks;” Gemini Pro Vision, a multimodal model “created from the ground up to be multimodal (text, images, videos) and to scale across a wide range of tasks;” and Gemma, “open checkpoint variants of Google DeepMind’s Gemini model suited for a variety of text generation tasks.”

Additional Gemini versions have been announced: Gemini 1.0 Ultra, Gemini Nano (to run on devices), and Gemini 1.5 Pro, a mixture-of-experts (MoE) mid-size multimodal model, optimized for scaling across a wide range of tasks, that performs at a similar level to Gemini 1.0 Ultra. According to Demis Hassabis, CEO and co-founder of Google DeepMind, Gemini 1.5 Pro comes with a standard 128,000 token context window, but a limited group of customers can try it with a context window of up to 1 million tokens via Vertex AI in private preview.

Imagen 2 is a text-to-image diffusion model from Google Brain Research that Google says has “an unprecedented degree of photorealism and a deep level of language understanding.” It’s competitive with DALL-E 3, Midjourney 6, and Adobe Firefly 2, among others.

Chirp is a version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model. It can turn audio speech to formatted text, caption videos for subtitles, and transcribe audio content for entity extraction and content classification.

Codey exists in versions for code completion (code-gecko), code generation (code-bison), and code chat (codechat-bison). The Codey APIs support the Go, GoogleSQL, Java, JavaScript, Python, and TypeScript languages, and Google Cloud CLI, Kubernetes Resource Model (KRM), and Terraform infrastructure as code. Codey competes with GitHub Copilot, StarCoder 2, CodeLlama, LocalLlama, DeepSeekCoder, CodeT5+, CodeBERT, CodeWhisperer, Bard, and various other LLMs that have been fine-tuned on code such as OpenAI Codex, Tabnine, and ChatGPTCoding.

PaLM 2 exists in versions for text (text-bison and text-unicorn), chat (chat-bison), and security-specific tasks (sec-palm, currently only available by invitation). PaLM 2 text-bison is good for summarization, question answering, classification, sentiment analysis, and entity extraction. PaLM 2 chat-bison is fine-tuned to conduct natural conversation, for example to perform customer service and technical support or serve as a conversational assistant for websites. PaLM 2 text-unicorn, the largest model in the PaLM family, excels at complex tasks such as coding and chain-of-thought (CoT).

Google also provides embedding models for text (textembedding-gecko and textembedding-gecko-multilingual) and multimodal (multimodalembedding). Embeddings plus a vector database (Vertex AI Search) allow you to implement semantic or similarity search and RAG, as described above.

vertex ai studio 06 IDG

Vertex AI documentation overview of multimodal models. Note the example at the lower right. The text prompt “Give me a recipe for these cookies” and an unlabeled picture of chocolate-chip cookies causes Gemini to respond with an actual recipe for chocolate-chip cookies.

Vertex AI Model Garden

In addition to Google’s proprietary models, the Model Garden (documentation) currently offers roughly 90 open-source models and 38 task-specific solutions. In general, the models have model cards. The Google models are available through Vertex AI APIs and Google Colab as well as in the Vertex AI console. The APIs are billed on a usage basis.

The other models are typically available in Colab Enterprise and can be deployed as an endpoint. Note that endpoints are deployed on serious instances with accelerators (for example 96 CPUs and 8 GPUs), and therefore accrue significant charges as long as they are deployed.

Foundation models offered include Claude 3 Opus (coming soon), Claude 3 Sonnet (preview), Claude 3 Haiku (coming soon), Llama 2, and Stable Diffusion v1-5. Fine-tunable models include PyTorch-ZipNeRF for 3D reconstruction, AutoGluon for tabular data, Stable Diffusion LoRA (MediaPipe) for text-to-image generation, and MoViNet Video Action Recognition.

Generative AI prompt design

The Google AI prompt design strategies page does a decent and generally vendor-neutral job of explaining how to design prompts for generative AI. It emphasizes clarity, specificity, including examples (few-shot learning), adding contextual information, using prefixes for clarity, letting models complete partial inputs, breaking down complex prompts into simpler components, and experimenting with different parameter values to optimize results.

Let’s look at three examples, one each for multimodal, text, and vision. The multimodal example is interesting because it uses two images and a text question to get an answer.

vertex ai studio 07 IDG

Here the prompt asks the price of what’s shown in the first image. The Gemini Pro Vision model has to match up the fruit in the first image to the second image, and read the hand-written price tag in the second image to come up with the answer, $4 each. The next two screenshots show details of the two images.

vertex ai studio 08bIDG

The naive reader might think that these are green apples. Nevertheless, if you view the next screenshot, you’ll see that they are labeled Asian pears.

vertex ai studio 09IDG

The fruits here are distinct enough that you aren’t likely to confuse them, even if Asian pears are unfamiliar, and despite the pears having wrappings in this image but not in the previous image.

vertex ai studio 10 IDG

This text extraction example using the Gemini Pro model asks for a JSON format answer extracted from plain text, and offers a one-shot example for guidance in the middle Examples pane. The inferred result in the Test pane is correct.

vertex ai studio 11 IDG

Because I’m rather bloody-minded when I test, I used an image of my own, my back yard after a snowstorm. OK, it looks like my neighbor’s garage was misidentified as a house, but the first generated caption isn’t bad.

Vertex AI model tuning

It’s almost always worthwhile to try prompt engineering first, but if that fails, the next step to customize a base model for your own purposes is model tuning. The Google AI guidance on model tuning and model tuning with AI Studio is very good, and to the point.

Currently Vertex AI only supports supervised fine-tuning of two text foundation models, Gemini 1.0 Pro and text-bison-001. While you can sometimes get by with as few as 20 examples when you’re doing supervised learning for fine-tuning, the usual recommendations from Google are listed in the table below.

Task               No. of examples in data set
Classification     100+
Summarization      100-500+
Document search    100+

If you’d like to try out fine-tuning for free, you can run it in Colab using this Python Quickstart.

There are two supported methods for fine-tuning text models in AI Studio, supervised tuning and RLHF tuning. Google recommends supervised tuning for classification, sentiment analysis, entity extraction, summarization of content that’s not complex, and domain-specific queries. Google recommends RLHF tuning for question answering, summarization of complex content, and content creation. However, supervised tuning is the only option for code models. 

You can also tune embedding models and create distilled text models. Distilled text models use a large, capable teacher model and a labeled or an unlabeled training data set to train a smaller but more accurate student model.

If all of the above fails, your next step might be continued pre-training. The good news about that is that it uses unlabeled data. The bad news is that it requires lots of exemplars, takes days to run, and can be quite expensive.

Vertex AI Studio vs. the competition

Overall, Vertex AI Studio is a promising product that could potentially compete strongly with Amazon Bedrock and Azure AI Studio. On the other hand, Google has been busy shooting itself in the foot. The company mismanaged its Gemini Image rollout to the point where co-founder Sergey Brin returned from wherever he has been hiding to say “We definitely messed up on the image generation.” I won’t repeat my rant about Google’s history of eating its young from my review of Project IDX, but you can read it at the end of that article.

There are lots of good things about Vertex AI Studio, including its use of Google’s own models, its rapid adoption and deployment of new models from other vendors, and its straightforward support for RAG and model tuning. Meanwhile, why are the Generative AI Extensions—potentially the most useful facility of Vertex AI Studio, and the piece that competes with OpenAI GPTs—buried in a private preview?

As far as which AI app building platform you should choose, if you’re already heavily invested in the Google Cloud, then using Google Vertex AI Studio for building AI apps is probably a no-brainer, as long as you can get access to all the models and capabilities that you need, rather than be told that you can’t have them because they are in private preview.

Given Google’s investment in its cloud platform, I don’t seriously expect Vertex AI Studio to be killed outright in the next couple of years, but I wouldn’t be at all surprised by yet another rebranding.

Cost: Cost is based on usage. See https://cloud.google.com/vertex-ai/pricing#generative_ai_models.

Platform: Google Cloud Platform.

Vertex AI Studio puts the promise in generative AI

Posted by on 26 March, 2024

Vertex AI Studio is an online environment for building AI apps, featuring Gemini, Google’s own multimodal generative AI model that can work with text, code, audio, images, and video. In addition to Gemini, Vertex AI provides access to more than 40 proprietary models and more than 60 open source models in its Model Garden, for example the proprietary PaLM 2, Imagen, and Codey models from Google Research, open source models like Llama 2 from Meta, and Claude 2 and Claude 3 from Anthropic. Vertex AI also offers pre-trained APIs for speech, natural language, translation, and vision.

Vertex AI supports prompt engineering, hyper-parameter tuning, retrieval-augmented generation (RAG), and model tuning. You can tune foundation models with your own data, using tuning options such as adapter tuning and reinforcement learning from human feedback (RLHF), or perform style and subject tuning for image generation.

Vertex AI Extensions connect models to real-world data and real-time actions. Vertex AI allows you to work with models both in the Google Cloud console and via APIs in Python, Node.js, Java, and Go.

Competitive products include Amazon Bedrock, Azure AI Studio, LangChain/LangSmith, LlamaIndex, Poe, and the ChatGPT GPT Builder. The technical levels, scope, and programming language support of these products vary.

Vertex AI Studio

Vertex AI Studio is a Google Cloud console tool for building and testing generative AI models. It allows you to design and test prompts and customize foundation models to meet your application’s needs.

Foundation models are another term for the generative AI models found in Vertex AI. Calling them foundation models emphasizes the fact that they can be customized with your data for the specialized purposes of your application. They can generate text, chat, image, code, video, multimodal data, and embeddings.

Embeddings are vector representations of other data, for example text. Search engines often use vector embeddings, a cosine metric, and a nearest-neighbor algorithm to find text that is relevant (similar) to a query string.
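
To make the search idea concrete, here is a minimal sketch of cosine similarity plus a brute-force nearest-neighbor lookup. The three-dimensional vectors and the `TOY_CORPUS` name are invented for illustration; real embedding models return vectors with hundreds of dimensions, and production systems use an approximate nearest-neighbor index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vec, corpus, k=2):
    """Brute-force search: score every stored vector and keep the top k."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical, hand-made "embeddings" used only for this illustration.
TOY_CORPUS = {
    "How to tune a foundation model": [0.9, 0.1, 0.0],
    "Chocolate-chip cookie recipe": [0.0, 0.2, 0.9],
    "Supervised fine-tuning guide": [0.8, 0.3, 0.1],
}

if __name__ == "__main__":
    for score, text in nearest_neighbors([1.0, 0.2, 0.0], TOY_CORPUS):
        print(f"{score:.3f}  {text}")
```

With the toy query vector above, the two tuning-related texts rank ahead of the cookie recipe because their vectors point in nearly the same direction as the query.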

The proprietary Google generative AI models available in Vertex AI include:

  • Gemini API: Advanced reasoning, multi-turn chat, code generation, and multimodal prompts.
  • PaLM API: Natural language tasks, text embeddings, and multi-turn chat.
  • Codey APIs: Code generation, code completion, and code chat.
  • Imagen API: Image generation, image editing, and visual captioning.
  • MedLM: Medical question answering and summarization (private GA).

Vertex AI Studio allows you to test models using prompt samples. The prompt galleries are organized by the type of model (multimodal, text, vision, or speech) and the task being demonstrated, for example “summarize key insights from a financial report table” (text) or “read the text from this handwritten note image” (multimodal).

Vertex AI also helps you to design and save your own prompts. The types of prompt are broken down by purpose, for example text generation versus code generation and single-shot versus chat. Iterating on your prompts is a surprisingly powerful way of customizing a model to produce the output you want, as we’ll discuss below.

When prompt engineering isn’t enough to coax a model into producing the desired output, and you have a training data set in a suitable format, you can take the next step and tune a foundation model in one of several ways: supervised tuning, RLHF tuning, or distillation. Again, we’ll discuss this in more detail later on in this review.

The Vertex AI Studio speech tool can convert speech to text and text to speech. For text to speech you can choose your preferred voice and control its speed. For speech to text, Vertex AI Studio uses the Chirp model, but has length and file format limits. You can circumvent those by using the Cloud Speech-to-Text Console instead.

vertex ai studio 01 IDG

Google Vertex AI Studio overview console, emphasizing Google’s newest proprietary generative AI models. Note the use of Google Gemini for multimodal AI, PaLM2 or Gemini for language AI, Imagen for vision (image generation and infill), and the Universal Speech Model for speech recognition and synthesis.

vertex ai studio 03IDG

Multimodal generative AI demonstration from Vertex AI. The model, Gemini Pro Vision, is able to read the message from the image despite the elaborate calligraphy.

Generative AI workflow

As you can see in the diagram below, Google Vertex AI’s generative AI workflow is a bit more complicated than simply throwing a prompt over the wall and getting a response back. Google’s responsible AI and safety filter applies both to the input and output, shielding the model from malicious prompts and the user from malicious responses.

The foundation model that processes the query can be pre-trained or tuned. Model tuning, if desired, can be performed using several methods, all of which are out-of-band for the query/response workflow and quite time-consuming.

If grounding is required, it’s applied here. The diagram shows the grounding service after the model in the flow; that’s not exactly how RAG works, as I explained in January. Out-of-band, you build your vector database. In-band, you generate an embedding vector for the query, use it to perform a similarity search against the vector database, and finally you include what you’ve retrieved from the vector database as an augmentation to the original query and pass it to the model.
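
The in-band and out-of-band steps just described can be sketched in a few lines of Python. This is a toy illustration, not Vertex AI code: `embed` is a stand-in bag-of-words function rather than a real embedding model, the "vector database" is a plain list, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Out-of-band step: embed every document once and store the vectors."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(query, index, k=1):
    """In-band step: embed the query and run a similarity search."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: similarity(query_vec, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(query, passages):
    """In-band step: prepend the retrieved passages to the original query."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Use the following context to answer.\n{context}\n\nQuestion: {query}"


DOCUMENTS = [
    "Chirp transcribes speech in over 100 languages.",
    "Imagen generates images from text prompts.",
    "Codey offers code completion and code chat.",
]

if __name__ == "__main__":
    index = build_index(DOCUMENTS)
    question = "Which model transcribes speech?"
    print(augment(question, retrieve(question, index)))
```

The augmented prompt, not the bare question, is what would be passed to the model.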

At this point, the model generates answers, possibly based on multiple documents. The workflow allows for the inclusion of citations before sending the response back to the user through the safety filter.

vertex ai studio 02IDG

The generative AI workflow typically starts with prompting by the user. On the back end, the prompt passes through a safety filter to pre-trained or tuned foundation models, optionally using a grounding service for RAG. After a citation check, the reply passes back through the safety filter and to the user.

Grounding and Vertex AI Search

As you might expect from the way RAG works, Vertex AI requires you to take a few steps to enable RAG. First, you need to “onboard to Vertex AI Search and Conversation,” a matter of a few clicks and a few minutes of waiting. Then you need to create an AI Search data store, which can be accomplished by crawling websites, importing data from a BigQuery table, importing data from a Cloud Storage bucket (PDF, HTML, TXT, JSONL, CSV, DOCX, or PPTX formats), or by calling an API.

Finally, you need to set up a prompt with a model that supports RAG (currently only text-bison and chat-bison, both PaLM 2 language models) and configure it to use your AI Search and Conversation data store. If you are using the Vertex AI console, this setup is in the advanced section of the prompt parameters, as shown in the first screenshot below. If you are using the Vertex AI API, this setup is in the groundingConfig section of the parameters:

{
  "instances": [
    { "prompt": "PROMPT"}
  ],
  "parameters": {
    "temperature": TEMPERATURE,
    "maxOutputTokens": MAX_OUTPUT_TOKENS,
    "topP": TOP_P,
    "topK": TOP_K,
    "groundingConfig": {
      "sources": [
          {
              "type": "VERTEX_AI_SEARCH",
              "vertexAiSearchDatastore": "VERTEX_AI_SEARCH_DATA_STORE"
          }
      ]
    }
  }
}
vertex ai studio 04 IDG

If you’re constructing a prompt for a model that supports grounding, the Enable Grounding toggle at the right, under Advanced, will be enabled, and you can click it, as I have here. Clicking on Customize brings up another right-hand panel where you can select Vertex AI Search from the drop-down list and fill in the path to the Vertex AI data store.
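
If you would rather assemble the API request body in code than edit JSON by hand, a small helper along these lines may be useful. It only builds and prints the payload and makes no network call. The field names come from the JSON shown earlier; the function name and the default parameter values are assumptions chosen for illustration, and the data store string is left as the same placeholder used in that JSON.

```python
import json


def build_grounded_request(prompt, datastore_path, temperature=0.2,
                           max_output_tokens=256, top_p=0.8, top_k=40):
    """Return a request body with a groundingConfig section.

    The default values here are illustrative assumptions, not recommendations.
    """
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
            "topP": top_p,
            "topK": top_k,
            "groundingConfig": {
                "sources": [
                    {
                        "type": "VERTEX_AI_SEARCH",
                        "vertexAiSearchDatastore": datastore_path,
                    }
                ]
            },
        },
    }


if __name__ == "__main__":
    body = build_grounded_request(
        "Shall I compare thee to a summer's day?",
        "VERTEX_AI_SEARCH_DATA_STORE",  # placeholder, as in the JSON above
    )
    print(json.dumps(body, indent=2))
```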

Note that grounding or RAG may or may not be needed, depending on how and when the model was trained.

vertex ai studio 05 IDG

It’s usually worth checking to see whether you need grounding for any given prompt/model pair. I thought I might need to add the poems section of the Poetry.org site to get a good completion for “Shall I compare thee to a summer’s day?” But as you can see above, the text-bison model already knew the sonnet from four sources it could (and did) cite.

Gemini, Imagen, Chirp, Codey, and PaLM 2

Google’s proprietary models offer some of the added value of the Vertex AI site. Gemini was unique in being a multimodal model (as well as a text and code generation model) as recently as a few weeks before I wrote this. Then OpenAI GPT-4 incorporated DALL-E, which allowed it to generate text or images. Currently, Gemini can generate text from images and videos, but GPT-4/DALL-E can’t.

Gemini versions currently offered on Vertex AI include Gemini Pro, a language model with “the best performing Gemini model with features for a wide range of tasks;” Gemini Pro Vision, a multimodal model “created from the ground up to be multimodal (text, images, videos) and to scale across a wide range of tasks;” and Gemma, “open checkpoint variants of Google DeepMind’s Gemini model suited for a variety of text generation tasks.”

Additional Gemini versions have been announced: Gemini 1.0 Ultra, Gemini Nano (to run on devices), and Gemini 1.5 Pro, a mixture-of-experts (MoE) mid-size multimodal model, optimized for scaling across a wide range of tasks, that performs at a similar level to Gemini 1.0 Ultra. According to Demis Hassabis, CEO and co-founder of Google DeepMind, Gemini 1.5 Pro comes with a standard 128,000 token context window, but a limited group of customers can try it with a context window of up to 1 million tokens via Vertex AI in private preview.

Imagen 2 is a text-to-image diffusion model from Google Brain Research that Google says has “an unprecedented degree of photorealism and a deep level of language understanding.” It’s competitive with DALL-E 3, Midjourney 6, and Adobe Firefly 2, among others.

Chirp is a version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model. It can turn audio speech to formatted text, caption videos for subtitles, and transcribe audio content for entity extraction and content classification.

Codey exists in versions for code completion (code-gecko), code generation (code-bison), and code chat (codechat-bison). The Codey APIs support the Go, GoogleSQL, Java, JavaScript, Python, and TypeScript languages, and Google Cloud CLI, Kubernetes Resource Model (KRM), and Terraform infrastructure as code. Codey competes with GitHub Copilot, StarCoder 2, CodeLlama, LocalLlama, DeepSeekCoder, CodeT5+, CodeBERT, CodeWhisperer, Bard, and various other LLMs that have been fine-tuned on code such as OpenAI Codex, Tabnine, and ChatGPTCoding.

PaLM 2 exists in versions for text (text-bison and text-unicorn), chat (chat-bison), and security-specific tasks (sec-palm, currently only available by invitation). PaLM 2 text-bison is good for summarization, question answering, classification, sentiment analysis, and entity extraction. PaLM 2 chat-bison is fine-tuned to conduct natural conversation, for example to perform customer service and technical support or serve as a conversational assistant for websites. PaLM 2 text-unicorn, the largest model in the PaLM family, excels at complex tasks such as coding and chain-of-thought (CoT).

Google also provides embedding models for text (textembedding-gecko and textembedding-gecko-multilingual) and multimodal (multimodalembedding). Embeddings plus a vector database (Vertex AI Search) allow you to implement semantic or similarity search and RAG, as described above.

vertex ai studio 06 IDG

Vertex AI documentation overview of multimodal models. Note the example at the lower right. The text prompt “Give me a recipe for these cookies” and an unlabeled picture of chocolate-chip cookies causes Gemini to respond with an actual recipe for chocolate-chip cookies.

Vertex AI Model Garden

In addition to Google’s proprietary models, the Model Garden (documentation) currently offers roughly 90 open-source models and 38 task-specific solutions. In general, the models have model cards. The Google models are available through Vertex AI APIs and Google Colab as well as in the Vertex AI console. The APIs are billed on a usage basis.

The other models are typically available in Colab Enterprise and can be deployed as an endpoint. Note that endpoints are deployed on serious instances with accelerators (for example 96 CPUs and 8 GPUs), and therefore accrue significant charges as long as they are deployed.

Foundation models offered include Claude 3 Opus (coming soon), Claude 3 Sonnet (preview), Claude 3 Haiku (coming soon), Llama 2, and Stable Diffusion v1-5. Fine-tunable models include PyTorch-ZipNeRF for 3D reconstruction, AutoGluon for tabular data, Stable Diffusion LoRA (MediaPipe) for text-to-image generation, and MoViNet Video Action Recognition.

Generative AI prompt design

The Google AI prompt design strategies page does a decent and generally vendor-neutral job of explaining how to design prompts for generative AI. It emphasizes clarity, specificity, including examples (few-shot learning), adding contextual information, using prefixes for clarity, letting models complete partial inputs, breaking down complex prompts into simpler components, and experimenting with different parameter values to optimize results.

Let’s look at three examples, one each for multimodal, text, and vision. The multimodal example is interesting because it uses two images and a text question to get an answer.

vertex ai studio 07 IDG

Here the prompt asks the price of what’s shown in the first image. The Gemini Pro Vision model has to match up the fruit in the first image to the second image, and read the hand-written price tag in the second image to come up with the answer, $4 each. The next two screenshots show details of the two images.

vertex ai studio 08bIDG

The naive reader might think that these are green apples. Nevertheless, if you view the next screenshot, you’ll see that they are labeled Asian pears.

vertex ai studio 09IDG

The fruits here are distinct enough that you aren’t likely to confuse them, even if Asian pears are unfamiliar, and despite the pears having wrappings in this image but not in the previous image.

vertex ai studio 10 IDG

This text extraction example using the Gemini Pro model asks for a JSON format answer extracted from plain text, and offers a one-shot example for guidance in the middle Examples pane. The inferred result in the Test pane is correct.

vertex ai studio 11 IDG

Because I’m rather bloody-minded when I test, I used an image of my own, my back yard after a snowstorm. OK, it looks like my neighbor’s garage was misidentified as a house, but the first generated caption isn’t bad.

Vertex AI model tuning

It’s almost always worthwhile to try prompt engineering first, but if that fails, the next step to customize a base model for your own purposes is model tuning. The Google AI guidance on model tuning and model tuning with AI Studio is very good, and to the point.

Currently Vertex AI only supports supervised fine-tuning of two text foundation models, Gemini 1.0 Pro and text-bison-001. While you can sometimes get by with as few as 20 examples when you’re doing supervised learning for fine-tuning, the usual recommendations from Google are listed in the table below.

Task               No. of examples in data set
Classification     100+
Summarization      100-500+
Document search    100+

If you’d like to try out fine-tuning for free, you can run it in Colab using this Python Quickstart.

There are two supported methods for fine-tuning text models in AI Studio, supervised tuning and RLHF tuning. Google recommends supervised tuning for classification, sentiment analysis, entity extraction, summarization of content that’s not complex, and domain-specific queries. Google recommends RLHF tuning for question answering, summarization of complex content, and content creation. However, supervised tuning is the only option for code models. 

You can also tune embedding models and create distilled text models. Distilled text models use a large, capable teacher model and a labeled or an unlabeled training data set to train a smaller but more accurate student model.

If all of the above fails, your next step might be continued pre-training. The good news about that is that it uses unlabeled data. The bad news is that it requires lots of exemplars, takes days to run, and can be quite expensive.

Vertex AI Studio vs. the competition

Overall, Vertex AI Studio is a promising product that could potentially compete strongly with Amazon Bedrock and Azure AI Studio. On the other hand, Google has been busy shooting itself in the foot. The company mismanaged its Gemini Image rollout to the point where co-founder Sergey Brin returned from wherever he has been hiding to say “We definitely messed up on the image generation.” I won’t repeat my rant about Google’s history of eating its young from my review of Project IDX, but you can read it at the end of that article.

There are lots of good things about Vertex AI Studio, including its use of Google’s own models, its rapid adoption and deployment of new models from other vendors, and its straightforward support for RAG and model tuning. Meanwhile, why are the Generative AI Extensions—potentially the most useful facility of Vertex AI Studio, and the piece that competes with OpenAI GPTs—buried in a private preview?

As far as which AI app building platform you should choose, if you’re already heavily invested in the Google Cloud, then using Google Vertex AI Studio for building AI apps is probably a no-brainer, as long as you can get access to all the models and capabilities that you need, rather than be told that you can’t have them because they are in private preview.

Given Google’s investment in its cloud platform, I don’t seriously expect Vertex AI Studio to be killed outright in the next couple of years, but I wouldn’t be at all surprised by yet another rebranding.

Cost: Cost is based on usage. See https://cloud.google.com/vertex-ai/pricing#generative_ai_models.

Platform: Google Cloud Platform.

Amazon Bedrock: A solid generative AI foundation

Posted by on 27 February, 2024

Amazon Web Services’ fully managed service for building, deploying, and scaling generative AI applications, Amazon Bedrock offers a catalog of foundation models, implements retrieval-augmented generation (RAG) and vector embeddings, hosts knowledge bases, implements fine-tuning of foundation models, and allows continued pre-training of selected foundation models.

Amazon Bedrock complements the almost 30 other Amazon machine learning services available, including Amazon Q, the AWS generative AI assistant.

There are currently six major features in Amazon Bedrock:

  • Experiment with different models: Use the API or GUI in the console to test various prompts and configurations with different foundation models.
  • Integrate external data sources: Improve response generation by incorporating external data sources into knowledge bases, which can be queried to augment the responses from foundation models.
  • Develop customer support applications: Build applications that use foundation models, API calls, and knowledge bases to reason and execute tasks for customers.
  • Customize models: Tailor a foundation model for particular tasks or domains by providing training data for fine-tuning or additional pretraining.
  • Boost application efficiency: Optimize the performance of foundation model-based applications by purchasing provisioned throughput.
  • Choose the most suitable model: Compare the outputs of various models using standard or custom prompt data sets to choose the model that best aligns with the requirements of your application.

One major competitor to Amazon Bedrock is Azure AI Studio, which, while still in preview and somewhat under construction, checks most of the boxes for a generative AI application builder. Azure AI Studio is a nice system for picking generative AI models, grounding them with RAG using vector embeddings, vector search, and data, and fine-tuning them, all to create what Microsoft calls copilots, or AI agents.

Another major competitor is Google Vertex AI’s Generative AI Studio, which allows you to tune foundation models with your own data, using tuning options such as adapter tuning and reinforcement learning from human feedback (RLHF), or style and subject tuning for image generation. Generative AI Studio complements the Vertex AI model garden and foundation models as APIs.

Other possible competitors include LangChain (and LangSmith), Poe, and the ChatGPT GPT Builder. LangChain does require you to do some programming.

Amazon Bedrock model setup

There are two setup tasks for Bedrock: model setup and API setup. You need to request access to models before you can use them. If you want to use the AWS command line interface or any of the AWS SDKs, you also need to install and configure the CLI or SDK.

I didn’t bother with API setup, as I’m concentrating on using the console for the purposes of this review. Completing the model access request form was easier than it looked, and I was granted access to models faster than I expected.

amazon bedrock 02 IDG

You can’t use a model in Amazon Bedrock until you’ve requested and received permission to use it. Most vendors grant access immediately. Anthropic takes a few minutes, and requires you to fill out a short questionnaire about your planned usage. This screenshot was taken just before my Claude access requests were granted.

Amazon Bedrock model inference parameters

Amazon Bedrock uses slightly different parameters to control the response of models than, say, OpenAI. Bedrock controls randomness and diversity using the temperature of the probability distribution, the top K, and the top P. It controls the length of the output with the response length, penalties, and stop sequences.

Temperature modulates the probability for the next token. A lower temperature leads to more deterministic responses, and a higher temperature leads to more random responses. In other words, choose a lower temperature to increase the likelihood of higher-probability tokens and decrease the likelihood of lower-probability tokens; choose a higher temperature to increase the likelihood of lower-probability tokens and decrease the likelihood of higher-probability tokens. For example, a high temperature would allow the completion of “I hear the hoof beats of” to include unlikely beasts like unicorns, while a low temperature would weight the output to likely ungulates like horses.

Top K is the number of most-likely candidates that the model considers for the next token. Lower values limit the options to more likely outputs, like horses. Higher values allow the model to choose less likely outputs, like unicorns.

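Top P is the percentage of most-likely candidates that the model considers for the next token, usually interpreted as a cumulative probability cutoff: the model samples only from the smallest set of top candidates whose probabilities add up to P. As with top K, lower values limit the options to more likely outputs, and higher values allow the model to choose less likely outputs.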

Response length controls the number of tokens in the generated response. Penalties can apply to length, repeated tokens, frequency of tokens, and type of tokens in a response. Stop sequences are sequences of characters that stop the model from generating further tokens.
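
The interplay of temperature, top K, and top P is easier to see in code. The following is a minimal sketch of a generic sampler, not Bedrock's actual implementation: the function names and toy scores are invented, and top P is treated as a cumulative probability cutoff (a common interpretation, assumed here).

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; low temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then cut at cumulative mass top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}  # renormalize the survivors


def sample_next_token(logits, temperature=1.0, top_k=50, top_p=1.0, rng=random):
    """Sample one token after applying temperature, top K, and top P."""
    probs = softmax_with_temperature(logits, temperature)
    probs = filter_top_k_top_p(probs, top_k, top_p)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for completing "I hear the hoof beats of".
LOGITS = {"horses": 4.0, "zebras": 2.0, "unicorns": 0.5}

if __name__ == "__main__":
    print(softmax_with_temperature(LOGITS, 0.2))  # heavily favors horses
    print(softmax_with_temperature(LOGITS, 2.0))  # unicorns get a real chance
    print(sample_next_token(LOGITS, temperature=2.0, top_k=3, top_p=0.95))
```

Setting `top_k=1` reduces the sampler to greedy decoding, which is why very low top K values always produce the horses.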

Amazon Bedrock prompts, examples, and playgrounds

Amazon Bedrock currently displays 33 examples of generative AI model usage, and offers three playgrounds. Playgrounds provide a console environment to experiment with running inference on different models and with different configurations. You can start with one of the playgrounds (chat, text, or image), select a model, construct a prompt, and set the metaparameters. Or you can start with an example and open it in the appropriate playground with the model and metaparameters pre-selected and the prompt pre-populated. Note that you need to have been granted access to a model before you can use it in a playground.

Amazon Bedrock examples demonstrate prompts and parameters for various supported models and tasks. Tasks include summarization, question answering, problem solving, code generation, text generation, and image generation. Each example shows a model, prompt, parameters, and response, and presents a button you can press to open the example in a playground. The results you get in the playground may or may not match what is shown in the example, especially if the parameters allow for lower-probability tokens.

Our first example shows arithmetic word problem solving using a chain-of-thought prompt and the Llama 2 Chat 70B v1 model. There are several points of interest in this example. First, it works with a relatively small open-source chat model. (As an aside, there’s a related example that uses a 7B (billion) parameter model instead of the 70B parameter model used here; it also works.) Second, the chain-of-thought action is triggered by a simple addition to the prompt, “Let’s think step by step.” Note that if you remove that line, the model often goes off the rails and generates a wrong answer.

amazon bedrock 03 IDG

The chain-of-thought problem-solving example uses a Llama 2 chat model and presents a typical 2nd or 3rd grade arithmetic word problem. Note the [INST]You are a…[/INST] block at the beginning of the prompt. This seems to be specific to Llama. You’ll see other models respond to different formats for defining instructions or system prompts.
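
As a rough sketch of how such a prompt might be assembled, the helper below wraps an instruction in the [INST]…[/INST] markers described above and optionally appends the step-by-step trigger. The exact layout (line breaks, spacing) and the sample instruction and question are assumptions for illustration; check the model's own documentation for the precise template it expects.

```python
def build_cot_prompt(system_instruction, question, step_by_step=True):
    """Assemble a prompt with an [INST] block and an optional CoT trigger.

    The layout is a hypothetical approximation of the Bedrock example,
    not an official Llama 2 template.
    """
    parts = [f"[INST]{system_instruction}[/INST]", question]
    if step_by_step:
        parts.append("Let's think step by step.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_cot_prompt(
        "You are a careful math tutor.",  # invented instruction
        "Ana has 3 boxes of 12 pencils and gives away 7. How many are left?",
    ))
```

Passing `step_by_step=False` gives you the variant without the trigger line, which makes it easy to compare the two behaviors in a playground.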

amazon bedrock 04 IDG

The chain-of-thought problem-solving example running in the Amazon Bedrock Chat playground. This particular set of prompts and hyperparameters usually gives correct answers, although not in the exact same format every time. If you remove the “Let’s think step by step” part of the prompt it usually gives wrong answers. The temperature setting of 0.5 asks for moderate randomness in the probability mass function, and the top P setting of 0.9 allows the model to consider less likely outputs.

Our second example shows contract entity extraction using Cohere’s Command text generation model. Text LLMs (large language models) often allow for many different text processing functions.

amazon bedrock 05 IDG

Amazon Bedrock contract entity extraction example using Cohere’s Command text generation model. Note that the instruction here is on the first line followed by a colon, and then the contract body follows.

amazon bedrock 06 IDG

Contract entity extraction example running in the Amazon Bedrock text playground. Note that there was an opportunity for additional interaction in the playground, which didn’t show up in the example. While the temperature of this run was 0.9, Cohere’s Command model takes temperature values up to 5. The top p value is set to 1 (and displayed at 0.99) and the top k parameter is not set. These allow for high randomness in the generated text.

Our final example shows image inpainting, an application of image generation that uses a reference image, a mask, and prompts to produce a new image. Up until now, I’ve only done AI image inpainting in Adobe Photoshop, which has had the capability for a while.

amazon bedrock 07 IDG

Amazon Bedrock’s image inpainting example uses the Titan Image Generator G1 model. Note the reference image and mask image in the image configuration.

amazon bedrock 08 IDG

In order to actually select the flowers for inpainting, I had to move the mask from the default selection of the backpack to the area containing the white flowers in the reference image. When I didn’t do that, orange flowers were generated in front of the backpack.

amazon bedrock 09 IDG

Successful inpainting in Amazon Bedrock. Note that I could have used the mask prompt to refine the mask for complex mask selections in noncontiguous areas, for example selecting the flowers and the books. You can use the Info links to see explanations of individual hyperparameters.

Amazon Bedrock orchestration

Amazon Bedrock orchestration currently includes importing data sources into knowledge bases that you can then use for setting up RAG, and creating agents that can execute actions. These are two of the most important techniques available for building generative AI applications, falling between simple prompt engineering and expensive and time-consuming continued pre-training or fine-tuning.

Using knowledge bases takes multiple steps. Start by importing your data sources into an Amazon S3 bucket. When you do that, specify the chunking you’d like for your data. The default is approximately 300 tokens per chunk, but you can set your own size. Then set up your vector store and embeddings model in the database you prefer, or allow AWS to use its default of Amazon OpenSearch Serverless. Then create your knowledge base from the Bedrock console, ingest your data sources, and test your knowledge base. Finally, you can connect your knowledge base to a model for RAG, or take the next step and connect it to an agent. There’s a good one-hour video about this by Mani Khanuja, recorded at AWS re:Invent 2023.
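The chunking step can be sketched in a few lines. This example assumes fixed-size chunks and uses whitespace-separated words as a stand-in for tokens; a real tokenizer would count differently, and the function name is hypothetical rather than part of any AWS API.

```python
def chunk_text(text, max_tokens=300):
    """Split text into chunks of at most max_tokens whitespace-delimited words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


# A 650-word document yields two full chunks and one partial chunk.
chunks = chunk_text("word " * 650)
print([len(c.split()) for c in chunks])
```

Smaller chunks give more precise retrieval but less surrounding context per hit, which is why the chunk size is worth tuning for your own data.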

Agents orchestrate interactions between foundation models, data sources, software applications, and prompts, and call APIs to take actions. In addition to the components of RAG, agents can follow instructions, use an OpenAPI schema to define the APIs that the agent can invoke, and/or invoke a Lambda function.
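Conceptually, the action-taking part of an agent boils down to mapping a model's structured request onto a callable. The sketch below is a generic illustration under that assumption; the action names, the request layout, and the functions are hypothetical and do not reflect the Bedrock agents API, where the callables would be API operations described by an OpenAPI schema or Lambda functions.

```python
from typing import Callable, Dict


# Hypothetical actions standing in for real API operations.
def get_order_status(order_id: str) -> str:
    return f"Order {order_id} has shipped."


def cancel_order(order_id: str) -> str:
    return f"Order {order_id} has been cancelled."


ACTIONS: Dict[str, Callable[..., str]] = {
    "get_order_status": get_order_status,
    "cancel_order": cancel_order,
}


def run_action(request: dict) -> str:
    """Look up the requested action by name and call it with the supplied parameters."""
    name = request["action"]
    if name not in ACTIONS:
        raise ValueError(f"Unknown action: {name}")
    return ACTIONS[name](**request.get("parameters", {}))


# A request of the kind a model might emit after reading the user's message.
print(run_action({"action": "get_order_status", "parameters": {"order_id": "A17"}}))
```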

amazon bedrock 10 IDG

Amazon Bedrock knowledge base creation and testing starts with this screen. There are several more steps.

Amazon Bedrock model assessment and deployment

The Assessment and Deployment panel in Amazon Bedrock contains functionality for model evaluation and provisioned throughput.

Model evaluation supports automatic evaluation of a single model, manual evaluation of up to two models using your own work team, and manual evaluation of as many models as you wish using an AWS-managed work team. Automatic evaluation uses recommended metrics, which vary depending on the type of task being evaluated, and can either use your own prompt data or built-in curated prompt data sets.

Provisioned throughput allows you to purchase dedicated capacity to deploy your models. Pricing varies depending on the model that you use and the level of commitment you choose.

amazon bedrock 11 IDG

Automatic model evaluation selection in Amazon Bedrock. Bedrock can also set up human model evaluations. The metrics and data sets used vary with the task type being evaluated.

amazon bedrock 12 IDG

Amazon Bedrock’s provisioned throughput isn’t cheap, and it isn’t available for every model. Here we see the estimated monthly cost of provisioning five model units of the Llama 2 Chat 13B model with a one-month term: $77.3K. Upping the term to six months drops the monthly cost to $47.7K. You can’t edit the provisioned model units or term once you’ve purchased the throughput.
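Working through the figures quoted in this caption shows what the longer commitment buys. The arithmetic below uses only the two monthly estimates and the five model units mentioned above; the variable names are my own, and actual pricing may have changed since the screenshot was taken.

```python
MODEL_UNITS = 5
MONTHLY_COST_ONE_MONTH_TERM = 77_300  # USD per month, as quoted in the caption
MONTHLY_COST_SIX_MONTH_TERM = 47_700  # USD per month, as quoted in the caption

per_unit_one_month = MONTHLY_COST_ONE_MONTH_TERM / MODEL_UNITS
per_unit_six_month = MONTHLY_COST_SIX_MONTH_TERM / MODEL_UNITS
total_six_month_commitment = MONTHLY_COST_SIX_MONTH_TERM * 6
monthly_saving = 1 - MONTHLY_COST_SIX_MONTH_TERM / MONTHLY_COST_ONE_MONTH_TERM

print(f"Per model unit, one-month term: ${per_unit_one_month:,.0f}/month")
print(f"Per model unit, six-month term: ${per_unit_six_month:,.0f}/month")
print(f"Total six-month commitment:     ${total_six_month_commitment:,.0f}")
print(f"Monthly saving from longer term: {monthly_saving:.1%}")
```

The six-month term cuts the monthly bill by a little over a third, but it commits you to roughly $286K in total.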

Model customization methods

It’s worth discussing ways of customizing models in general at this point. Below we’ll talk specifically about the customization methods implemented in Amazon Bedrock.

Prompt engineering, as shown above, is one of the simplest ways to customize a generative AI model. Typically, models accept two prompts, a user prompt and a system or instruction prompt, and generate an output. You normally change the user prompt all the time, and use the system prompt to define the general characteristics you want the model to take on. Prompt engineering is often sufficient to define the way you want a model to respond for a well-defined task, such as generating text in specific styles by presenting sample text or question-and-answer pairs. You can easily imagine creating a prompt for “Talk Like a Pirate Day.” Ahoy, matey.
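A minimal sketch of that idea follows: a fixed system prompt, a few question-and-answer pairs as examples, and a changing user prompt. The role/content message layout is a common convention that I am assuming here, not a format confirmed for any particular Bedrock model, and the pirate examples are invented.

```python
def build_messages(system_prompt, examples, user_prompt):
    """Assemble a system prompt, few-shot question/answer pairs, and the new user prompt."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_prompt})
    return messages


# Hypothetical prompt for "Talk Like a Pirate Day".
messages = build_messages(
    "You are a helpful assistant who always answers like a pirate.",
    [("Where is the meeting?", "Arr, the meetin' be in the galley, matey.")],
    "What time is lunch?",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```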

Retrieval-augmented generation helps to ground LLMs with specific sources, often sources that weren’t included in the models’ original training. As you might guess, RAG’s three steps are retrieval from a specified source (the knowledge base in Amazon Bedrock parlance), augmentation of the prompt with the context retrieved from the source, and then generation using the model and the augmented prompt.

RAG procedures often use embedding to limit the length and improve the relevance of the retrieved context. Essentially, an embedding function takes a word or phrase and maps it to a vector of floating point numbers; these are typically stored in a database that supports a vector search index. The retrieval step then uses a semantic similarity search, typically using the cosine of the angle between the query’s embedding and the stored vectors, to find “nearby” information to use in the augmented prompt. Search engines usually do the same thing to find their answers.
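The retrieval and augmentation steps can be sketched with a toy embedding. Here word counts stand in for the dense floating point vectors a real embeddings model would return, and a sorted list stands in for a vector search index; the documents, function names, and prompt wording are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embeddings model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def augment(query, context):
    """Build the augmented prompt that would be sent to the model for generation."""
    joined = "\n".join(context)
    return f"Answer using only this context.\n\nContext:\n{joined}\n\nQuestion: {query}"


documents = [
    "Provisioned throughput is billed per model unit",
    "Knowledge bases store chunked documents in a vector store",
    "Agents can invoke a Lambda function",
]
query = "How is provisioned throughput billed"
print(augment(query, retrieve(query, documents)))
```

The generation step is simply a call to the model with the augmented prompt, which is omitted here.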

Agents, aka conversational retrieval agents, expand on the idea of conversational LLMs with some combination of tools, running code, embeddings, and vector stores. In other words, they are RAG plus additional steps. Agents often help to specialize LLMs to specific domains and to tailor the output of the LLM. Azure Copilots are usually agents; Google and Amazon use the term agents. LangChain and LangSmith simplify building RAG pipelines and agents.

Fine-tuning large language models is a supervised learning process that involves adjusting the model’s parameters to a specific task. It’s done by training the model on a smaller, task-specific data set that’s labeled with examples relevant to the target task. Fine-tuning often takes hours or days using many server-level GPUs and requires hundreds or thousands of tagged exemplars. It’s still much faster than extended pre-training.

Pre-training is the unsupervised learning process on huge text data sets that teaches LLMs the basics of language and creates a generic base model. Extended or continued pre-training adds unlabeled domain-specific or task-specific data sets to the base model to specialize the model, for example to add a language, add terms for a specialty such as medicine, or add the ability to generate code. Continued pre-training (using unsupervised learning) is often followed by fine-tuning (using supervised learning).

Customizing models in Amazon Bedrock with fine-tuning and continued pre-training

Both fine-tuning and continued pre-training tend to be expensive and lengthy processes. Even preparing the data for these can be a challenge. For fine-tuning, the challenge is getting the tagging done within budget. For continued pre-training, the challenge is to find a data set for your domain of interest that doesn’t introduce biases or toxicity of any kind.

amazon bedrock 13 IDG

Amazon Bedrock can create custom models by continued pre-training and/or with fine-tuning. You can manage your models and training jobs from this screen. Note the requirement for purchasing provisioned throughput to deploy your custom model.

amazon bedrock 14 IDG

Creating a fine-tuning job in Amazon Bedrock. Note that only certain models can currently be fine-tuned: four Amazon models, two Cohere models, and two Meta models.

amazon bedrock 15 IDG

You can manage your custom model training jobs as well as your custom models in Amazon Bedrock. Note the three status codes for jobs: failed, stopped, and complete. Only completed jobs will get a link from their custom model name. All jobs get links from their job names.

amazon bedrock 16 IDG

Digging into a training job detail in Amazon Bedrock shows you its source model, when it was started, its status, and various parameters and hyperparameters.

amazon bedrock 17 IDG

Once you have completed customizing your models in Amazon Bedrock you can manage them on the models tab. You can provision them, open them in the playground, delete them, and open their details.

amazon bedrock 18 IDG

Model details look similar to training job details in Amazon Bedrock, with a few differences, such as offering purchase and management of provisioned throughput.

amazon bedrock 19 IDG

While the setup of a continued pre-training job looks similar to the setup of a fine-tuning job, they have some major differences. Continued pre-training is an unsupervised learning job that needs a lot of untagged data and a lot of time. Fine-tuning is a supervised learning job that needs less data (but tagged!) and less time.
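The difference between tagged and untagged data shows up directly in the training files. The sketch below writes both kinds of records as JSON Lines; the field names ("prompt", "completion", and "input") are assumptions for illustration and should be checked against the current Bedrock documentation, and the sample records are invented.

```python
import json

# Fine-tuning: labeled pairs (field names assumed for illustration).
fine_tuning_records = [
    {"prompt": "Classify the sentiment: I love this phone.", "completion": "positive"},
    {"prompt": "Classify the sentiment: The battery died in an hour.", "completion": "negative"},
]

# Continued pre-training: unlabeled domain text (field name assumed for illustration).
pretraining_records = [
    {"input": "Myocardial infarction is commonly known as a heart attack."},
]


def to_jsonl(records):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


print(to_jsonl(fine_tuning_records))
print(to_jsonl(pretraining_records))
```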

Low-code generative AI using PartyRock

To accompany Amazon Bedrock, AWS has released a mostly free low-code platform for learning generative AI and building small AI apps. The introductory PartyRock blog post is by Jeff Barr, and tells you enough that you can dive in yourself; it also supplies links to PartyRock learning resources near the end of the post. If you don’t want to build an app yourself, you can still play with the apps others have built.

Generative AI app building on Bedrock

Amazon Bedrock is a credible competitor to Azure AI Studio. If you’re already committed to AWS rather than Microsoft Azure or Google Cloud, then Bedrock will certainly be a good choice for building and scaling generative AI applications. Bedrock offers fewer foundation models than Azure AI Studio, and furthermore lacks access to any OpenAI models, but it should do the job for most generative AI apps. Bedrock is currently a little behind Azure AI Studio when it comes to content filters, but that could easily change in the future.

Note that the cost of deploying generative AI apps tends to dwarf the cost of developing them. The cost of using Amazon Bedrock to do prompt engineering and develop RAG apps tends to be low (ignoring the people costs), and the cost of testing these in the Bedrock playgrounds is usually negligible. The cost of fine-tuning tends to be something that might give small companies pause. The cost of continued pre-training may also give medium-size companies pause. But deploy an app with a customized model at scale sufficient to serve a large audience with low lag for a long period of time, and soon you’re talking about real money.

Cost: Pricing is based on the model, the volume of input tokens and output tokens, and on whether you have purchased provisioned throughput for the model. For more information, see the Model providers page in the Amazon Bedrock console.

Platform: Browser-based, hosted on AWS. API access available.

Google Project IDX: A promising next-generation cloud IDE

Posted by on 29 January, 2024

In August 2023, a small group of Google development and UX leads bewailed the difficulty of setting up a development environment for multiplatform and full-stack apps, and offered their take on an experimental prototype intended to solve the issues. Difficulty setting up technology stacks for development is not a new problem. It has been an issue since at least the early 1980s, when personal computers became available. 

Project IDX is a browser-based development environment built on Code OSS and powered by Codey, a generative AI foundation model trained on code and built on PaLM 2. Project IDX is designed to make it easier to build, manage, and deploy full-stack web and multiplatform applications, using popular frameworks and languages.

Code OSS is the fully open-source version of Microsoft’s Visual Studio Code. The latter has a few proprietary additions, despite being free software.

At the time of its announcement in August, Project IDX was only available through a waitlist sign-up; my application was finally approved in December. Project IDX is still very much a rough-edged preview, but has an interesting design and some utility, even if it’s not yet intended for use in a production environment.

There are several products that compete with Project IDX at some level. These include AWS Cloud9, Gitpod, Online IDE, Replit, StackBlitz, Eclipse Che, Codeanywhere, and GitHub Codespaces.

Feels like Visual Studio Code

There are a number of features that make Project IDX look promising despite its rough edges and its feel of being under construction. For starters, it’s actually a familiar environment for anyone who uses Visual Studio Code. As I understand it, the portions of VS Code that aren’t included in Code OSS are the Microsoft-specific customizations, which don’t matter too much in this context.

Some of those customizations are replaced by the IDX AI powered by Codey. The IDX AI provides code suggestions as you type and offers an AI-powered code chat that you can ask to help with your code, generate new code, translate code to another language, explain code, and write unit tests. Supposedly, IDX AI also highlights possible license requirements based on AI-generated code, although I haven’t seen that pop up.

google project idx 01 IDG

Project IDX will feel familiar because of its similarity to VS Code. The top left “hamburger” menu replaces the top row menu in VS Code, and offers most of the same menu items when it pops out. The icons in the vertical row below that control the contents of the next column to the right, currently showing the file explorer, the code outline for the current file, the timeline for the current file, and the dependencies for the app. The large editing pane currently showing main.dart can display up to four tabs. The preview window to the right can also display the IDX AI pane and additional code file tabs. The large area at the bottom right displays code problems, output, a debug console, and a terminal.

Runs in a cloud workstation

The IDX Code OSS editor runs in a Google Cloud VM, called a Cloud Workstation. Normally, Cloud Workstation time is billed per hour at a rate that varies with the size of the machine type, from $0.16/hour to $9.36/hour. Project IDX is currently free.

Normally, Cloud Workstations support a variety of popular IDEs and Duet AI. Project IDX supports only Code OSS, and Codey instead of Duet. (I can’t tell you the difference between Duet AI and Codey in practice, although it might be an interesting comparison to investigate.) Cloud Workstations can normally run inside your private network and in your staging environment. Project IDX is currently restricted to its own environment.

Supports many languages and frameworks

You can create projects in Project IDX with built-in templates and GitHub imports. The templates support the JavaScript, TypeScript, and Dart languages and the Angular, React, Next.js, Vue, Svelte, and Flutter frameworks. In the future, Project IDX is due to support Python, Go, and “AI.” You can optionally use Nix to customize your workspace.

google project idx 03 IDG

This menu offers you your initial choice of the kind of app you’ll generate or import. Each item (other than the “coming soon” group at the bottom) opens a secondary screen for specifying your app framework and naming your app.

google project idx 03a IDG

The second-level screen for generating a new web app currently offers a choice of six web frameworks. They are Angular, React, Next.js, Vue, Svelte, or a blank app, which implies writing your own HTML, JavaScript/TypeScript, and CSS. Nix is the file you can use to customize a workspace.

Integrates with Git and GitHub

GitHub imports can be of three types: web, Flutter, and “other,” which currently appears to mean JavaScript/TypeScript frameworks other than those explicitly listed. The frameworks explicitly supported include Angular, React, Next.js, Vue, and Svelte. 

If your GitHub project has JavaScript dependencies, you can run npm install in your IDX terminal window after your import completes. You can also turn your project into a Git repository from within IDX and sync that with GitHub.

google project idx 04 IDG

Project IDX integrates well with Git and GitHub. At left, you can see the options to initialize a Git repository and publish it to GitHub.

google project idx 04a IDG

Once you have created a repo and authenticated to GitHub, Project IDX can push the repo to GitHub. Here you can see the typical GitHub display of the README.md file generated for the app by Angular.

Previews, deploys, and shares apps

In addition to a web preview, Project IDX presents previews in Android emulators and iOS simulators, where supported by the underlying template. All three work for a Flutter app. Only two, web preview and iOS simulator, work for an Angular app, since a stock Angular app isn’t native unless you add something like Ionic or NativeScript.

You can deploy directly from your workspace to Firebase hosting. On an experimental basis, you can share your workspace with complete shared access.

Project IDX comes with pre-installed extensions for the languages and frameworks it supports. It is supposed to support additional extensions that are available from OpenVSX, although I can’t confirm whether all of those work at this point—there are just too many (over 3,000) to check.

One current major limitation of Project IDX is that only two projects are allowed at once. You can get around this by saving projects to GitHub and juggling which you have open in IDX.

Note that there are numerous bug reports beyond the list in the FAQ.

google project idx 05 IDG

The Flutter app reported two setup errors. Here I am trying to resolve one of them with the help of IDX AI. Unfortunately, the AI’s advice to use sudo apt-get to install Chrome turned out to be useless, since the IDX VM does not currently include either sudo or apt-get. I won’t call that a hallucination, since those utilities might be planned for a future version.

Lives in the Google Cloud

Project IDX shows a lot of promise. It’s visually similar to Visual Studio Code for the Web (which, sadly, lacks a terminal and debugger). It’s both visually and functionally similar to GitHub Codespaces and Gitpod, and it’s functionally similar to Eclipse Che.

One reason you might prefer Project IDX to any of those would be its hosting in a Google Cloud Workstation, which is a big advantage if you want to integrate with any Google Cloud services, or with other programs you have running in the Google Cloud. On the other hand, if your existing code runs on AWS, you might want to consider using AWS Cloud9.

My biggest concern about making a commitment to Project IDX would be Google’s long history of killing its projects and services. Remember Google+? Freebase? The Google Search Appliance? Polymer? Google Domains? All ex-parrots, they’ve rung down the curtain and joined the choir invisible.

Nevertheless, Project IDX has its attractions. As long as you create a GitHub repository from your workspace and keep it current, it’s certainly worth a try.

Cost: Free preview

Platform: Browser-based, hosted on Google Cloud

Azure AI Studio: A nearly complete toolbox for AI development

Posted by on 22 January, 2024

On November 15, 2023, Microsoft announced Azure AI Studio, a new platform for generative AI application development, using OpenAI models such as GPT-4, as well as models from Microsoft Research, Meta, Hugging Face, and others. The motivation for the product, Microsoft said, is that “navigating the complexities of prompt engineering, vector search engines, the retrieval-augmented generation (RAG) pattern, and integration with Azure OpenAI Service can be daunting.”

It turns out that Azure AI Studio is a nice system for picking generative AI models, for grounding them with RAG using vector embeddings, vector search, and data, and for fine-tuning those models, all to create AI-powered copilots, or agents. It’s the “basement-level” tool for creating copilots, aimed at experienced developers and data scientists, while Microsoft’s Copilot Studio is a “2nd-floor level” low-code tool for customizing chatbots.

Azure AI Studio has competition from the usual suspects, plus a few you might not already know about. Amazon Bedrock competes with Azure AI Studio, and Amazon Q competes with Microsoft Copilots. Bedrock offers a catalog of foundation models, RAG and embeddings, knowledge bases, fine-tuning, and continued pretraining to build generative AI applications.

There’s a somewhat competing experiment from Google, called NotebookLM, which “only” lets you provide documents (Google docs, PDFs, and pasted text) for RAG against one large language model. I put “only” in air quotes because using RAG against one good model is often enough to produce a good generative AI application. Google has a long history of killing its experiments, so I’m not taking any bets on whether or how NotebookLM will become a product.

Google does have a professional product in this space. Google Vertex AI’s Generative AI Studio allows you to tune foundation models with your own data, using tuning options such as adapter tuning and reinforcement learning from human feedback (RLHF), or style and subject tuning for image generation. That complements the Vertex AI model garden and foundation models as APIs.

If you can write a little Python, JavaScript, or Go, you can accomplish many of the same things you can with Azure AI Studio—or possibly more—with LangChain and LangSmith. You can also accomplish some of the same things with Poe, which has a good selection of models and lets you customize bots with plain-text prompts as well as with code.

Azure AI Studio model catalog

Azure AI Studio hosts AI models from Microsoft Research, OpenAI, Meta, Hugging Face, and Databricks, as well as NVIDIA base models, so that you can find the current best model for your application, or at least one that works well enough. In addition, Azure AI Studio offers half a dozen Azure OpenAI language models, some of which have fine-tuning capabilities.

In general, the OpenAI models are offered “as a service,” meaning that they are deployed in a model pool with its own GPUs. When you provision them, you get an inference endpoint in your own subscription and possibly the ability to use them in fine-tuning and evaluation jobs. We’ll discuss fine-tuning when we talk about model customization below.

azure ai studio 02 IDG

The Azure AI Studio model catalog has a wide selection of models from multiple vendors, including OpenAI, NVIDIA, Meta, Hugging Face, Databricks, and Microsoft Research. The models are classified by their inference skills as well as by their creators.

Azure AI Studio model benchmarks

Not every generative AI model has the same capabilities or performance. Historically, better models have been priced higher, but recently some free open-source models have exhibited excellent performance on common tasks.

There are a number of standard benchmarks, particularly for LLMs, whose text output is easier to score automatically than the output of models that generate media. As you can see in the chart below, GPT-4 32K is the current champion among installed models on Azure for most accuracy benchmarks, but bear in mind that the LLM performance picture changes on an almost daily basis.

As I write this, Google claims that its new Gemini model surpasses GPT-4. I haven’t been able to test it to know whether that’s true. Apparently, the “really good” Ultra version of Gemini won’t be available until next year. The Pro version I did test is roughly at the level of GPT-3.5.

In addition, at least three competitive small language models have been released recently. They include Starling-LM-7B, which uses reinforcement learning from AI feedback (RLAIF), from UC Berkeley.

azure ai studio 03 IDG

Azure AI Studio model benchmarks. Here we are comparing the model accuracy of four LLMs, GPT-3.5 Turbo, GPT-4 32K, Llama 2 70b, and Llama 2 70b chat, for question answering and text generation. Unsurprisingly, GPT-4 32K, the largest and most expensive model considered, came out on top. Note that chat models, which are optimized for interactive use, are not expected to outperform non-chat models on completion tasks.

Model as a service vs. model as a platform

Azure AI Studio offers models through two mechanisms: model as a service (MaaS), and model as a platform (MaaP). Model as a service means that you access the model through an API, and typically pay for usage as you go; the model itself lives in a central pool where it has access to GPUs. The Azure OpenAI models are all available as MaaS, which makes sense since they require so much GPU capacity to run. As I write this, six Meta Llama 2 models just became available as MaaS.

Model as a platform means that you deploy the model into VMs that belong to your Azure subscription. When I tried this I was deploying a Mistral 7B model to a single VM of type Standard_NC24ads_A100_v4, which has 24 vCPUs, 220.0 GiB of memory, one NVIDIA A100 PCIe GPU, and uses third-generation AMD EPYC 7V13 (Milan) processors. I wasn’t impressed by the ungrounded inference results from Mistral 7B on my custom prompts—the right answer was in there, but surrounded by irrelevant hallucinations—although I imagine I could fix that with prompt engineering and/or RAG. (See the “Model customization methods” section below.) There has been speculation that Mistral 7B was trained on benchmark test data, which could explain why it goes off the rails more than you would expect from its benchmark scores.

I’ve heard claims that the new Mixtral 8x7B eight-way mixture-of-experts model is much better, but it wasn’t available in the Azure AI Studio catalog when I was testing. GPT-4 is supposedly also an eight-way mixture-of-experts model, but it’s much bigger; OpenAI hasn’t yet confirmed how the model was built.

If your Azure account/subscription/region doesn’t have any GPU quotas, you can still deploy a generative AI model as a platform with shared GPU capacity. The trade-off for this is that shared GPU capacity is only good for a limited time, variously quoted as 24 or 168 hours. This is considered a stopgap until your cloud administrator can arrange some GPU quota for you.

Azure AI Studio model filtering criteria

Azure AI Studio can filter models by collections, the inference tasks they support, and the fine-tuning tasks they support. Currently there are eight collections, mostly representing model sources, such as Azure OpenAI, Meta, and Mistral AI. Currently there are 20 inference tasks, including text generation, question answering, embeddings, translation, and image classification. And there are 11 fine-tuning tasks, all drawn from the inference task list, but not including embeddings, which is more of an intermediate tool for implementing retrieval-augmented generation.

azure ai studio 05 IDG

Azure AI Studio model filters. These were captured from a staging version of the product in December and are likely to change over time.

Model customization methods

It’s worth discussing ways of customizing models in general at this point. In the following section, you’ll see the tools and components in Azure AI Studio.

Prompt engineering is one of the simplest ways to customize a generative AI model. Typically, models accept two prompts, a user prompt and a system prompt, and generate an output. You normally change the user prompt all the time, and use the system prompt to define the general characteristics you want the model to take on.

Prompt engineering is often sufficient to define the way you want a model to respond for a well-defined task, such as generating text in specific styles. The image below shows the Azure AI Studio sample prompt for a Shakespearean writing assistant. You can easily imagine creating a similar prompt for “Talk Like a Pirate Day.” Ahoy, matey.

LLMs often have hyperparameters that you can set as part of your prompt. Hyperparameter tuning is as much a thing for LLM prompts as it is for training machine learning models. The usual important hyperparameters for LLM prompts are temperature, context window, maximum number of tokens, and stop sequence, but can vary from model to model.

The temperature controls the randomness of the output; depending on the model it can range from 0 to 1 or 0 to 2. Higher temperature values ask for more randomness. In some models, 0 means “set the temperature automatically.” In other models, 0 means “no randomness.”

The context window controls the number of preceding tokens (words or subwords) that the model takes into account for its answer. The maximum number of tokens limits the length of the generated answer. A stop sequence is a string that tells the model to stop generating as soon as it appears in the output, which is useful for ending a response at a known delimiter.
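Here is a small sketch of how two of these settings are commonly applied to a stream of generated tokens, assuming the conventional behavior in which generation ends at the first occurrence of a stop sequence. The function is hypothetical and works on whole tokens for simplicity; real services match stop sequences against the decoded text.

```python
def apply_limits(tokens, max_tokens, stop_sequence=None):
    """Return generated tokens, cut off at max_tokens or at the first stop sequence."""
    output = []
    for token in tokens[:max_tokens]:
        if stop_sequence is not None and token == stop_sequence:
            break  # the stop sequence itself is not included in the output
        output.append(token)
    return output


generated = ["Answer:", "42", "###", "Question:", "What", "next?"]
print(apply_limits(generated, max_tokens=10, stop_sequence="###"))
print(apply_limits(generated, max_tokens=2))
```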

Retrieval-augmented generation, or RAG, helps to ground LLMs with specific sources, often sources that weren’t included in the models’ original training. As you might guess, RAG’s three steps are retrieval from a specified source, augmentation of the prompt with the context retrieved from the source, and then generation using the model and the augmented prompt.

RAG procedures often use embedding to limit the length and improve the relevance of the retrieved context. Essentially, an embedding function takes a word or phrase and maps it to a vector of floating point numbers. These are typically stored in a database that supports a vector search index. The retrieval step then uses a semantic similarity search, typically using the cosine of the angle between the query’s embedding and the stored vectors, to find “nearby” information to use in the augmented prompt. Search engines usually do the same thing to find their answers.

Agents, aka conversational retrieval agents, expand on the idea of conversational LLMs with some combination of tools, running code, embeddings, and vector stores. In other words, they are RAG plus additional steps. Agents often help to specialize LLMs to specific domains and to tailor the output of the LLM. Azure Copilots are usually agents; Google and Amazon use the term agents. LangChain and LangSmith simplify building RAG pipelines and agents.

Fine-tuning large language models is a supervised learning process that involves adjusting the model’s parameters to a specific task. It’s done by training the model on a smaller, task-specific data set that’s labeled with examples relevant to the target task. Fine-tuning often takes hours or days using many server-level GPUs and requires hundreds or thousands of tagged exemplars. It’s still much faster than extended pretraining.

LoRA, or low-rank adaptation, is a method that freezes the original weight matrix and represents the update to it as the product of two much smaller, low-rank matrices. This approximates full supervised fine-tuning in a more parameter-efficient manner. The original Microsoft LoRA paper was published in 2021. A 2023 quantized variation on LoRA, QLoRA, reduces the amount of GPU memory required for the tuning process. LoRA and QLoRA typically reduce the number of tagged exemplars and time required compared to standard fine-tuning.
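The parameter savings are easy to see with a little arithmetic. This sketch assumes the formulation from the LoRA paper, in which a frozen d×k weight matrix receives an additive update B·A, with B of shape d×r and A of shape r×k; the layer size and rank below are illustrative choices, not figures from this review.

```python
def full_update_params(d, k):
    """Trainable parameters when updating a full d x k weight matrix."""
    return d * k


def lora_update_params(d, k, r):
    """Trainable parameters for a rank-r update: B is d x r and A is r x k."""
    return r * (d + k)


def matmul(a, b):
    """Plain-Python matrix product, used to form the low-rank update B @ A."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


d, k, r = 4096, 4096, 8  # illustrative layer size and rank
print(full_update_params(d, k))     # parameters in a full update
print(lora_update_params(d, k, r))  # parameters in the low-rank update

# Tiny rank-1 example: a 2 x 1 matrix times a 1 x 3 matrix gives a 2 x 3 update.
B = [[1.0], [2.0]]
A = [[0.5, -1.0, 0.0]]
delta_w = matmul(B, A)
print(delta_w)
```

With these illustrative sizes, the low-rank update trains 256 times fewer parameters than a full update of the same layer.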

Pretraining is the unsupervised learning process on huge text data sets that teaches LLMs the basics of language and creates a generic base model. Extended or continued pretraining adds unlabeled domain-specific or task-specific data sets to the base model to specialize the model, for example to add a language, add terms for a specialty such as medicine, or add the ability to generate code. Continued pretraining (using unsupervised learning) is often followed by fine-tuning (using supervised learning).

azure ai studio 06 IDG

Prompt engineering. This is an Azure AI Studio prompt sample for a Shakespearean writing assistant. There are five parts to the prompt: the modality, the task, the system message, a sample user message, and a sample desired response.
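In code, a prompt like this is commonly expressed as a list of role-tagged messages. The sketch below assumes the widely used system/user/assistant chat layout; the `build_messages` helper and all of the message text are hypothetical placeholders, not the actual Azure AI Studio sample.

```python
def build_messages(system_message, examples, user_message):
    """Assemble a chat prompt: system message, sample exchanges, then the new request."""
    messages = [{"role": "system", "content": system_message}]
    for sample_user, sample_response in examples:
        messages.append({"role": "user", "content": sample_user})
        messages.append({"role": "assistant", "content": sample_response})
    messages.append({"role": "user", "content": user_message})
    return messages

messages = build_messages(
    system_message="You are a Shakespearean writing assistant.",  # placeholder wording
    examples=[("Please write a short greeting.", "Good morrow to thee, gentle friend.")],
    user_message="Write a sonnet about daylight saving time.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The sample user message and sample response act as a one-shot example that shows the model the desired style before it sees the real request.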

Azure AI Studio tools and components

Earlier in this review, you saw the Azure AI Studio model catalog and model benchmarks. In addition to those, in its Explore tab, Azure AI Studio offers speech, vision, and language capabilities, responsible AI, and prompt samples, such as the Shakespearean writing assistant you saw in the previous section.

In its Build tab, Azure AI Studio offers the Playground, Evaluation, Prompt Flow, Custom Neural Voice, and Fine-tuning tools, and components for Data, Indexes, Deployments, and Content Filters. In the Manage tab, you can see your resources, and (at least on the staging site) your quotas for each subscription and region.

Speech

Azure AI Studio includes Cognitive Service speech capabilities for building voice-enabled apps. Note that these are voice-specific models, not generative AI. The prebuilt voice services have links to samples you can run. The custom models have links to instructions for getting started, which may also have samples you can run.

The speech services include captioning, speech analytics, speech to text, translation with speech to text, and text to speech with pretrained and custom neural voices. The neural voices are very high quality, to the point where customers might not realize that they are AI-generated. The pretrained voice gallery currently includes 478 voices across 148 languages and variants; some of the voices can speak over 40 languages.

azure ai studio 07 IDG

Azure AI Studio speech capabilities for building voice-enabled apps. These are Cognitive Services, not generative AI. The prebuilt services have links to samples you can run. The custom models have links to instructions for getting started, which may also have samples you can run.

Vision

Azure AI Studio also includes vision services. They add the ability to read text, analyze images, and detect faces to your app using machine learning and OCR, not generative AI.

azure ai studio 08 IDG

Azure AI Studio vision services. As with the speech services, these are Cognitive Services, not generative AI.

Language

Azure AI Studio unifies three individual language services in Azure AI services—Text Analytics, QnA Maker, and Language Understanding (LUIS). I honestly don’t know whether the services use the machine-learning-based language models that Microsoft has refined over the years, or new generative AI models. In either case, these models allow you to classify and summarize documents, get real-time translations, or integrate language into your bot experiences.

azure ai studio 09 IDG

Azure AI Studio language services. These include pre-built, task-optimized language models and the ability to train your own custom model for a variety of tasks.

Responsible AI

The most current iteration of Azure’s responsible AI solution is the Content Safety Studio, shown in the first screenshot below. You can use it to moderate text and image content, filter generative AI for jailbreak risk, construct metaprompts for safety, detect protected material, and monitor online activity and data.

You can set the safety levels of a model with a content filter when you deploy the model, as shown in the second screenshot below.

azure ai studio 10 IDG

Content safety is currently the only category listed under Responsible AI in Azure AI Studio. It includes options for moderating text, image, and multimodal content as well as safety solutions for generative AI, monitoring online activity, and building a custom moderation solution.

azure ai studio 10a IDG

When you configure a content filter for an AI model in Azure AI Studio, you can adjust its sensitivity to both input and output material that includes violence, hate, sexual content, and self-harm.

Prompt samples

There are currently 25 prompt samples displayed in the Prompts section. Several are quite interesting. I recommend that you examine the Apple Cycle Analyst prompt to see how you’d teach an LLM how to interpret images, and the Chain of Thought Reasoning sample to see how to teach an LLM to solve basic arithmetic word problems. Without Chain of Thought guidance, most LLMs fail spectacularly on that kind of problem.

I’ve included a Shakespearean sonnet about daylight saving time that GPT-3.5 Turbo 16k and I generated after a few iterations on the user prompt. It uses the same system message you saw above to define the Shakespearean style. I didn’t have to explain the sonnet form.

azure ai studio 11 IDG

Currently, Azure AI Studio features 25 prompt samples, including samples with text and image input.

azure ai studio 12 IDG

This is a chat session and result from the Shakespearean writing assistant prompt we saw earlier in the “Model customization methods” section. 0.7 is a reasonable temperature to use for generating creative material.
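For readers who wonder what the temperature setting does, here is a minimal sketch of temperature-scaled softmax, the standard way temperature reshapes a model's next-token probabilities. The scores are made up for illustration; real models work over vocabularies of many thousands of tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperatures nearly all the probability goes to the top-scoring token, which makes output more predictable; higher temperatures spread probability to the alternatives, which is why a moderate value suits creative writing.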

Playground

The Shakespearean sonnet example you just saw, and all the prompt samples I tried, open in Azure AI Studio’s Playground tool. If you prefer to work in code, you can use the link at the top right to open your project in Visual Studio Code (Web), which is nearly identical to Visual Studio Code on the desktop. The Playground is the most useful tool in Azure AI Studio as long as you’re only doing prompt engineering and hyperparameter tuning.

Evaluation

You can run your language models and evaluate them against industry-standard metrics with this tool. Then you can choose the best version based on your need. The metrics used are groundedness, coherence, fluency, relevance, and GPT similarity. To perform an evaluation you first need to create a runtime. You might want to get here via the Playground and Prompt Flow.

Prompt Flow

Prompt Flow is the place you’d go from the Playground to enhance your model into an app with RAG, content filters, embedding, code, custom voice output, and fine-tuning. If you look at the files at the upper right of the screen, you’ll see Jinja, YAML, and text files that define the prompt, the flow of execution, and any requirements you want to add. (Jinja is an open source web template engine for the Python programming language. YAML is a data serialization language used for configuration files.)

The Prompt Flow screen in Azure AI Studio is your easy entry into heavy-duty AI app engineering. Prompt Flow is also available separately as an open-source project on GitHub, with its own SDK and Visual Studio Code extension.

azure ai studio 13 IDG

Azure AI Studio’s Prompt Flow tool. I got here from the Playground, with the Shakespearean writing assistant sample prompt open. From here you can enhance the app to use data, use a vector index, do fine-tuning, do evaluation, and deploy the model.

Custom Neural Voice

Custom Neural Voice is a limited-access platform (you have to apply for permission to use it) that allows you to create a new AI voice for your application. You can design your unique voice persona and efficiently manage voice talents, data sets, models, test runs, and endpoint connections.

Fine-tuning

In the preview period, which is in effect as of this writing, you can only fine-tune Llama 2 models with this tool, and it’s only supported in projects located in the West US 3 region.

Data

You can connect Azure AI Studio to data in Azure Blob Storage, Azure Data Lake Storage Gen 2, or Microsoft OneLake. Data can be in a single file or a folder. You can also upload data files.

You can use your own data to implement RAG (see the “Model customization methods” section above) to ground your model, as long as the data isn’t too long. The total data length needs to be smaller than the model’s context size; otherwise, you’ll need to use an embedding and a vector search index.
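The decision described above can be sketched as a simple size check. This example assumes a rough rule of thumb of about four characters per token for English text, a hypothetical 16,000-token context window, and fixed-size character chunks; a real project would use the model's own tokenizer and a smarter chunking strategy.

```python
def estimate_tokens(text):
    """Very rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)

def fits_in_context(data, context_size, reserved_for_answer=500):
    """True if the data plus room for the reply fits in the model's context window."""
    return estimate_tokens(data) + reserved_for_answer <= context_size

def split_into_chunks(data, chunk_chars=2000):
    """Fallback: cut the data into fixed-size chunks to embed and index."""
    return [data[i:i + chunk_chars] for i in range(0, len(data), chunk_chars)]

data = "Azure AI Studio review text. " * 3000  # stand-in for your own documents
context_size = 16_000                           # hypothetical context window, in tokens

if fits_in_context(data, context_size):
    print("Small enough: include the data directly in the prompt.")
else:
    chunks = split_into_chunks(data)
    print(f"Too long: embed and index {len(chunks)} chunks instead.")
```

Reserving some of the window for the answer matters because the prompt and the model's reply share the same context budget.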

In addition, you can use image files (up to 16 MB each) for GPT-4 Turbo with Vision, from the Playground. Putting the images in a Blob Storage or Data Lake folder lets you give the model a URL and avoid uploading the images individually to the Playground.

Indexes

Vector indexes using embeddings and Azure AI Search (vector search) make finding relevant data more efficient, and avoid the context length problem when implementing RAG. You can connect to the data in Azure Blob Storage, Azure Data Lake Storage Gen 2, or Microsoft OneLake when you create your index, or use data you’ve already uploaded in the data section.

Deployments

Azure AI Studio supports deploying large language models, flows, and web apps. You can deploy models as a service (MaaS), or models as a platform (MaaP), as discussed above.

Flows are generative AI apps consisting of a sequence of tools, including models, your own data, and possibly embeddings, vector database lookup, and custom connections. When you deploy a flow, you create an endpoint for an AI service. You can also deploy a web app that uses your AI service.

Content Filters

This area lets you list and manage the content filters you use to sanitize model input and output, as discussed in the “Responsible AI” section above.

Your resources

This area, under the Manage tab, lists the permissions, compute instances, connections, policies, and billing for each of your AI projects.

Quotas

Quotas for the different models and instance sizes available are currently viewable and manageable under the Manage tab in the staging version of the Azure AI Studio preview. I don’t see them in my production subscriptions, although they are available when selecting and deploying models.

Azure AI Studio quickstarts and tutorials

The number of quickstarts and tutorials in the Azure AI Studio documentation will undoubtedly grow over time. At the time of writing there are four quickstarts:

And there are three tutorials:

Yes, Azure AI Studio was designed to be usable by the blind. 

AI development without pain

Azure AI Studio, while still in preview and somewhat under construction, checks most of the boxes for a generative AI application builder. It’s clearly making progress, based on my peek at a staging site for the product, and also on the new features that dropped while I was working on my review.

You can build generative AI web apps using Azure AI Studio without having to write code. If you can write Python, all the better. I like the way the Playground and the Prompt Flow tools work.

As I mentioned in the introduction, you can accomplish many of the same things using competing products from Amazon (Bedrock) and Google (Generative AI Studio). If you can program, you can also accomplish many of the same things using LangChain and LangSmith.

But Azure AI Studio would be a good choice. It should allow you to build your AI apps efficiently with minimal pain, whether or not you write code, as long as you understand the principles of prompt engineering, embedding, RAG, and prompt flows.

Cost: Depends on model usage and size of instances deployed

Platform: Microsoft Azure cloud


Copyright 2015 - InnovatePC - All Rights Reserved
