Envisioning an AI-Enhanced Operating System with OpenCLI

Andrzej Czarkowski included in artificial intelligence programming

2024-01-04

Introduction

This post explores a novel idea. As a long-time user of ChatGPT for tasks ranging from drafting official letters to coding, I’ve been intrigued by the potential of AI in everyday technology. OpenAI’s use of the OpenAPI spec to let ChatGPT call external services is a prime example of the power of standardized specifications.

The Limitations of Current Mobile Interactions

We’ve been using smartphones for years, yet our interaction with them, despite advances in voice assistants like Google Assistant and Siri, remains somewhat limited. This leads to an interesting thought: what if we had a different kind of mobile operating system, one with AI at its core?

A Vision for an AI-Driven Operating System

Picture an OS where you could talk to your phone, asking it to perform tasks. The AI, knowing which AI-capable applications are installed, could execute commands, respond to queries, and even chain responses between apps. This isn’t about interacting with remote API-driven services; it’s about making the phone use local apps more effectively.

For instance, imagine asking your phone to make a restaurant reservation. The AI system could add this to your calendar, then access your banking app to analyze your spending, estimate future expenses, and transfer money to a designated card for the reservation. This is just one of countless potential scenarios.

The Need for a New Specification

The key question is: how could such an operating system understand each application’s capabilities? We already describe APIs using the OpenAPI specification and event-driven systems with AsyncAPI. However, there appears to be no equivalent for describing standalone applications. This gap led to the idea of developing a new specification.

Importantly, this new standard shouldn’t require changes to how standalone applications currently function. Consider the entry points of minimal programs in languages like C#, C++, or Java:

using System;

class Program
{
    static void Main(string[] args)
    {
        // Display the number of command line arguments.
        Console.WriteLine(args.Length);
    }
}

#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("Hello, World!\n");
    return 0;
}

class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!"); 
    }
}

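These programs share a common feature: each entry point receives its command-line arguments (args or argv). Additionally, stdout and stderr are standard output streams in many operating systems. Our new standard should leverage these elements: arguments as the input, stdout for results, and stderr for errors.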
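To make the three elements concrete, here is a minimal Python sketch of a program that takes its input from arguments, writes its result to stdout, and reports errors on stderr. The `greet` tool, the `run` function, and the choice of JSON as the serialization are illustrative assumptions; nothing above prescribes them.

```python
import json
import sys


def run(argv, stdout=None, stderr=None):
    """Hypothetical 'greet' tool: args in, JSON on stdout, JSON error on stderr."""
    if stdout is None:
        stdout = sys.stdout
    if stderr is None:
        stderr = sys.stderr
    if len(argv) != 1:
        # Errors go to stderr as a structured object, with a non-zero exit code.
        json.dump({"errorCode": 2, "errorMessage": "usage: greet NAME"}, stderr)
        stderr.write("\n")
        return 2
    # Successful results go to stdout.
    json.dump({"message": f"Hello, {argv[0]}!"}, stdout)
    stdout.write("\n")
    return 0
```

A real script would finish with `sys.exit(run(sys.argv[1:]))`, so that the return value becomes the process exit code.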

Visualizing the AI-Driven Operating System

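Before delving into the new standard, let’s visualize how such an operating system might function:

[Diagram: User (A) → AI Kernel (B) → Applications 1, 2, 3 (C, D, E) → AI Kernel - Command Response Handler (F) → Error Handling Mechanism (G) when an error is reported → User (A)]

Explanation of the Diagram: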

User (A): Initiates interaction by issuing commands to the system.
AI Kernel (B): Central part of the OS, equipped with AI capabilities. It interprets user commands and decides which application is best suited to handle them.
Applications 1, 2, 3 (C, D, E): These are applications that implement the new specification. The AI Kernel routes commands to these applications based on their defined capabilities.
AI Kernel - Command Response Handler (F): Once an application processes a command, it sends the output (stdout) or errors (stderr) back to this component of the AI Kernel.
Error Handling Mechanism (G): If an error is reported (through stderr), this mechanism processes and provides appropriate feedback or error messages to the user.
User (A): Finally, the user receives either the processed output or error feedback from the system.
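The flow in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `APPS` registry and the `dispatch` function are hypothetical names, tiny inline Python scripts stand in for installed applications, and the step where the AI interprets a spoken request is left out, so the command name is assumed to be already chosen.

```python
import subprocess
import sys

# Hypothetical registry: command name -> argv prefix of the app that handles it.
# Tiny inline Python scripts stand in for real installed applications.
APPS = {
    "echo": [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"],
    "fail": [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"],
}


def dispatch(command, args):
    """Route a command to its app (B -> C/D/E) and collect the result (F, G)."""
    if command not in APPS:
        return {"ok": False, "error": f"no application handles '{command}'"}
    proc = subprocess.run(APPS[command] + list(args), capture_output=True, text=True)
    if proc.returncode != 0:
        # Error Handling Mechanism (G): surface stderr to the user.
        return {"ok": False, "error": proc.stderr.strip()}
    # Command Response Handler (F): pass stdout back to the user.
    return {"ok": True, "output": proc.stdout.strip()}
```

For example, `dispatch("echo", ["hello", "world"])` runs the stand-in app as a separate process and returns its stdout, while `dispatch("fail", [])` returns the text the app wrote to stderr.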

Introducing OpenCLI

Let’s give a name to our new specification. A command-line interface is a means of interacting with a computer program by entering lines of text (commands), so, by analogy with the OpenAPI spec, we can call it OpenCLI.

The best way of illustrating the spec is by example. Below is an OpenCLI spec file describing an application that supports multiple commands, each with its own set of flags.

opencli: 1.0.0
info:
  title: File Operations Application
  version: 1.0.0
commands:
  - name: list
    description: List files in a directory
    flags:
      - name: directory
        short: d
        description: The directory to list files from
        required: true
        schema:
          type: string
    stdout:
      description: List of files in the directory
      schema:
        type: array
        items:
          type: string
    stderr:
      description: Error information if the listing fails
      schema:
        type: object
        properties:
          errorCode:
            type: integer
          errorMessage:
            type: string
  - name: delete
    description: Delete a specific file
    flags:
      - name: filepath
        short: f
        description: The file path to delete
        required: true
        schema:
          type: string
    stdout:
      description: Confirmation message of file deletion
      schema:
        type: object
        properties:
          message:
            type: string
    stderr:
      description: Error information if the deletion fails
      schema:
        type: object
        properties:
          errorCode:
            type: integer
          errorMessage:
            type: string

What does this spec file tell us? The application titled “File Operations Application” supports two commands: list and delete. Each command takes one required flag. We use stdout to describe the successful response of each command and stderr to describe any errors that might be returned.
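One way a consumer of this spec might use it is to turn a requested command and its flag values into an argument list. The sketch below mirrors the two commands as a Python dict to avoid a YAML dependency. The `build_argv` helper is hypothetical, and the `--name value` convention for passing flags is an assumption, since the spec file itself does not say how flags appear on the command line.

```python
# A Python-dict mirror of the flags of the 'list' and 'delete' commands.
SPEC = {
    "commands": [
        {"name": "list", "flags": [{"name": "directory", "short": "d", "required": True}]},
        {"name": "delete", "flags": [{"name": "filepath", "short": "f", "required": True}]},
    ]
}


def build_argv(spec, command, values):
    """Validate flag values against the spec and return an argv list."""
    by_name = {c["name"]: c for c in spec["commands"]}
    if command not in by_name:
        raise ValueError(f"unknown command: {command}")
    flags = by_name[command].get("flags", [])
    unknown = set(values) - {f["name"] for f in flags}
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    argv = [command]
    for flag in flags:
        if flag["name"] in values:
            # Assumed convention: long-form flag followed by its value.
            argv += [f"--{flag['name']}", str(values[flag["name"]])]
        elif flag.get("required"):
            raise ValueError(f"missing required flag: --{flag['name']}")
    return argv
```

Calling `build_argv(SPEC, "list", {"directory": "/tmp"})` yields `["list", "--directory", "/tmp"]`, while omitting the required flag raises a `ValueError` before any application is started.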

OpenCLI Specification: Overview and Structure

OpenCLI Version: 1.0.0

Purpose:

OpenCLI is a specification designed to standardize the interface for command-line applications. It provides a clear and structured way for developers to define how command-line tools can be invoked, including the specification of commands, flags, and expected input/output formats.

Key Components:

Metadata:

Includes general information about the specification, such as the version of OpenCLI being used, and details about the application or tool the spec describes (title, version).

Commands:

A list of commands that the application supports.
Each command is described with:
- Name: The identifier used to invoke the command.
- Description: A brief explanation of what the command does.
- Flags: Arguments or options that modify or complement the command’s behavior.
- Each flag can have:
  - Name: The long-form name of the flag.
  - Short: An optional short-form alias for the flag.
  - Description: Details about what the flag does.
  - Required: Indicates whether the flag is mandatory.
  - Schema: The data type and format of the flag’s value.
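On the application side, the flag fields listed above map fairly directly onto an ordinary argument parser. The following sketch uses Python's standard `argparse` module; the `parser_for` helper, the `TYPES` mapping from schema types to Python types, and the `-d` / `--directory` spelling of short and long flags are assumptions made for illustration.

```python
import argparse

TYPES = {"string": str, "integer": int, "number": float}  # assumed type mapping


def parser_for(command):
    """Build an argparse parser from one OpenCLI-style command description."""
    parser = argparse.ArgumentParser(
        prog=command["name"], description=command.get("description")
    )
    for flag in command.get("flags", []):
        names = [f"--{flag['name']}"]          # Name: long form
        if "short" in flag:
            names.insert(0, f"-{flag['short']}")  # Short: optional alias
        parser.add_argument(
            *names,
            required=flag.get("required", False),
            type=TYPES[flag.get("schema", {}).get("type", "string")],
            help=flag.get("description"),
        )
    return parser


LIST_COMMAND = {
    "name": "list",
    "description": "List files in a directory",
    "flags": [{"name": "directory", "short": "d", "required": True,
               "description": "The directory to list files from",
               "schema": {"type": "string"}}],
}
```

With this in place, `parser_for(LIST_COMMAND).parse_args(["-d", "/tmp"])` produces a namespace whose `directory` attribute is `"/tmp"`, and leaving the flag out makes `argparse` exit with a usage error.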

Standard Output (stdout):

Describes the expected output format when a command is successfully executed.
Includes a schema defining the structure of the output, which can vary based on the command.

Standard Error (stderr):

Defines the format for error messages and codes.
Includes a schema detailing how errors should be structured, facilitating consistent error handling across different tools.
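Because stdout and stderr each carry a schema, a caller can check what an application actually emitted before trusting it. The sketch below assumes the streams carry JSON, which the specification text does not state, and the `matches` function covers only the handful of schema keywords used in the sample spec (type, items, properties). It is not a full JSON Schema validator.

```python
import json


def matches(value, schema):
    """Check a decoded JSON value against a tiny subset of schema keywords."""
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "array":
        items = schema.get("items", {})
        return isinstance(value, list) and all(matches(v, items) for v in value)
    if kind == "object":
        props = schema.get("properties", {})
        return isinstance(value, dict) and all(
            matches(value[k], s) for k, s in props.items() if k in value
        )
    return True  # no type given, or a type this sketch does not cover: accept


# Example: the stdout schema of the 'list' command and a sample output.
LIST_STDOUT = {"type": "array", "items": {"type": "string"}}
files = json.loads('["a.txt", "b.txt"]')
```

Here `matches(files, LIST_STDOUT)` holds, whereas a list containing a number would be rejected.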

Examples:

Provides sample instances for each command (not shown in the spec file above), illustrating typical usage, including example flag values, and showing both successful output and error cases.

Enhancing Applications with OpenCLI

Let’s now imagine how we could enhance applications on different operating systems by including an OpenCLI spec in them, using the packaging capabilities those systems already have.

Application Formats Across Operating Systems:

Windows: The most common format for application binaries is .exe.
iOS: Applications use the .ipa format.
Android: The format for Android applications is .apk (Android Package).
macOS: For macOS, applications typically come in .app format.

Integrating OpenCLI Spec:

Windows (.exe): Embedding the OpenCLI spec in a custom data section of the Portable Executable (PE) format could be a viable approach.
iOS (.ipa) and macOS (.app): The Info.plist file within app bundles could reference the OpenCLI spec.
Android (.apk): Custom <meta-data> tags in the AndroidManifest.xml file could include OpenCLI spec references.
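As a rough illustration of the Info.plist option, the sketch below reads a spec reference from a property list using Python's standard `plistlib` module. The key name `OpenCLISpecPath` is purely hypothetical and is not a key defined by Apple, and the plist is built in memory so that the example runs on any platform.

```python
import plistlib

SPEC_KEY = "OpenCLISpecPath"  # hypothetical key; not a real Apple-defined key


def spec_reference(info_plist_bytes):
    """Return the spec path referenced by an Info.plist, or None if absent."""
    info = plistlib.loads(info_plist_bytes)
    return info.get(SPEC_KEY)


# Build a small Info.plist in memory instead of reading a real app bundle.
example = plistlib.dumps({"CFBundleName": "FileOps", SPEC_KEY: "opencli.yaml"})
```

An operating system component could call something like `spec_reference` at install time and then load the referenced file from inside the bundle; an app without the key simply yields `None`.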

Once an application with an OpenCLI spec is installed in the operating system, the system reads the spec. It then trains its local AI model on the new capabilities described in the app’s spec.
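Training a model on every install may be heavy. A lighter alternative, which is a swapped-in technique and not what the paragraph above describes, is to keep a registry of installed capabilities and hand their descriptions to the model at query time. The `CapabilityRegistry` class and its method names below are hypothetical.

```python
class CapabilityRegistry:
    """Hypothetical store of the commands each installed app exposes."""

    def __init__(self):
        self._commands = {}  # "app/command" -> description

    def install(self, app_id, spec):
        """Record every command from an app's spec under 'app_id/command'."""
        for command in spec.get("commands", []):
            key = f"{app_id}/{command['name']}"
            self._commands[key] = command.get("description", "")

    def uninstall(self, app_id):
        """Forget all commands that belong to an app."""
        prefix = f"{app_id}/"
        self._commands = {
            k: v for k, v in self._commands.items() if not k.startswith(prefix)
        }

    def describe(self):
        """Render one line per command, e.g. for a model's context window."""
        return "\n".join(f"{k}: {v}" for k, v in sorted(self._commands.items()))
```

Installing the file operations app from the earlier spec under the id `fileops` would make `describe()` return one line for `fileops/delete` and one for `fileops/list`, and uninstalling it would remove both.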

Conclusion

We are currently on a very exciting journey with the new AI capabilities. Perhaps in a few years’ time, Apple or Google might build or enhance their operating systems with such capabilities. I think this article demonstrates that existing applications could be enhanced with a spec file that describes their capabilities, and that an AI-capable operating system could then use these applications.

Looking forward, I’m considering taking a stab at creating a proof of concept application in one of my next articles. While it won’t be a fully-fledged operating system, it would aim to demonstrate how the principles of OpenCLI could be applied and how such a system might operate in practice. This prototype would provide a tangible glimpse into the potential of AI-enhanced operating systems.

Stay tuned for this journey into the practical application of OpenCLI – it promises to be an enlightening exploration into the future of AI and operating systems!

Cheers, Andrzej