UCL
UCL: Some Updates
Made a few minor changes to UCL. Well, actually, I made one large change: I've renamed the foreach builtin to for.
I was originally planning to have a for loop that worked much like other languages: you have a variable, a start value, and an end value, and you'd just iterate over the loop until you reach the end. I don't know how this would've looked, but I imagined something like this:
for x 0 10 {
echo $x
}
# numbers 0..9 would be printed.
But this became redundant after adding the seq builtin:
foreach (seq 10) { |x|
echo $x
}
This was in addition to all the other useful things you could do with the foreach loop (and the seq builtin), such as looping over lists and hashes, and consuming values from iterators. It's already a pretty versatile loop. So I elected to go the Python way and just made it so that the for loop is the loop to use to iterate over collections.
This left an opening for a loop that dealt with guards, so I also added the while loop. Again, much like most languages, this loop would iterate over a block until the guard becomes false:
set x 0
while (lt $x 5) {
echo $x
set x (add $x 1)
}
echo "done"
Unlike the for loop, this is unusable in a pipeline (well, unless it's the first component). I was considering having the loop return the result of the guard when it terminates, but I realised that would be either false, nil, or anything else that was "falsy." So I just have the loop return nil. That said, you can break from this loop, and if the break call had a value, that would be used as the result of the loop:
set x 0
while (lt $x 5) {
set x (add $x 1)
if (ge $x 3) {
break "Ahh"
}
} | echo " was the break"
The guard is optional, and if left out, the while loop will iterate forever.
The Set! Builtin
Many of these changes come from using UCL for my job, and one thing I found myself doing recently is writing a bunch of migration scripts. These scripts needed to get data from a database, and that data may or may not be present. If it's not, I want the script to fail immediately so I can check my assumptions. This usually results in constructs like the following:
set planID (ls-plans | first { |p| eq $p "Plan Name" } | index ID)
if (not $planID) {
error "cannot find plan"
}
And yeah, adding the if block is fine — I do it all the time when writing Go — but it would be nice to assert this when you’re trying to set the variable, for no reason other than the fact that you’re thinking about nullability while writing the expression to fetch the data.
So one other change I made was adding the set! builtin. This will basically set the variable only if the expression is not nil. Otherwise, it will raise an error.
set! planID (ls-plans | first { |p| eq $p "Missing Plan" } | index ID)
# refusing to set! `planID` to nil value
This does mean that ! and ? are now valid characters to appear in identifiers, just like Ruby. I haven't decided whether I want to start following the Ruby convention of question marks indicating a predicate or bangs indicating a mutation. Not sure that's going to work out now, given that the bang is being used here to assert non-nullability. In either case, could be useful in the future.
UCL: Iterators
Still working on UCL in my spare time, mainly filling out the standard library a little, like adding utility functions for lists and CSV files. The largest change made recently was adding iterators to the mix of core types. These work a lot like the streams of old, where you have a potentially unbounded source of values that can only be consumed one at a time. The difference with streams is that there's no magic to this: iterators work like any other type, so they can be stored in variables, passed around, etc. (streams could only be consumed via pipes).
I augmented the existing high-level functions like map and filter to consume and produce iterators, but it was fun discovering other functions which also became useful. For example, there exists a head function which returns the first value of a list. But I discovered that the same semantics also work as a way to consume the next element from an iterator. So that's what this function now does. This, mixed with the fact that iterators are truthy if they've got at least one pending value, means that some of the builtins can now be implemented in UCL itself. Like the example below, which could potentially be used to reimplement itrs:to-list (this is a contrived example, as foreach would probably work better here).
proc to-list { |itr lst|
if $itr {
lists:add $lst (head $itr)
return (to-list $itr $lst)
}
return $lst
}
to-list (itrs:from [1 2 3 4 5]) []
But the biggest advantage that comes from iterators is querying large data-stores with millions of rows. Being able to write a UCL script which sets up a pipeline of maps and filters, and just let it churn through all the data in its own time, is the dream.
list-customers | filter { |x| $x.HasPlan } | map { |x| $x.PlanID } | foreach echo
I’ve got a need for this in the internal backend tool that spurred the development of UCL, and I’m looking forward to using iterators to help here.
Started filling out the UCL website, mainly by documenting the core modules. It might be a little unnecessary to have a full website for this, given that the only person who'll get any use from it right now will be myself. But who knows how useful it could be in the future? If nothing else, it's a showcase of what I've been working on for this project.
I've been using UCL a lot recently, which is driving additional development on it. Spent a fair bit of time this evening fixing bugs and adding small features like string interpolation. Fixed a number of grammar bugs too, which only popped up when I started writing multi-line scripts with it.
I plan to integrate UCL into another tool at work, so I spent last night improving its use as a REPL. Added support for onboard help and setting up custom type printing, which is useful for displaying tables of data. I started working on the tool today and it's already feeling great.

Try-Catch In UCL - Some Notes
Started working on a try command in UCL, which can be used to trap errors that occur within a block. This is very much inspired by try blocks in Java and Python, where the main block will run, and if any error occurs, it will fall through to the catch block:
try {
echo "Something bad can happen here"
} catch {
echo "It's all right. I'll run next"
}
This is all I’ve got working at the moment, but I want to quickly write some notes on how I’d like this to work, lest I forget it later.
First, much like everything in UCL, these blocks should return a value. So it should be possible to do something like this:
set myResults (try {
result-of-something-that-can-fail
} catch {
"My default"
})
--> (result of the thing)
This is kind of like using or in Lua to fall back to a default, except that if the result fails with an error, the default value can be returned from the catch block. It might even be possible to simplify this further, and have catch just return a value in cases where an actual block of code is unnecessary:
set myResults (try { result-of-something-that-can-fail } catch "My default")
One other thing to consider is how to represent the error. Errors are just treated out-of-band at the moment, and are represented as regular Go error types. It might be necessary to add a new error type to UCL, so that it can be passed through to the catch block for logging or switching:
try {
do-something
} catch { |e|
echo (cat "The error is " $e)
}
This could also be used as the return value if there is no catch block:
set myResult (try { error "my error" })
--> error: my error
Another idea I have is successive catch blocks, which would cascade one after the other if the one before it fails:
try {
do-something
} catch {
this-may-fail-also
} catch {
echo "Always passes"
}
Unlike JavaScript or Python, I don't think the idea of having catch blocks switch based on the error type would be suitable here. UCL is dynamic in nature, and that sort of static type checking feels a little wrong here. The catch blocks will only act as isolated blocks of execution, where an error would be caught and handled.
Finally, there’s finally
, which would run regardless of which try or
catch block was executed. I think, unlike the other two blocks, that the
return value of a finally
block will always be swallowed. I think this
will work as the finally block should mainly be used for clean-up, and
it’s the result of the try
or catch
blocks that are more important.
set res (try {
"try"
} catch {
"catch"
} finally {
"finally"
})
--> "try"
Anyway, this is the idea I have right now.
Update — I just realised that having a failed try block return the error as a value, rather than letting it climb up the stack, defeats the purpose of exceptions. So having something like the following:
try { error "this will fail" }
Should just unroll the stack and not return an error value. Although if there is a need to have an error value returned, then the following should work:
try { error "this will fail" } catch { |err| $err }
--> error: this will fail
Indexing In UCL
I've been thinking a little about how to support indexing in UCL, as in getting elements from a list or keyed values from a map. There already exists an index builtin that does this, but I'm wondering if this can be, or even should be, supported in the language itself.
I've reserved . for this, and it'll be relatively easy to make use of it to get map fields. But I do have some concerns with supporting list element dereferencing using square brackets. The big one is that if I were to use square brackets the same way that many other languages do, I suspect (although I haven't confirmed) that it could lead to the parser treating them as two separate list literals. This is because the scanner ignores whitespace, and there are no other syntactic indicators to separate arguments to proc calls, like commas:
echo $x[4] --> echo $x [4]
echo [1 2 3][2] --> echo [1 2 3] [2]
So I'm not sure what to do here. I'd like to add support for . for map fields, but it feels strange doing just that and having nothing for list elements.
I can think of three ways to address this.
Do Nothing — the first option is easy: don't add any new syntax to the language and just rely on the index builtin. TCL does this with lindex, as does Lisp with nth, so I'll be in good company here.
Use Only The Dot — the second option is to add support for the dot and not the square brackets. This is what the Go templating language does for map keys or struct fields. It also has an index builtin, which works with slice elements.
I'd probably do something similar, but I may extend it to support index elements. Getting the value of a field would be what you'd expect, but to get the element of a list, the construct .(x) can be used:
echo $x.hello  # returns the "hello" field
echo $x.(4)    # returns the fourth element of a list
One benefit of this could be that the .(x) construct would itself be a pipeline, meaning that string and calculated values could be used as well:
echo $x.("hello")
echo $x.($key)
echo $x.([1 2 3] | len)
echo $x.("hello" | toUpper)
I can probably get away with supporting this without changing the scanner or compromising the language design too much. It would be nice to add support for ditching the dot completely when using the parentheses, à la BASIC, but I'd probably run into the same issues as with the square brackets if I did, so I think that's out.
Use Parentheses To Be Explicit — the last option is to use square brackets, and modify the grammar slightly to only allow the use of suffix expansion within parentheses. That way, if you want to pass a list element as an argument, you have to use parentheses:
echo ($x[4])  # fourth element of $x
echo $x[4]    # $x, along with a list containing "4"
This is what you'd see in more functional languages like Elm and, I think, Haskell. I'll have to see whether this could work with changes to the scanner and parser if I were to go with this option. I think it may be achievable, although I'm not sure how.
An alternative might be to go the other way, and modify the grammar rules so that the square brackets would bind closer to the list, which would mean that separate arguments involving square brackets would need to be in parentheses:
echo $x[4]    # fourth element of $x
echo $x ([4]) # $x, along with a list containing "4"
Or I could modify the scanner to recognise whitespace characters and use that as a guide to determine whether square brackets follow a value. No space would mean the square brackets represent an element suffix, and at least one space would mean two separate values.
So that's where I am at the moment. I guess it all comes down to what works best for the language as a whole. I can live with option one, but it would be nice to have the syntax. I'd rather not go with option three, as I'd like to keep the parser simple (I'd rather not add to all the new-line complexities I have already).
Option two would probably be the least compromising to the design as a whole, even if the aesthetics are a bit strange. I can probably get used to them though, and I do like the idea of index elements being pipelines themselves. I may give option two a try, and see how it goes.
Anyway, more on this later.
UCL: Brief Integration Update and Modules
A brief update of where I am with UCL and integrating it into Dynamo-Browse. I did manage to get it integrated, and it's now serving as the interpreter for commands entered during a session.
It works… okay. I decided to avoid all the complexities I mentioned in the last post — all that about continuations, etc. — and simply kept the commands returning tea.Msg values. The original idea was to have the commands return usable values if they were invoked in a non-interactive manner. For example, the table command invoked in an interactive session will bring up the table picker for the user to select the table. But when invoked as part of a call to another command, maybe it would return the current table name as a string, or something.
But I decided to ignore all that and simply kept the commands as they are. Maybe I’ll add support for this in a few commands down the line? We’ll see. I guess it depends on whether it’s necessary.
Which brings me to why this is only working "okay" at the moment. Some commands return a tea.Msg which asks for some input from the user. The table command is one; another is set-attr, which prompts the user to enter an attribute value. These are implemented as a message which commands the UI to go into an "input mode", and will invoke a callback on the message when the input is entered.
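These message types are Dynamo-Browse specifics, but the general shape is something like this sketch, with the field names invented for illustration (tea being the Bubble Tea package):
// A message asking the UI to switch into "input mode". Once the user
// submits a value, the UI invokes the callback, which may itself return
// another message to be processed.
type promptForInputMsg struct {
	Prompt  string
	OnInput func(value string) tea.Msg
}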
This is not an issue for single commands, but it becomes one when you start entering multiple commands that prompt for input, such as two set-attr calls:
set-attr this -S ; set-attr that -S
What happens is that two messages to show the prompt are sent, but only one of them is shown to the user, while the other is simply swallowed.
Fixing this would require some re-engineering, either with how the controllers returning these messages work, or the command handlers themselves. I can probably live with this limitation for now — other than this, the UCL integration is working well — but I may need to revisit this down the line.
Modules
As for UCL itself, I’ve started working on the builtins. I’m planning to have a small set of core builtins for the most common stuff, and the rest implemented in the form of “modules”. The idea is that the core will most likely be available all the time, but the modules can be turned on and off by the language embedder based on what they need or are comfortable having.
Each module is namespaced with a prefix, such as os for operating system operations, or fs for file-system operations. I've chosen the colon as the namespace separator, mainly so I can reserve the dot for field dereferencing, but also because I think TCL uses the colon as a namespace separator as well (I think I saw it in some sample code). The first implementation of this was simply adding the colon to the list of characters that make up the IDENT token. This broke the parser, as the colon is also used as the map key/value separator, and the parser couldn't resolve maps anymore. I had to extend the "ident" parse rule to support multiple IDENT tokens separated by colons. The module builtins are simply added to the environment with their fully-qualified name, complete with prefix and colon, and invoking them with one of these idents will just "flatten" all the colon-separated tokens into a single string. Not sophisticated, but it'll work for now.
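As a sketch, the extended rule might look something like this in Participle, although the actual token names and AST fields almost certainly differ:
// A guess at the extended "ident" parse rule: one or more IDENT tokens
// separated by colons. Token and field names here are illustrative only.
type astIdent struct {
	Parts []string `parser:"@IDENT ( Colon @IDENT )*"`
}

// name flattens the colon-separated tokens back into a single string,
// such as "os:env", which is then used to look up the builtin.
func (a astIdent) name() string {
	return strings.Join(a.Parts, ":")
}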
There aren't many builtins for these modules at the moment: just a few for reading environment variables and getting files as lists of strings. Dynamo-Browse is already using this in a feature branch, and it's allowed me to finally add a long-standing feature I've been meaning to add for a while: automatically enabling read-only mode when accessing DynamoDB tables in production. With modules, this construct looks a little like the following:
if (eq (os:env "ENV") "prod") {
set-opt ro
}
It would've been possible to do this with the scripting language already used by Dynamo-Browse. But this is the motivation for integrating UCL: it makes these sorts of constructs much easier to do, much like writing a shell script over something in C.
UCL: Breaking And Continuation
I've started trying to integrate UCL into a second tool: Dynamo-Browse. And so far it's proving to be a little difficult. The problem is that this will be replacing a dumb string splitter, with command handlers that currently return a tea.Msg type that changes the UI in some way.
UCL builtin handlers return an interface{} result, or an error result, so there's no reason why this wouldn't work. But tea.Msg is also an interface{} type, so it will be difficult to tell a UI message apart from a result that's usable as data.
This is a Dynamo-Browse problem, but it's still a problem I'll need to solve. It might be that I'll need to return tea.Cmd types — which are functions returning tea.Msg — and have the UCL caller detect these and dispatch them when they're returned. That's a lot of function closures, but it might be the only way around this (well, the alternative is returning an interface type with a method that returns a tea.Msg, but that'll mean a lot more types than I currently have).
Anyway, more on this in the future I’m sure.
Break, Continue, Return
As for language features, I realised that I never had anything to exit early from a loop or proc. So I added break, continue, and return commands. They're pretty much what you'd expect, except that break can optionally return a value, which will be used as the resulting value of the foreach loop that contains it:
echo (foreach [5 4 3 2 1] { |n|
echo $n
if (eq $n 3) {
break "abort"
}
})
--> 5
--> 4
--> 3
--> abort
These are implemented as error types under the hood. For example, break will return an errBreak type, which will flow up the chain until it is handled by the foreach command (continue is also an errBreak with a flag indicating that it's a continue). Similarly, return will return an errReturn type that is handled by the proc object.
This fits quite naturally with how the scripts are run. All I'm doing is walking the tree, calling each AST node as a separate function call and expecting it to return a result or an error. If an error is returned, the function bails, effectively unrolling the stack until the error is handled or it's returned as part of the call to Eval(). So leveraging this stack-unrolling process already in place makes sense to me.
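As an illustration, here's a sketch of how the foreach handling could look. The errBreak shape comes from the description above, but everything else is a guess at the surrounding evaluator:
// errBreak carries an optional value, plus a flag indicating whether
// it's actually a continue.
type errBreak struct {
	isContinue bool
	value      any
}

func (e errBreak) Error() string { return "break used outside of a loop" }

// evalForEach invokes the body for each item, handling errBreak itself
// and letting any other error continue to unroll the stack.
func evalForEach(items []any, body func(item any) (any, error)) (any, error) {
	var last any
	for _, item := range items {
		res, err := body(item)
		if err != nil {
			var brk errBreak
			if errors.As(err, &brk) {
				if brk.isContinue {
					continue
				}
				return brk.value, nil // break's value becomes the loop result
			}
			return nil, err // a genuine error: keep unrolling
		}
		last = res
	}
	return last, nil
}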
I'm not sure if this is considered idiomatic Go. I get the impression that using error types to handle flow control outside of adverse conditions is frowned upon. This reminds me of all the arguments against using exceptions for flow control in Java. Those arguments are good ones: following execution between try and catch makes little sense when the flow can be explained more clearly with an if.
But I'm going to defend my use of errors here. Like most Go projects, the code is already littered with all the if err != nil { return err } needed to exit early when a non-nil error is returned. And since Go developers preach the idea of errors simply being values, why not use errors here to unroll the stack? It's better than the alternatives, such as detecting a sentinel result type or adding a third return value which will just be yet another if bla { return res } clause.
Continuations
Now, an idea is brewing for a feature I’m calling “continuations” that might be quite difficult to implement. I’d like to provide a way for a user builtin to take a snapshot of the call stack, and resume execution from that point at a later time.
The reason for this is that I'd like all the asynchronous operations to be transparent to the UCL user. Consider a UCL script with a sleep command:
echo "Wait here"
sleep 5
echo "Ok, ready"
sleep could simply be a call to time.Sleep(), but say you're running this as part of an event loop, and you'd prefer to do something like set up a timer instead of blocking the thread. You may want to hide this from the UCL script author, so they don't need to worry about callbacks.
Ideally, this can be implemented by the builtin using a construct similar to the following:
func sleep(ctx context.Context, args ucl.CallArgs) (any, error) {
	var secs int
	if err := args.Bind(&secs); err != nil {
		return nil, err
	}

	// Save the execution stack
	continuation := args.Continuation()

	// Schedule the sleep callback
	go func() {
		<-time.After(time.Duration(secs) * time.Second)

		// Resume execution later, yielding `secs` as the return value
		// of the `sleep` call. This will run the "Ok, ready" echo call.
		continuation(ctx, secs)
	}()

	// Halt execution now
	return nil, ucl.ErrHalt
}
The only trouble is, I've got no idea how I'm going to do this. As mentioned above, UCL executes the script by walking the parse tree with normal Go function calls. I don't want to be in a position to create a snapshot of the Go call stack. That's a little too low-level for what I want to achieve here.
I suppose I could store the visited nodes in a list when the ErrHalt is raised; or maybe replace the Go call stack with an in-memory stack, with AST node handlers being pushed and popped as the script runs. But I'm not sure this will work either. It would require a significant amount of re-engineering, which I'm sure would be technically interesting, but would take a fair bit of time. And how is this to work if a continuation is made in a builtin that's being called from another builtin? What should happen if I were to run sleep within a map, for example?
So it might be that I'll have to use something else here. I could potentially do something using goroutines: the script is executed on a goroutine, and args.Continuation() does something like pause it on a channel. How that would work with a builtin handler requesting the continuation without being paused itself, I'm not so sure. Maybe the handlers could be dispatched on a separate goroutine as well?
A simpler approach might be to just offload this to the UCL user, and have them run Eval on a separate goroutine and simply sleep the thread. Callbacks that need input from outside could simply be sent using channels passed via the context.Context. At least that'll lean into Go's first-party support for synchronisation, which is arguably a good thing.
UCL: The Simplifications Paid Off
The UCL simplifications have been implemented, and they seem to be largely successful.
Ripped out all the streaming types, and changed pipes to simply pass the result of the left command as the first argument of the right.
"Hello" | echo ", world"
--> "Hello, world"
This has dramatically improved the use of pipes. Previously, pipes could only be used to connect streams. But now, with pretty much anything flowing through a pipe, that list of commands has extended to pretty much every builtin and user-defined proc. Furthermore, a command no longer needs to know that it's being used in a pipeline: whatever flows through the pipe is passed transparently via the first argument to the function call. This has made pipes more useful, and usable in more situations.
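Under the hood, I'd imagine the evaluator doing something like this simplified sketch; the invokable type and the argument plumbing here are inventions, not the actual implementation:
// evalPipeline threads each stage's result in as the first argument of
// the next stage. argLists holds the explicit arguments written against
// each stage in the script.
type invokable func(ctx context.Context, args []any) (any, error)

func evalPipeline(ctx context.Context, stages []invokable, argLists [][]any) (any, error) {
	var carry any
	for i, stage := range stages {
		args := argLists[i]
		if i > 0 {
			// Whatever flowed through the pipe goes in as the first argument.
			args = append([]any{carry}, args...)
		}
		res, err := stage(ctx, args)
		if err != nil {
			return nil, err
		}
		carry = res
	}
	return carry, nil
}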
Macros can still know whether there exists a pipe argument, which can make for some interesting constructs. Consider this variant of the foreach macro, which can "hang off" the end of a pipe:
["1" "2" "3"] | foreach { |x| echo $x }
--> 1
--> 2
--> 3
Not sure if this variant is useful, but I think it could be. It seems like a natural way to iterate over items passed through the pipe. I'm wondering if this could extend to the if macro as well, but that variant might not be as natural to read.
Another simplification was changing the map builtin to accept anonymous blocks, as well as "invokable" commands by name. Naturally, this also works with pipes:
[a b c] | map { |x| toUpper $x }
--> [A B C]
[a b c] | map toUpper
--> [A B C]
As for other language features, I finally got around to adding support for integer literals. They look pretty much how you expect:
set n 123
echo $n
--> 123
One side effect of this is that an identifier can no longer start with a dash followed by a digit, as that would be parsed as the start of a negative integer. This probably isn’t a huge deal, but it could affect command switches, which are essentially just identifiers that start with a dash.
Most of the other work done was behind the scenes, trying to make UCL easier to embed. I added the notion of "listable" and "hashable" proxy objects, which allow the UCL user to treat a Go slice or a Go struct as a list or hash respectively, without the embedder doing anything other than returning them from a function (I've yet to add this support for maps).
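Conceptually, the "listable" side might look something like this sketch; the interface and type names here are made up for illustration:
// A hypothetical interface a Go type could satisfy to appear as a list
// within a UCL script.
type listable interface {
	Len() int
	Index(i int) any
}

type Plan struct {
	ID   string
	Name string
}

// A slice of domain objects a builtin can return directly; scripts can
// then index and iterate over it like a native list.
type planList []Plan

func (p planList) Len() int        { return len(p) }
func (p planList) Index(i int) any { return p[i] }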
A lot of the native API is still a huge mess, and I really need to tidy it up before I'd be comfortable opening the source. Given that the language is now featureful enough to be useful, I'll probably start working on this next. Plus adding builtins. I really need to start adding useful builtins.
Anyway, more to come on this topic I’m sure.
Oh, one last thing: I’ve put together an online playground where you can try the language out in the browser. It’s basically a WASM build of the language running in a JavaScript terminal emulator. It was a little bit of a rush job and there’s no reason for building this other than it being a fun little thing to do.
You can try it out here, if you’re curious.
Simplifying UCL
I’ve been using UCL for several days now in that work tool I mentioned, and I’m wondering if the technical challenge that comes of making a featureful language is crowding out what I set out to do: making a useful command language that is easy to embed.
So I’m thinking of making some simplifications.
The first is to expand the possible use of pipes. To date, the only thing that can travel through pipes are streams. But many of the commands I've been adding simply return slices. This is probably because there's currently no "stream" type available to the embedder, but even if there were, I'm wondering if it makes sense to allow the embedder to pass slices, and other types, through pipes as well.
So, I think I'm going to take a page out of Go's template book and simply have pipes act as syntactic sugar over sequential calls. The goal is to make the construct a | b essentially be the same as b (a), where the first argument of b will be the result of a.
As for streams, I’m thinking of removing them as a dedicated object type. Embedders could certainly make analogous types if they need to, and the language should support that, but the language will no longer offer first class support for them out of the box.
The second is to remove any sense of "purity" from the builtins. You may recall the indecision I had regarding using anonymous procs with the map command:
I'm not sure how I can improve this. I don't really want to add automatic dereferencing of identifiers: they're very useful as unquoted string arguments. I suppose I could add another construct that would support dereferencing, maybe by enclosing the identifier in parentheses.
I think this is the wrong way to think of this. Again, I'm not here to design a pure implementation of the language. The language is meant to be easy to use, first and foremost, in an interactive shell, and if that means sacrificing purity for a map command that supports blocks, anonymous procs, and automatic dereferencing of commands just to make it easier for the user, then I think that's a trade worth taking.
Anyway, that’s the current thinking as of now.
UCL: Procs and Higher-Order Functions
More on UCL yesterday evening. Biggest change is the introduction of user functions, called “procs” (same name used in TCL):
proc greet {
echo "Hello, world"
}
greet
--> Hello, world
Naturally, like most languages, these can accept arguments, which use the same block variable binding as the foreach loop:
proc greet { |what|
echo "Hello, " $what
}
greet "moon"
--> Hello, moon
The name is also optional, and if omitted, will actually make the function anonymous. This allows functions to be set as variable values, and also be returned as results from other functions.
proc makeGreeter { |greeting|
proc { |what|
echo $greeting ", " $what
}
}
set helloGreeter (makeGreeter "Hello")
call $helloGreeter "world"
--> Hello, world
set goodbye (makeGreeter "Goodbye cruel")
call $goodbye "world"
--> Goodbye cruel, world
I’ve added procs as a separate object type. At first glance, this may seem a little unnecessary. After all, aren’t blocks already a specific object type?
Well, yes, that's true, but there are some differences between a proc and a regular block. The big one is that the proc will have a defined scope. Blocks adapt to the scope in which they're invoked, whereas a proc will close over and include the scope in which it was defined, a lot like closures in other languages.
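In implementation terms, I'd expect the difference to look roughly like this sketch, where all the names are illustrative rather than the actual internals:
// A block is just code: it runs in whatever environment it's invoked in.
type block struct {
	body []astNode
}

// A proc pairs the code with the environment captured at the point of
// definition, which is what makes $greeting visible when the inner proc
// returned by makeGreeter is finally called.
type proc struct {
	body    []astNode
	closure *env
}

type env struct {
	vars   map[string]any
	parent *env
}

type astNode interface{}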
It's not a perfect implementation at this stage, since the set command only sets variables within the immediate scope. This means that modifying closed-over variables is currently not supported:
# This currently won't work
proc makeSetter {
set bla "Hello, "
proc appendToBla { |x|
set bla (cat $bla $x)
echo $bla
}
}
set er (makeSetter)
call $er "world"
# should be "Hello, world"
Higher-Order Functions
The next bit of work is finding out how best to invoke these procs in higher-order functions. There are some challenges here that deal with the language grammar.
Invoking a proc by name is fine, but since the grammar required the first token to be a command name, there was no way to invoke a proc stored in a variable. I quickly added a new call command — which takes the proc as the first argument — to work around it, but after a while, this got a little unwieldy to use (you can see it in the code sample above).
So I decided to modify the grammar to allow any arbitrary value to be the first token. If it's a variable that is bound to something "invokable" (i.e. a proc), and there exists at least one other argument, it will be invoked. So the above can be written as follows:
set helloGreeter (makeGreeter "Hello")
$helloGreeter "world"
--> Hello, world
At least one argument is required; otherwise, the value will simply be returned. This is so that the values of variables and literals can be returned as is, but that does mean lambdas will simply be dereferenced:
"just, this"
--> just, this
set foo "bar"
$foo
--> bar
set bam (proc { echo "BAM!" })
$bam
--> (proc)
To get around this, I've added the notion of the "empty sub", which is just the construct (). It evaluates to nil, and since a function ignores any extra arguments not bound to variables, it allows for calling a lambda that takes no arguments:
set bam (proc { echo "BAM!" })
$bam ()
--> BAM!
It also allows for other niceties, such as using it as a falsey value:
if () { echo "True" } else { echo "False" }
--> False
With lambdas now in place, I'm hoping to work on some higher-order functions. I've started working on map, which accepts either a list or a stream. It's a buggy mess at the moment, but some basic constructs currently work:
map ["a" "b" "c"] (proc { |x| toUpper $x })
--> stream ["A" "B" "C"]
(Oh, by the way, when setting a variable to a stream using set, it will now collect the items as a list. Or at least that's the idea. It's not working at the moment.)
A more refined approach would be to treat commands as lambdas. The grammar supports this, but the evaluator doesn’t. For example, you cannot write the following:
# won't work
map ["a" "b" "c"] toUpper
This is because toUpper will be treated as a string, and not a reference to an invokable command. It will work for variables, though. You can do this:
set makeUpper (proc { |x| toUpper $x })
map ["a" "b" "c"] $makeUpper
I'm not sure how I can improve this. I don't really want to add automatic dereferencing of identifiers: they're very useful as unquoted string arguments. I suppose I could add another construct that would support dereferencing, maybe by enclosing the identifier in parentheses:
# might work?
map ["a" "b" "c"] (toUpper)
Anyway, more on this in the future I’m sure.
UCL: First Embed, and Optional Arguments
Came up with a name: Universal Control Language: UCL. See, you have TCL; but what if, instead of being used for tools, it could be more universal? Sounds so much more… universal, am I right? 😀
Yeah, okay. It’s not a great name. But it’ll do for now.
Anyway, I’ve started integrating this language with the admin tool I’m using at work. This tool I use is the impetus for this whole endeavour. Up until now, this tool was just a standard CLI command usable from the shell. But it’s not uncommon for me to have to invoke the tool multiple times in quick succession, and each time I invoke it, it needs to connect to backend systems, which can take a few seconds. Hence the reason why I’m converting it into a REPL.
Anyway, I added UCL to the tool, along with a readline library, and wow, did it feel good to use. So much better than the simple quote-aware string splitter I would've used otherwise. And just after I added it, I got a flurry of requests from my boss to gather some information, and although the language couldn't quite handle the task due to missing or unfinished features, I can definitely see the potential there.
I'm trying my best to only use what will eventually be the public API to add the tool-specific bindings. The biggest issue is that these "user bindings" (i.e. the non-builtins) desperately need support for producing and consuming streams. They're currently producing Go slices, which are being passed around as opaque "proxy objects", but these can't be piped into other commands to, say, filter or map. Some other major limitations:
- No commands to actually filter or map. In fact, the whole standard library needs to be built out.
- No ability to get fields from hashes or lists, including proxy objects which can act as lists or hashes.
One last thing that would be nice is the ability to define optional arguments. I actually started work on that last night, seeing that it’s relatively easy to build. I’m opting for a style that looks like the switches you’d find on the command line, with option names starting with dashes:
join "a" "b" -separator "," -reverse
--> b, a
Each option can have zero or more arguments, and boolean options can be represented as just having the switch. This does mean that they’d have to come after the positional arguments, but I think I can live with that. Oh, and no syntactic sugar for single-character options: each option must be separated by whitespace (the grammar actually treats them as identifiers). In fact, I’d like to discourage the use of single-character option names for these: I prefer the clarity that comes from having the name written out in full (that said, I wouldn’t rule out support for aliases). This eliminates the need for double dashes, to distinguish long option names from a cluster of single-character options, so only the single dash will be used.
I’ll talk more about how the Go bindings look later, after I’ve used them a little more and they’re a little more refined.
Tool Command Language: Lists, Hashes, and Loops
A bit more on TCL (yes, yes, I've gotta change the name) last night. Added both lists and hashes to the language. These can be created using a literal syntax, which looks pretty much how I described it a few days ago:
set list ["a" "b" "c"]
set hash ["a":"1" "b":"2" "c":"3"]
I had a bit of trouble working out the grammar for this. I first went with something that looked a little like the following, where the key of an element is optional but the value is mandatory:
list_or_hash --> "[" "]"        # empty list
  | "[" ":" "]"                 # empty hash
  | "[" elems "]"               # elements
elems --> ((arg ":")? arg)*     # elements of a list or hash
arg --> <anything that can be a command argument>
But I think this confused the parser a little, where it was greedily consuming the key arg and expecting the : to be present to consume the value.
So I flipped it around, and now the “value” is the optional part:
elems --> (arg (":" arg)?)*
So far this seems to work. I renamed the two fields “left” and “right”, instead of key and value. Now a list element will use the “left” part, and a hash element will use “left” for the key and “right” for the value.
You can probably guess that the list and hash are sharing the same AST types. This technically means that hybrid lists are supported, at least in the grammar. But I'm making sure that the evaluator throws an error when a hybrid is detected. I prefer to be strict here, as I don't want to think about how best to support it. Better to just say either a "pure" list, or a "pure" hash.
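That strictness check could look something like this sketch, using the shared left/right fields described above (everything else is invented):
// astElem mirrors the shared element type: lists use only the left part,
// hashes use both. Field names here are a guess.
type astElem struct {
	Left  any
	Right any // nil for list elements
}

// evalListOrHash rejects "hybrid" literals: either every element has a
// right part (a hash), or none of them do (a list).
func evalListOrHash(elems []astElem) (any, error) {
	if len(elems) == 0 {
		return []any{}, nil // empty: treat as an empty list
	}
	isHash := elems[0].Right != nil
	for _, e := range elems {
		if (e.Right != nil) != isHash {
			return nil, errors.New("cannot mix list and hash elements")
		}
	}
	if !isHash {
		list := make([]any, 0, len(elems))
		for _, e := range elems {
			list = append(list, e.Left)
		}
		return list, nil
	}
	hash := make(map[any]any, len(elems))
	for _, e := range elems {
		hash[e.Left] = e.Right
	}
	return hash, nil
}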
Well, now that we have collections, we need some way to iterate over them. For that, I've added a foreach loop, which looks a bit like the following:
# Over lists
foreach ["a" "b" "c"] { |elem|
echo $elem
}
# Over hashes
foreach ["a":"1" "b":"2"] { |key val|
echo $key " = " $val
}
What I like about this is that, much like the if statement, it's implemented as a macro. It takes a value to iterate over, and a block with bindable variables: one for list elements, or two for hash keys and values. This does mean that, unlike most other languages, the loop variable appears within the block, rather than to the left of the element, but after getting used to this form of block in my Ruby days, I can get used to it.
One fun thing about hashes is that they're implemented using Go's map type. This means that the iteration order is random, by design. This does make testing a little difficult (I've only got one test at the moment, which features a hash of length one), but I rarely depend on the order of hash keys, so I'm happy to keep it as is.
This loop is only the barest of bones at the moment. It doesn't support flow control like break or continue, and it also needs to support streams (I'm considering a version with just the block that will accept the stream from a pipe). But I think it's a reasonably good start.
I also spent some time today integrating this language into the tool I was building it for. I won't talk about it here, but already it's showing quite a bit of promise. I think, once the features are fully baked, that this would be a nice command language to keep in my tool-chest. But more on that in a later post.
Tool Command Language: Macros And Blocks
More work on the tool command language (for which I need to come up with a name: I can't use the abbreviation TCL), this time spent getting multi-line statement blocks working. As in:
echo "Here"
echo "There"
I got a little wrapped up about how I could configure the parser to recognise new-lines as statement separators. I tried this in the past with a hand-rolled lexer and ended up peppering NL tokens all around the grammar. I was fearing that I'd need to do something like this here.
After a bit of experimentation, I think I've come up with a way to recognise new-lines as statement separators without making the grammar too messy. The unit tests verifying this seem to work so far.
// Excerpt of the grammar showing all the 'NL' token matches.
// These match a new-line, plus any whitespace afterwards.
type astStatements struct {
First *astPipeline `parser:"@@"`
Rest []*astPipeline `parser:"( NL+ @@ )*"`
}
type astBlock struct {
Statements []*astStatements `parser:"LC NL? @@ NL? RC"`
}
type astScript struct {
Statements *astStatements `parser:"NL* @@ NL*"`
}
I’m still using a stateful lexer as it may come in handy when it comes to string interpolation. Not sure if I’ll add this, but I’d like the option.
Another big addition today was macros. These are much like commands, but instead of arguments being evaluated before being passed through to the command, they're deferred, and the command can explicitly request their evaluation whenever it needs them. I think Lisp has something similar: this is not that novel.
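In Go terms, the split might look something like this sketch (these interfaces are my guesses, not the actual implementation):
// A command receives its arguments already evaluated.
type command interface {
	Invoke(ctx context.Context, args []any) (any, error)
}

// A macro receives the raw AST nodes, plus an evaluator it can use to
// evaluate any of them whenever it chooses. An if macro would evaluate
// its guard first, then only the block that matches.
type macro interface {
	InvokeMacro(ctx context.Context, ev evaluator, args []astNode) (any, error)
}

type evaluator interface {
	Eval(ctx context.Context, node astNode) (any, error)
}

type astNode interface{}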
This was used to implement the if command, which is now working:
set x "true"
if $x {
echo "Is true"
} else {
echo "Is not true"
}
Of course, there are actually no operators yet, so it doesn’t really do much at the moment.
This spurred the need for blocks, which are the third large addition made today. They're just a group of statements wrapped in an object type. They're "invokable", in that the statements can be executed to produce a result, but they're also a value that can be passed around. It jells nicely with the macro approach.
Must say that I like the idea of using macros for things like if over baking them into the language. It can only add to the "embed-ability" of this, which is what I'm looking for.
Finally, I did see something interesting in the tests. I was trying the following test:
echo "Hello"
echo "World"
And I was expecting Hello and World to be returned over two lines. But only World was being returned. Of course! Since echo is actually producing a stream and not printing anything to stdout, it would only return World.
I decided to change this. If I want to use echo to display a message, then the above script should display both Hello and World in some manner. The downside is that I don't think I'll be able to support constructs like this, where echo provides a source for a pipeline:
# This can't work anymore
echo "Hello" | toUpper
I mean, I could probably detect whether echo is connected to a pipe (the parser can give that information). But what about other commands that output something? Would they need to be treated similarly?
I think it’s probably best to leave this out for now, and have a new construct for providing literals like this to a pipe. Heck, maybe just having the string itself would be enough:
"hello" | toUpper
Anyway, that’s all for today.
Tool Command Language
I have this idea for a tool command language. Something similar to TCL, in that it's chiefly designed to be used as an embedded scripting language, mainly in an interactive context.
It's been an idea I've had in mind for a while, but now I've got the perfect use case for it. I've got a tool at work I use to do occasional admin tasks. At the moment it's implemented as a CLI tool, and it works. But the biggest downside is that it needs to form connections to the cluster to call internal service methods, and it always takes a few seconds to do so. I'd like to be able to use it to automate certain actions, but this delay would make doing so a real hassle.
Some other properties that I'm thinking of:
- It should be able to support structured data, similar to how Lisp works.
- It should be able to support something similar to pipes, similar to how the shell and Go's template language work.
Some of the trade-offs that come of it:
- It doesn't have to be fast. In fact, it can be slow, so long as the work of embedding and operating it can be fast.
- It may not be completely featureful. I'll go over the features I'm thinking of below, but I'll say upfront that you're not going to be building any cloud services with this. Administering cloud servers, maybe; but leave the real programs to a real language.
Some Notes On The Design
The basic concept is the statement. A statement consists of a command, and zero or more arguments. If you’ve used a shell before, then you can imagine how this’ll look:
firstarg "hello, world"
--> hello, world
Each statement produces a result. Here, the theoretical firstarg will return the first argument it receives, which will be the string "hello, world".
Statements are separated by new-lines or semicolons. In such a sequence, the return value of the last statement is returned:
firstarg "hello" ; firstarg "world"
--> world
I'm hoping to have a similar approach to how Go works, in that semicolons will be needed if multiple statements share a line, but will otherwise be unnecessary. I'm using the Participle parser library for this, and I'll need to work out how I can configure the scanner to do this (or even if using the scanner is the right way to go).
The return value of statements can be used as the arguments of other statements by wrapping them in parenthesis:
echo (firstarg "hello") " world"
--> hello world
This is taken directly from TCL, except that TCL uses the square brackets. I’m reserving the square brackets for data structures, but the parenthesis are free. It also gives it a bit of a Lisp feel.
Pipelines
Another way for commands to consume the output of other commands is to build pipelines. This is done using the pipe | character:
echo "hello" | toUpper
--> HELLO
Pipeline sources, that is, the command on the left-most side, can either be a command that produces a single result, or a command that produces a "stream". Both are objects, and there's nothing inherently special about a stream, other than some special handling when it's used in a pipeline. Streams are also designed to be consumed once.
For example, one can consider a command which can read a file and produce a stream of the contents:
cat "taleOfTwoCities.txt"
--> It was the best of times,
--> it was the worst of times,
--> …
Not every command is "pipe savvy". For example, piping the result of a pipeline to echo will discard it:
echo "hello" | toUpper | echo "no me"
--> no me
Of course, this may differ based on how the builtins are implemented.
Variables
Variables are treated much like TCL and shell, in that referencing them is done using the dollar sign:
set name "Josh"
--> "Josh"
echo "My name is " $name
--> "My name is Josh"
Not sure how streams will be handled with variables but I’m wondering if they should be condensed down to a list. I don’t like the idea of assigning a stream to a variable, as streams are only consumed once, and I feel like some confusion will come of it if I were to allow this.
Maybe I can take the Perl approach and use a different variable "context", where a variable with a @ prefix will reference a stream:
set file (cat "eg.text")
echo @file
# Echo will consume file as a stream
echo $file
# Echo will consume file as a list
The difference is subtle but may be useful. I’ll look out for instances where this would be used.
Attempting to reference an unset variable will result in an error. This may also change.
Other Ideas
That’s pretty much what I have at the moment. I do have some other ideas, which I’ll document below.
Structured Data Support: Think lists and hashes. This language is to be used with structured data, so I think it's important that the language supports this natively. This is unlike TCL, which principally works with strings, and where the notion of lists feels a bit tacked on.
Both lists and hashes are created using square brackets:
# Lists. Not sure if they'll have commas or not
set l [1 2 3 $four (echo "5")]
# Maps
set m [a:1 "b":2 "see":(echo "3") (echo "dee"):$four]
Blocks: Yep, containers for groups of statements. These will be used for control flow, as well as for the definition of functions:
set x 4
if (eq $x 4) {
echo "X == 4"
} else {
echo "X != 4"
}
foreach [1 2 3] { |x|
echo $x
}
Here the blocks are just another object type, like strings and streams, and both if and foreach are regular commands which will accept a block as an argument. In fact, it would be theoretically possible to write an if statement this way (not sure if I'll allow setting variables to blocks):
set thenPart {
echo "X == 4"
}
if (eq $x 4) $thenPart
The block execution will exist in a context, which will control whether a new stack frame will be used. Here the if statement will simply use the existing frame, but a block used in a new function can push a new frame, with a new set of variables:
proc myMethod { |x|
echo $x
}
myMethod "Hello"
--> "Hello
Also note the use of |x| at the start of the block. This is used to declare bindable variables, such as function arguments or for-loop variables. This will be defined as part of the grammar, and be a property of the block.
Anyway, that’s the current idea.