UCL
UCL: Some Updates
Made a few minor changes to UCL. Well, actually, I made one large change: I've renamed the foreach builtin to for.
I was originally planning to have a for loop that worked much like other languages: you have a variable, a start value, and an end value, and you'd just iterate over the loop until you reach the end. I don't know how this would've looked, but I imagined something like this:
for x 0 10 {
echo $x
}
# numbers 0..9 would be printed.
But this became redundant after adding the seq builtin:
foreach (seq 10) { |x|
echo $x
}
This was in addition to all the other useful things you could do with the foreach loop (and the seq builtin), such as looping over lists and hashes, and consuming values from iterators. It's already a pretty versatile loop. So I elected to go the Python way and just made it so that the for loop is the loop to use to iterate over collections.
This left an opening for a loop that dealt with guards, so I also added the while loop. Again, much like most languages, this loop would iterate over a block until the guard becomes false:
set x 0
while (lt $x 5) {
echo $x
set x (add $x 1)
}
echo "done"
Unlike the for loop, this is unusable in a pipeline (well, unless it's the first component). I was considering having the loop return the result of the guard when it terminates, but I realised that would be either false, nil, or anything else that was "falsy." So I just have the loop return nil. That said, you can break from this loop, and if the break call had a value, that would be used as the result of the loop:
set x 0
while (lt $x 5) {
set x (add $x 1)
if (ge $x 3) {
break "Ahh"
}
} | echo " was the break"
The guard is optional, and if left out, the while loop will iterate forever.
The Set! Builtin
Many of these changes come from using UCL for my job, and one thing I found myself doing recently is writing a bunch of migration scripts. These scripts needed to get data from a database, and that data may or may not be present. If it's not, I want the script to fail immediately so I can check my assumptions. This usually results in constructs like the following:
set planID (ls-plans | first { |p| eq $p "Plan Name" } | index ID)
if (not $planID) {
error "cannot find plan"
}
And yeah, adding the if block is fine — I do it all the time when writing Go — but it would be nice to assert this when you’re trying to set the variable, for no reason other than the fact that you’re thinking about nullability while writing the expression to fetch the data.
So one other change I made was adding the set! builtin. This will basically set the variable only if the expression is not nil. Otherwise, it will raise an error.
set! planID (ls-plans | first { |p| eq $p "Missing Plan" } | index ID)
# refusing to set! `planID` to nil value
This does mean that ! and ? are now valid characters to appear in identifiers, just like Ruby. I haven't decided whether I want to start following the Ruby convention of question marks indicating a predicate or bangs indicating a mutation. Not sure that's going to work out now, given that the bang is being used here to assert non-nullability. In either case, could be useful in the future.
UCL: Iterators
Still working on UCL in my spare time, mainly filling out the standard library a little, like adding utility functions for lists and CSV files. The largest change made recently was adding iterators to the mix of core types. These work a lot like the streams of old, where you have a potentially unbounded source of values that can only be consumed one at a time. The difference with streams is that there's no magic to this: iterators work like any other type, so they can be stored in variables, passed around, etc. (streams could only be consumed via pipes).
I augmented the existing high-level functions like map and filter to consume and produce iterators, but it was fun discovering other functions which also became useful. For example, there exists a head function which returns the first value of a list. But I discovered that the same semantics also work as a way to consume the next element from an iterator. So that's what this function now does. This, mixed with the fact that iterators are truthy if they've got at least one pending value, means that some of the builtins can now be implemented in UCL itself. Like the example below, which could potentially be used to reimplement itrs:to-list (this is a contrived example, as foreach would probably work better here).
proc to-list { |itr lst|
if $itr {
lists:add $lst (head $itr)
return (to-list $itr $lst)
}
return $lst
}
to-list (itrs:from [1 2 3 4 5]) []
But the biggest advantage that comes from iterators is querying large data-stores with millions of rows. Being able to write a UCL script which sets up a pipeline of maps and filters, and just let it churn through all the data in its own time, is the dream.
list-customers | filter { |x| $x.HasPlan } | map { |x| $x.PlanID } | foreach echo
I’ve got a need for this in the internal backend tool that spurred the development of UCL, and I’m looking forward to using iterators to help here.
Started filling out the UCL website, mainly by documenting the core modules. It might be a little unnecessary to have a full website for this, given that the only person who'll get any use from it right now will be myself. But who knows how useful it could be in the future? If nothing else, it's a showcase of what I've been working on for this project.
I've been using UCL a lot recently, which is driving additional development on it. Spent a fair bit of time this evening fixing bugs and adding small features like string interpolation. Fixed a number of grammar bugs too, which only popped up when I started writing multi-line scripts with it.
I plan to integrate UCL into another tool at work, so I spent last night improving its use as a REPL. Added support for onboard help and setting up custom type printing, which is useful for displaying tables of data. I started working on the tool today and it's already feeling great.

Try-Catch In UCL - Some Notes
Started working on a try command in UCL, which can be used to trap errors that occur within a block. This is very much inspired by try blocks in Java and Python, where the main block will run, and if any error occurs, it will fall through to the catch block:
try {
echo "Something bad can happen here"
} catch {
echo "It's all right. I'll run next"
}
This is all I’ve got working at the moment, but I want to quickly write some notes on how I’d like this to work, lest I forget it later.
First, much like everything in UCL, these blocks should return a value. So it should be possible to do something like this:
set myResults (try {
result-of-something-that-can-fail
} catch {
"My default"
})
--> (result of the thing)
This is kind of like using or in Lua to fall back to a default, except that if the result fails with an error, the default value can be returned from the catch block. It might even be possible to simplify this further, and have catch just return a value in cases where an actual block of code is unnecessary:
set myResults (try { result-of-something-that-can-fail } catch "My default")
One other thing to consider is how to represent the error. Errors are just treated out-of-band at the moment, and are represented as regular Go error types. It might be necessary to add a new error type to UCL, so that it can be passed through to the catch block for logging or switching:
try {
do-something
} catch { |e|
echo (cat "The error is " $e)
}
This could also be used as the return value if there is no catch block:
set myResult (try { error "my error" })
--> error: my error
Another idea I have is successive catch blocks, which would cascade one after the other if the one before it fails:
try {
do-something
} catch {
this-may-fail-also
} catch {
echo "Always passes"
}
Unlike JavaScript or Python, I don't think the idea of having catch blocks switch based on the error type would be suitable here. UCL is dynamic in nature, and that sort of static type checking feels a little wrong here. The catch blocks will only act as isolated blocks of execution, where an error would be caught and handled.
Finally, there’s finally
, which would run regardless of which try or
catch block was executed. I think, unlike the other two blocks, that the
return value of a finally
block will always be swallowed. I think this
will work as the finally block should mainly be used for clean-up, and
it’s the result of the try
or catch
blocks that are more important.
set res (try {
"try"
} catch {
"catch"
} finally {
"finally"
})
--> "try"
Anyway, this is the idea I have right now.
Update — I just realised that having a failed try block return the error as a value, rather than letting it climb up the stack, defeats the purpose of exceptions. So having something like the following:
try { error "this will fail" }
Should just unroll the stack and not return an error value. Although if there is a need to have an error value returned, then the following should work:
try { error "this will fail" } catch { |err| $err }
--> error: this will fail
Indexing In UCL
I've been thinking a little about how to support indexing in UCL, as in getting elements from a list or keyed values from a map. There already exists an index builtin that does this, but I'm wondering if this can be, or even should be, supported in the language itself.
I've reserved . for this, and it'll be relatively easy to make use of it to get map fields. But I do have some concerns with supporting list element dereferencing using square brackets. The big one is that if I were to use square brackets the same way that many other languages do, I suspect (although I haven't confirmed) that it could lead to the parser treating them as two separate list literals. This is because the scanner ignores whitespace, and there are no other syntactic indicators to separate arguments to proc calls, like commas:
echo $x[4] --> echo $x [4]
echo [1 2 3][2] --> echo [1 2 3] [2]
So I'm not sure what to do here. I'd like to add support for . for map fields, but it feels strange doing just that and having nothing for list elements.
I can think of three ways to address this.
Do Nothing — the first option is easy: don't add any new syntax to the language and just rely on the index builtin. TCL does this with lindex, as does Lisp with nth, so I'll be in good company here.
Use Only The Dot — the second option is to add support for the dot and not the square brackets. This is what the Go templating language does for map keys or struct fields. It also has an index builtin, which works with slice elements.
I'd probably do something similar, but I may extend it to support index elements. Getting the value of a field would be what you'd expect, but to get the element of a list, the construct .(x) can be used:
echo $x.hello  # returns the "hello" field
echo $x.(4)    # returns the fourth element of a list
One benefit of this could be that the .(x) construct would itself be a pipeline, meaning that string and calculated values could be used as well:
echo $x.("hello")
echo $x.($key)
echo $x.([1 2 3] | len)
echo $x.("hello" | toUpper)
I can probably get away with supporting this without changing the scanner or compromising the language design too much. It would be nice to add support for ditching the dot completely when using the parentheses, à la BASIC, but I'd probably run into the same issues as with the square brackets if I did, so I think that's out.
Use Parentheses To Be Explicit — the last option is to use square brackets, and modify the grammar slightly to only allow the use of suffix expansion within parentheses. That way, if you want to pass a list element as an argument, you have to use parentheses:
echo ($x[4])  # fourth element of $x
echo $x[4]    # $x, along with a list containing "4"
This is what you'd see in more functional languages like Elm and, I think, Haskell. I'll have to see whether this could work with changes to the scanner and parser if I were to go with this option. I think it may be achievable, although I'm not sure how.
An alternative might be to go the other way, and modify the grammar rules so that the square brackets would bind closer to the list, which would mean that separate arguments involving square brackets would need to be in parentheses:
echo $x[4]    # fourth element of $x
echo $x ([4]) # $x, along with a list containing "4"
Or I could modify the scanner to recognise whitespace characters and use that as a guide to determine whether square brackets follow a value. No space would mean the square brackets represent an element suffix, and at least one space would mean two separate values.
So that's where I am at the moment. I guess it all comes down to what works best for the language as a whole. I can live with option one, but it would be nice to have the syntax. I'd rather not go with option three, as I'd like to keep the parser simple (I'd rather not add to all the new-line complexities I have already).
Option two would probably be the least compromising to the design as a whole, even if the aesthetics are a bit strange. I can probably get used to them though, and I do like the idea of index elements being pipelines themselves. I may give option two a try, and see how it goes.
Anyway, more on this later.
UCL: Brief Integration Update and Modules
A brief update of where I am with UCL and integrating it into Dynamo-Browse. I did manage to get it integrated, and it's now serving as the interpreter for commands entered during a session.
It works… okay. I decided to avoid all the complexities I mentioned in the last post — all that about continuations, etc. — and simply kept the commands returning tea.Msg values. The original idea was to have the commands return usable values if they were invoked in a non-interactive manner. For example, the table command invoked in an interactive session will bring up the table picker for the user to select the table. But when invoked as part of a call to another command, maybe it would return the current table name as a string, or something.
But I decided to ignore all that and simply kept the commands as they are. Maybe I’ll add support for this in a few commands down the line? We’ll see. I guess it depends on whether it’s necessary.
Which brings me to why this is only working "okay" at the moment. Some commands return a tea.Msg which asks for some input from the user. The table command is one; another is set-attr, which prompts the user to enter an attribute value. These are implemented as a message which commands the UI to go into an "input mode", and will invoke a callback on the message when the input is entered.
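These message types are Dynamo-Browse specifics, but the general shape is something like this sketch, with the field names invented for illustration (tea being the Bubble Tea package):
// A message asking the UI to switch into "input mode". Once the user
// submits a value, the UI invokes the callback, which may itself return
// another message to be processed.
type promptForInputMsg struct {
	Prompt  string
	OnInput func(value string) tea.Msg
}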
This is not an issue for single commands, but it becomes one when you start entering multiple commands that prompt for input, such as two set-attr calls:
set-attr this -S ; set-attr that -S
What happens is that two messages to show the prompt are sent, but only one of them is shown to the user, while the other is simply swallowed.
Fixing this would require some re-engineering, either with how the controllers returning these messages work, or the command handlers themselves. I can probably live with this limitation for now — other than this, the UCL integration is working well — but I may need to revisit this down the line.
Modules
As for UCL itself, I’ve started working on the builtins. I’m planning to have a small set of core builtins for the most common stuff, and the rest implemented in the form of “modules”. The idea is that the core will most likely be available all the time, but the modules can be turned on and off by the language embedder based on what they need or are comfortable having.
Each module is namespaced with a prefix, such as os for operating system operations, or fs for file-system operations. I've chosen the colon as the namespace separator, mainly so I can reserve the dot for field dereferencing, but also because I think TCL uses the colon as a namespace separator as well (I think I saw it in some sample code). The first implementation of this was simply adding the colon to the list of characters that make up the IDENT token. This broke the parser, as the colon is also used as the map key/value separator, and the parser couldn't resolve maps anymore. I had to extend the "ident" parse rule to support multiple IDENT tokens separated by colons. The module builtins are simply added to the environment with their fully-qualified name, complete with prefix and colon, and invoking them with one of these idents will just "flatten" all the colon-separated tokens into a single string. Not sophisticated, but it'll work for now.
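As a sketch, the extended rule might look something like this in Participle, although the actual token names and AST fields almost certainly differ:
// A guess at the extended "ident" parse rule: one or more IDENT tokens
// separated by colons. Token and field names here are illustrative only.
type astIdent struct {
	Parts []string `parser:"@IDENT ( Colon @IDENT )*"`
}

// name flattens the colon-separated tokens back into a single string,
// such as "os:env", which is then used to look up the builtin.
func (a astIdent) name() string {
	return strings.Join(a.Parts, ":")
}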
There aren't many builtins for these modules at the moment: just a few for reading environment variables and getting files as lists of strings. Dynamo-Browse is already using this in a feature branch, and it's allowed me to finally add a long-standing feature I've been meaning to add for a while: automatically enabling read-only mode when accessing DynamoDB tables in production. With modules, this construct looks a little like the following:
if (eq (os:env "ENV") "prod") {
set-opt ro
}
It would've been possible to do this with the scripting language already used by Dynamo-Browse. But this is the motivation for integrating UCL: it makes these sorts of constructs much easier to do, much like writing a shell script over something in C.
UCL: Breaking And Continuation
I've started trying to integrate UCL into a second tool: Dynamo-Browse. And so far it's proving to be a little difficult. The problem is that this will be replacing a dumb string splitter, with command handlers that currently return a tea.Msg type that changes the UI in some way.
UCL builtin handlers return an interface{} result, or an error result, so there's no reason why this wouldn't work. But tea.Msg is also an interface{} type, so it will be difficult to tell a UI message apart from a result that's usable as data.
This is a Dynamo-Browse problem, but it's still a problem I'll need to solve. It might be that I'll need to return tea.Cmd types — which are functions returning tea.Msg — and have the UCL caller detect these and dispatch them when they're returned. That's a lot of function closures, but it might be the only way around this (well, the alternative is returning an interface type with a method that returns a tea.Msg, but that'll mean a lot more types than I currently have).
Anyway, more on this in the future I’m sure.
Break, Continue, Return
As for language features, I realised that I never had anything to exit early from a loop or proc. So I added break, continue, and return commands. They're pretty much what you'd expect, except that break can optionally return a value, which will be used as the resulting value of the foreach loop that contains it:
echo (foreach [5 4 3 2 1] { |n|
echo $n
if (eq $n 3) {
break "abort"
}
})
--> 5
--> 4
--> 3
--> abort
These are implemented as error types under the hood. For example, break will return an errBreak type, which will flow up the chain until it is handled by the foreach command (continue is also an errBreak with a flag indicating that it's a continue). Similarly, return will return an errReturn type that is handled by the proc object.
This fits quite naturally with how the scripts are run. All I'm doing is walking the tree, calling each AST node as a separate function call and expecting it to return a result or an error. If an error is returned, the function bails, effectively unrolling the stack until the error is handled or it's returned as part of the call to Eval(). So leveraging this stack-unrolling process already in place makes sense to me.
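As an illustration, here's a sketch of how the foreach handling could look. The errBreak shape comes from the description above, but everything else is a guess at the surrounding evaluator:
// errBreak carries an optional value, plus a flag indicating whether
// it's actually a continue.
type errBreak struct {
	isContinue bool
	value      any
}

func (e errBreak) Error() string { return "break used outside of a loop" }

// evalForEach invokes the body for each item, handling errBreak itself
// and letting any other error continue to unroll the stack.
func evalForEach(items []any, body func(item any) (any, error)) (any, error) {
	var last any
	for _, item := range items {
		res, err := body(item)
		if err != nil {
			var brk errBreak
			if errors.As(err, &brk) {
				if brk.isContinue {
					continue
				}
				return brk.value, nil // break's value becomes the loop result
			}
			return nil, err // a genuine error: keep unrolling
		}
		last = res
	}
	return last, nil
}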
I'm not sure if this is considered idiomatic Go. I get the impression that using error types to handle flow control outside of adverse conditions is frowned upon. This reminds me of all the arguments against using exceptions for flow control in Java. Those arguments are good ones: following execution between try and catch makes little sense when the flow can be explained more clearly with an if.
But I'm going to defend my use of errors here. Like most Go projects, the code is already littered with all the if err != nil { return err } needed to exit early when a non-nil error is returned. And since Go developers preach the idea of errors simply being values, why not use errors here to unroll the stack? It's better than the alternatives, such as detecting a sentinel result type or adding a third return value which will just be yet another if bla { return res } clause.
Continuations
Now, an idea is brewing for a feature I’m calling “continuations” that might be quite difficult to implement. I’d like to provide a way for a user builtin to take a snapshot of the call stack, and resume execution from that point at a later time.
The reason for this is that I'd like all the asynchronous operations to be transparent to the UCL user. Consider a UCL script with a sleep command:
echo "Wait here"
sleep 5
echo "Ok, ready"
sleep could simply be a call to time.Sleep(), but say you're running this as part of an event loop, and you'd prefer to do something like set up a timer instead of blocking the thread. You may want to hide this from the UCL script author, so they don't need to worry about callbacks.
Ideally, this can be implemented by the builtin using a construct similar to the following:
func sleep(ctx context.Context, args ucl.CallArgs) (any, error) {
	var secs int
	if err := args.Bind(&secs); err != nil {
		return nil, err
	}

	// Save the execution stack
	continuation := args.Continuation()

	// Schedule the sleep callback
	go func() {
		<-time.After(time.Duration(secs) * time.Second)

		// Resume execution later, yielding `secs` as the return value
		// of the `sleep` call. This will run the "Ok, ready" echo call.
		continuation(ctx, secs)
	}()

	// Halt execution now
	return nil, ucl.ErrHalt
}
The only trouble is, I've got no idea how I'm going to do this. As mentioned above, UCL executes the script by walking the parse tree with normal Go function calls. I don't want to be in a position to create a snapshot of the Go call stack. That's a little too low-level for what I want to achieve here.
I suppose I could store the visited nodes in a list when the ErrHalt is raised; or maybe replace the Go call stack with an in-memory stack, with AST node handlers being pushed and popped as the script runs. But I'm not sure this will work either. It would require a significant amount of re-engineering, which I'm sure would be technically interesting, but would take a fair bit of time. And how is this to work if a continuation is made in a builtin that's being called from another builtin? What should happen if I were to run sleep within a map, for example?
So it might be that I'll have to use something else here. I could potentially do something using goroutines: the script is executed on a goroutine, and args.Continuation() does something like pause it on a channel. How that would work with a builtin handler requesting the continuation without being paused itself, I'm not so sure. Maybe the handlers could be dispatched on a separate goroutine as well?
A simpler approach might be to just offload this to the UCL user, and have them run Eval on a separate goroutine and simply sleep the thread. Callbacks that need input from outside could simply be sent using channels passed via the context.Context. At least that'll lean into Go's first-party support for synchronisation, which is arguably a good thing.
UCL: The Simplifications Paid Off
The UCL simplifications have been implemented, and they seem to be largely successful.
Ripped out all the streaming types, and changed pipes to simply pass the result of the left command as the first argument of the right.
"Hello" | echo ", world"
--> "Hello, world"
This has dramatically improved the use of pipes. Previously, pipes could only be used to connect streams. But now, with pretty much anything flowing through a pipe, that list of commands has extended to pretty much every builtin and user-defined proc. Furthermore, a command no longer needs to know that it's being used in a pipeline: whatever flows through the pipe is passed transparently via the first argument to the function call. This has made pipes more useful, and usable in more situations.
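Under the hood, I'd imagine the evaluator doing something like this simplified sketch; the invokable type and the argument plumbing here are inventions, not the actual implementation:
// evalPipeline threads each stage's result in as the first argument of
// the next stage. argLists holds the explicit arguments written against
// each stage in the script.
type invokable func(ctx context.Context, args []any) (any, error)

func evalPipeline(ctx context.Context, stages []invokable, argLists [][]any) (any, error) {
	var carry any
	for i, stage := range stages {
		args := argLists[i]
		if i > 0 {
			// Whatever flowed through the pipe goes in as the first argument.
			args = append([]any{carry}, args...)
		}
		res, err := stage(ctx, args)
		if err != nil {
			return nil, err
		}
		carry = res
	}
	return carry, nil
}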
Macros can still know whether there exists a pipe argument, which can make for some interesting constructs. Consider this variant of the foreach macro, which can "hang off" the end of a pipe:
["1" "2" "3"] | foreach { |x| echo $x }
--> 1
--> 2
--> 3
Not sure if this variant is useful, but I think it could be. It seems like a natural way to iterate over items passed through the pipe. I'm wondering if this could extend to the if macro as well, but that variant might not be as natural to read.
Another simplification was changing the map builtin to accept anonymous blocks, as well as "invokable" commands by name. Naturally, this also works with pipes:
[a b c] | map { |x| toUpper $x }
--> [A B C]
[a b c] | map toUpper
--> [A B C]
As for other language features, I finally got around to adding support for integer literals. They look pretty much how you expect:
set n 123
echo $n
--> 123
One side effect of this is that an identifier can no longer start with a dash followed by a digit, as that would be parsed as the start of a negative integer. This probably isn’t a huge deal, but it could affect command switches, which are essentially just identifiers that start with a dash.
Most of the other work done was behind the scenes, trying to make UCL easier to embed. I added the notion of "listable" and "hashable" proxy objects, which allow the UCL user to treat a Go slice or a Go struct as a list or hash respectively, without the embedder doing anything other than returning them from a function (I've yet to add this support for maps).
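Conceptually, the "listable" side might look something like this sketch; the interface and type names here are made up for illustration:
// A hypothetical interface a Go type could satisfy to appear as a list
// within a UCL script.
type listable interface {
	Len() int
	Index(i int) any
}

type Plan struct {
	ID   string
	Name string
}

// A slice of domain objects a builtin can return directly; scripts can
// then index and iterate over it like a native list.
type planList []Plan

func (p planList) Len() int        { return len(p) }
func (p planList) Index(i int) any { return p[i] }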
A lot of the native API is still a huge mess, and I really need to tidy it up before I'd be comfortable opening the source. Given that the language is now featureful enough to be useful, I'll probably start working on this next. Plus adding builtins. I really need to start adding useful builtins.
Anyway, more to come on this topic I’m sure.
Oh, one last thing: I’ve put together an online playground where you can try the language out in the browser. It’s basically a WASM build of the language running in a JavaScript terminal emulator. It was a little bit of a rush job and there’s no reason for building this other than it being a fun little thing to do.
You can try it out here, if you’re curious.
Simplifying UCL
I’ve been using UCL for several days now in that work tool I mentioned, and I’m wondering if the technical challenge that comes of making a featureful language is crowding out what I set out to do: making a useful command language that is easy to embed.
So I’m thinking of making some simplifications.
The first is to expand the possible use of pipes. To date, the only thing that can travel through pipes are streams. But many of the commands I've been adding simply return slices. This is probably because there's currently no "stream" type available to the embedder, but even if there were, I'm wondering if it makes sense to allow the embedder to pass slices, and other types, through pipes as well.
So, I think I'm going to take a page out of Go's template book and simply have pipes act as syntactic sugar over sequential calls. The goal is to make the construct a | b essentially be the same as b (a), where the first argument of b will be the result of a.
As for streams, I’m thinking of removing them as a dedicated object type. Embedders could certainly make analogous types if they need to, and the language should support that, but the language will no longer offer first class support for them out of the box.
The second is to remove any sense of "purity" from the builtins. You may recall the indecision I had regarding using anonymous procs with the map command:
I'm not sure how I can improve this. I don't really want to add automatic dereferencing of identifiers: they're very useful as unquoted string arguments. I suppose I could add another construct that would support dereferencing, maybe by enclosing the identifier in parentheses.
I think this is the wrong way to think of this. Again, I'm not here to design a pure implementation of the language. The language is meant to be easy to use, first and foremost, in an interactive shell, and if that means sacrificing purity for a map command that supports blocks, anonymous procs, and automatic dereferencing of commands just to make it easier for the user, then I think that's a trade worth taking.
Anyway, that’s the current thinking as of now.
UCL: Procs and Higher-Order Functions
More on UCL yesterday evening. Biggest change is the introduction of user functions, called “procs” (same name used in TCL):
proc greet {
echo "Hello, world"
}
greet
--> Hello, world
Naturally, like most languages, these can accept arguments, which use the same block variable binding as the foreach loop:
proc greet { |what|
echo "Hello, " $what
}
greet "moon"
--> Hello, moon
The name is also optional, and if omitted, will actually make the function anonymous. This allows functions to be set as variable values, and also be returned as results from other functions.
proc makeGreeter { |greeting|
proc { |what|
echo $greeting ", " $what
}
}
set helloGreeter (makeGreeter "Hello")
call $helloGreeter "world"
--> Hello, world
set goodbye (makeGreeter "Goodbye cruel")
call $goodbye "world"
--> Goodbye cruel, world
I’ve added procs as a separate object type. At first glance, this may seem a little unnecessary. After all, aren’t blocks already a specific object type?
Well, yes, that's true, but there are some differences between a proc and a regular block. The big one is that the proc will have a defined scope. Blocks adapt to the scope in which they're invoked, whereas a proc will close over and include the scope in which it was defined, a lot like closures in other languages.
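In implementation terms, I'd expect the difference to look roughly like this sketch, where all the names are illustrative rather than the actual internals:
// A block is just code: it runs in whatever environment it's invoked in.
type block struct {
	body []astNode
}

// A proc pairs the code with the environment captured at the point of
// definition, which is what makes $greeting visible when the inner proc
// returned by makeGreeter is finally called.
type proc struct {
	body    []astNode
	closure *env
}

type env struct {
	vars   map[string]any
	parent *env
}

type astNode interface{}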
It's not a perfect implementation at this stage, since the set command only sets variables within the immediate scope. This means that modifying closed-over variables is currently not supported:
# This currently won't work
proc makeSetter {
set bla "Hello, "
proc appendToBla { |x|
set bla (cat $bla $x)
echo $bla
}
}
set er (makeSetter)
call $er "world"
# should be "Hello, world"
Higher-Order Functions
The next bit of work is finding out how best to invoke these procs in higher-order functions. There are some challenges here that deal with the language grammar.
Invoking a proc by name is fine, but since the grammar required the first token to be a command name, there was no way to invoke a proc stored in a variable. I quickly added a new call command — which takes the proc as the first argument — to work around it, but after a while, this got a little unwieldy to use (you can see it in the code sample above).
So I decided to modify the grammar to allow any arbitrary value to be the first token. If it's a variable that is bound to something "invokable" (i.e. a proc), and there exists at least one other argument, it will be invoked. So the above can be written as follows:
set helloGreeter (makeGreeter "Hello")
$helloGreeter "world"
--> Hello, world
At least one argument is required; otherwise, the value will simply be returned. This is so that the values of variables and literals can be returned as is, but that does mean lambdas will simply be dereferenced:
"just, this"
--> just, this
set foo "bar"
$foo
--> bar
set bam (proc { echo "BAM!" })
$bam
--> (proc)
To get around this, I've added the notion of the "empty sub", which is just the construct (). It evaluates to nil, and since a function ignores any extra arguments not bound to variables, it allows for calling a lambda that takes no arguments:
set bam (proc { echo "BAM!" })
$bam ()
--> BAM!
It also allows for other niceties, such as using it as a falsey value:
if () { echo "True" } else { echo "False" }
--> False
With lambdas now in place, I'm hoping to work on some higher-order functions. I've started working on map, which accepts either a list or a stream. It's a buggy mess at the moment, but some basic constructs currently work:
map ["a" "b" "c"] (proc { |x| toUpper $x })
--> stream ["A" "B" "C"]
(Oh, by the way, when setting a variable to a stream using set, it will now collect the items as a list. Or at least that's the idea. It's not working at the moment.)
A more refined approach would be to treat commands as lambdas. The grammar supports this, but the evaluator doesn’t. For example, you cannot write the following:
# won't work
map ["a" "b" "c"] toUpper
This is because toUpper will be treated as a string, and not a reference to an invokable command. It will work for variables, though. You can do this:
set makeUpper (proc { |x| toUpper $x })
map ["a" "b" "c"] $makeUpper
I'm not sure how I can improve this. I don't really want to add automatic dereferencing of identifiers: they're very useful as unquoted string arguments. I suppose I could add another construct that would support dereferencing, maybe by enclosing the identifier in parentheses:
# might work?
map ["a" "b" "c"] (toUpper)
Anyway, more on this in the future I’m sure.
UCL: First Embed, and Optional Arguments
Came up with a name: Universal Control Language: UCL. See, you have TCL; but what if, instead of being used for tools, it could be more universal? Sounds so much more… universal, am I right? 😀
Yeah, okay. It’s not a great name. But it’ll do for now.
Anyway, I’ve started integrating this language with the admin tool I’m using at work. This tool I use is the impetus for this whole endeavour. Up until now, this tool was just a standard CLI command usable from the shell. But it’s not uncommon for me to have to invoke the tool multiple times in quick succession, and each time I invoke it, it needs to connect to backend systems, which can take a few seconds. Hence the reason why I’m converting it into a REPL.
Anyway, I added UCL to the tool, along with a readline library, and wow, did it feel good to use. So much better than the simple quote-aware string splitter I would've used otherwise. And just after I added it, I got a flurry of requests from my boss to gather some information, and although the language couldn't quite handle the task due to missing or unfinished features, I can definitely see the potential there.
I'm trying my best to only use what will eventually be the public API to add the tool-specific bindings. The biggest issue is that these "user bindings" (i.e. the non-builtins) desperately need support for producing and consuming streams. They're currently producing Go slices, which are being passed around as opaque "proxy objects", but these can't be piped into other commands to, say, filter or map. Some other major limitations:
- No commands to actually filter or map. In fact, the whole standard library needs to be built out.
- No ability to get fields from hashes or lists, including proxy objects which can act as lists or hashes.
One last thing that would be nice is the ability to define optional arguments. I actually started work on that last night, seeing that it’s relatively easy to build. I’m opting for a style that looks like the switches you’d find on the command line, with option names starting with dashes:
join "a" "b" -separator "," -reverse
--> b, a
Each option can have zero or more arguments, and boolean options can be represented as just having the switch. This does mean that they’d have to come after the positional arguments, but I think I can live with that. Oh, and no syntactic sugar for single-character options: each option must be separated by whitespace (the grammar actually treats them as identifiers). In fact, I’d like to discourage the use of single-character option names for these: I prefer the clarity that comes from having the name written out in full (that said, I wouldn’t rule out support for aliases). This eliminates the need for double dashes, to distinguish long option names from a cluster of single-character options, so only the single dash will be used.
I’ll talk more about how the Go bindings look later, after I’ve used them a little more and they’re a little more refined.
Tool Command Language: Lists, Hashes, and Loops
A bit more on TCL (yes, yes, I've gotta change the name) last night. Added both lists and hashes to the language. These can be created using a literal syntax, which looks pretty much how I described it a few days ago:
set list ["a" "b" "c"]
set hash ["a":"1" "b":"2" "c":"3"]
I had a bit of trouble working out the grammar for this. I first went with something that looked a little like the following, where the key of an element is optional but the value is mandatory:
list_or_hash --> "[" "]"        # empty list
  | "[" ":" "]"                 # empty hash
  | "[" elems "]"               # elements
elems --> ((arg ":")? arg)*     # elements of a list or hash
arg --> <anything that can be a command argument>
But I think this confused the parser a little, where it was greedily consuming the key arg and expecting the : to be present to consume the value.
So I flipped it around, and now the “value” is the optional part:
elems --> (arg (":" arg)?)*
So far this seems to work. I renamed the two fields “left” and “right”, instead of key and value. Now a list element will use the “left” part, and a hash element will use “left” for the key and “right” for the value.
You can probably guess that the list and hash are sharing the same AST types. This technically means that hybrid lists are supported, at least in the grammar. But I'm making sure that the evaluator throws an error when a hybrid is detected. I prefer to be strict here, as I don't want to think about how best to support it. Better to just say either a "pure" list, or a "pure" hash.
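That strictness check could look something like this sketch, using the shared left/right fields described above (everything else is invented):
// astElem mirrors the shared element type: lists use only the left part,
// hashes use both. Field names here are a guess.
type astElem struct {
	Left  any
	Right any // nil for list elements
}

// evalListOrHash rejects "hybrid" literals: either every element has a
// right part (a hash), or none of them do (a list).
func evalListOrHash(elems []astElem) (any, error) {
	if len(elems) == 0 {
		return []any{}, nil // empty: treat as an empty list
	}
	isHash := elems[0].Right != nil
	for _, e := range elems {
		if (e.Right != nil) != isHash {
			return nil, errors.New("cannot mix list and hash elements")
		}
	}
	if !isHash {
		list := make([]any, 0, len(elems))
		for _, e := range elems {
			list = append(list, e.Left)
		}
		return list, nil
	}
	hash := make(map[any]any, len(elems))
	for _, e := range elems {
		hash[e.Left] = e.Right
	}
	return hash, nil
}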
Well, now that we have collections, we need some way to iterate over them. For that, I've added a foreach loop, which looks a bit like the following:
# Over lists
foreach ["a" "b" "c"] { |elem|
echo $elem
}
# Over hashes
foreach ["a":"1" "b":"2"] { |key val|
echo $key " = " $val
}
What I like about this is that, much like the if statement, it's implemented as a macro. It takes a value to iterate over, and a block with bindable variables: one for list elements, or two for hash keys and values. This does mean that, unlike most other languages, the loop variable appears within the block, rather than to the left of the element, but after getting used to this form of block in my Ruby days, I can get used to it.
One fun thing about hashes is that they're implemented using Go's map type. This means that the iteration order is random, by design. This does make testing a little difficult (I've only got one test at the moment, which features a hash of length one), but I rarely depend on the order of hash keys, so I'm happy to keep it as is.
This loop is only the barest of bones at the moment. It doesn't support flow control like break or continue, and it also needs to support streams (I'm considering a version with just the block that will accept the stream from a pipe). But I think it's a reasonably good start.
I also spent some time today integrating this language into the tool I was building it for. I won't talk about it here, but already it's showing quite a bit of promise. I think, once the features are fully baked, that this would be a nice command language to keep in my tool-chest. But more on that in a later post.
Tool Command Language: Macros And Blocks
More work on the tool command language (for which I need to come up with a name: I can't use the abbreviation TCL), this time spent getting multi-line statement blocks working. As in:
echo "Here"
echo "There"
I got a little wrapped up about how I could configure the parser to recognise new-lines as statement separators. I tried this in the past with a hand-rolled lexer and ended up peppering NL tokens all around the grammar. I was fearing that I'd need to do something like this here.
After a bit of experimentation, I think I've come up with a way to recognise new-lines as statement separators without making the grammar too messy. The unit tests verifying this seem to work so far.
// Excerpt of the grammar showing all the 'NL' token matches.
// These match a new-line, plus any whitespace afterwards.
type astStatements struct {
First *astPipeline `parser:"@@"`
Rest []*astPipeline `parser:"( NL+ @@ )*"`
}
type astBlock struct {
Statements []*astStatements `parser:"LC NL? @@ NL? RC"`
}
type astScript struct {
Statements *astStatements `parser:"NL* @@ NL*"`
}
I’m still using a stateful lexer as it may come in handy when it comes to string interpolation. Not sure if I’ll add this, but I’d like the option.
Another big addition today was macros. These are much like commands, but instead of arguments being evaluated before being passed through to the command, they're deferred, and the command can explicitly request their evaluation whenever it needs them. I think Lisp has something similar: this is not that novel.
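In Go terms, the split might look something like this sketch (these interfaces are my guesses, not the actual implementation):
// A command receives its arguments already evaluated.
type command interface {
	Invoke(ctx context.Context, args []any) (any, error)
}

// A macro receives the raw AST nodes, plus an evaluator it can use to
// evaluate any of them whenever it chooses. An if macro would evaluate
// its guard first, then only the block that matches.
type macro interface {
	InvokeMacro(ctx context.Context, ev evaluator, args []astNode) (any, error)
}

type evaluator interface {
	Eval(ctx context.Context, node astNode) (any, error)
}

type astNode interface{}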
This was used to implement the if command, which is now working:
set x "true"
if $x {
echo "Is true"
} else {
echo "Is not true"
}
Of course, there are actually no operators yet, so it doesn’t really do much at the moment.
This spurred the need for blocks, which are the third large addition made today. They're just a group of statements wrapped in an object type. They're "invokable", in that the statements can be executed to produce a result, but they're also a value that can be passed around. It jells nicely with the macro approach.
Must say that I like the idea of using macros for things like if over baking them into the language. It can only add to the "embed-ability" of this, which is what I'm looking for.
Finally, I did see something interesting in the tests. I was trying the following test:
echo "Hello"
echo "World"
And I was expecting Hello and World to be returned over two lines. But only World was being returned. Of course! Since echo is actually producing a stream and not printing anything to stdout, it would only return World.
I decided to change this. If I want to use echo to display a message, then the above script should display both Hello and World in some manner. The downside is that I don't think I'll be able to support constructs like this, where echo provides a source for a pipeline:
# This can't work anymore
echo "Hello" | toUpper
I mean, I could probably detect whether echo is connected to a pipe (the parser can give that information). But what about other commands that output something? Would they need to be treated similarly?
I think it’s probably best to leave this out for now, and have a new construct for providing literals like this to a pipe. Heck, maybe just having the string itself would be enough:
"hello" | toUpper
Anyway, that’s all for today.
Tool Command Language
I have this idea for a tool command language. Something similar to TCL, in that it's chiefly designed to be used as an embedded scripting language, mainly in an interactive context.
It's been an idea I've had in mind for a while, but now I've got the perfect use case for it. I've got a tool at work I use to do occasional admin tasks. At the moment it's implemented as a CLI tool, and it works. But the biggest downside is that it needs to form connections to the cluster to call internal service methods, and it always takes a few seconds to do so. I'd like to be able to use it to automate certain actions, but this delay would make doing so a real hassle.
Some other properties that I'm thinking of:
- It should be able to support structured data, similar to how Lisp works.
- It should be able to support something similar to pipes, similar to how the shell and Go's template language work.
Some of the trade-offs that come of it:
- It doesn't have to be fast. In fact, it can be slow, so long as the work of embedding and operating it can be fast.
- It may not be completely featureful. I'll go over the features I'm thinking of below, but I'll say upfront that you're not going to be building any cloud services with this. Administering cloud servers, maybe; but leave the real programs to a real language.
Some Notes On The Design
The basic concept is the statement. A statement consists of a command, and zero or more arguments. If you’ve used a shell before, then you can imagine how this’ll look:
firstarg "hello, world"
--> hello, world
Each statement produces a result. Here, the theoretical firstarg will return the first argument it receives, which will be the string "hello, world".
Statements are separated by new-lines or semicolons. In such a sequence, the return value of the last statement is returned:
firstarg "hello" ; firstarg "world"
--> world
I'm hoping to have a similar approach to how Go works, in that semicolons will be needed if multiple statements share a line, but will otherwise be unnecessary. I'm using the Participle parser library for this, and I'll need to work out how I can configure the scanner to do this (or even if using the scanner is the right way to go).
The return value of statements can be used as the arguments of other statements by wrapping them in parenthesis:
echo (firstarg "hello") " world"
--> hello world
This is taken directly from TCL, except that TCL uses the square brackets. I’m reserving the square brackets for data structures, but the parenthesis are free. It also gives it a bit of a Lisp feel.
Pipelines
Another way for commands to consume the output of other commands is to build pipelines. This is done using the pipe | character:
echo "hello" | toUpper
--> HELLO
Pipeline sources, that is, the command on the left-most side, can either be a command that produces a single result, or a command that produces a "stream". Both are objects, and there's nothing inherently special about a stream, other than some special handling when it's used in a pipeline. Streams are also designed to be consumed once.
For example, one can consider a command which can read a file and produce a stream of the contents:
cat "taleOfTwoCities.txt"
--> It was the best of times,
--> it was the worst of times,
--> …
Not every command is "pipe savvy". For example, piping the result of a pipeline to echo will discard it:
echo "hello" | toUpper | echo "no me"
--> no me
Of course, this may differ based on how the builtins are implemented.
Variables
Variables are treated much like TCL and shell, in that referencing them is done using the dollar sign:
set name "Josh"
--> "Josh"
echo "My name is " $name
--> "My name is Josh"
Not sure how streams will be handled with variables but I’m wondering if they should be condensed down to a list. I don’t like the idea of assigning a stream to a variable, as streams are only consumed once, and I feel like some confusion will come of it if I were to allow this.
Maybe I can take the Perl approach and use a different variable "context", where a variable with a @ prefix will reference a stream:
set file (cat "eg.text")
echo @file
# Echo will consume file as a stream
echo $file
# Echo will consume file as a list
The difference is subtle but may be useful. I’ll look out for instances where this would be used.
Attempting to reference an unset variable will result in an error. This may also change.
Other Ideas
That’s pretty much what I have at the moment. I do have some other ideas, which I’ll document below.
Structured Data Support: Think lists and hashes. This language is to be used with structured data, so I think it's important that the language supports this natively. This is unlike TCL, which principally works with strings, and where the notion of lists feels a bit tacked on.
Both lists and hashes are created using square brackets:
# Lists. Not sure if they'll have commas or not
set l [1 2 3 $four (echo "5")]
# Maps
set m [a:1 "b":2 "see":(echo "3") (echo "dee"):$four]
Blocks: Yep, containers for groups of statements. These will be used for control flow, as well as for the definition of functions:
set x 4
if (eq $x 4) {
echo "X == 4"
} else {
echo "X != 4"
}
foreach [1 2 3] { |x|
echo $x
}
Here the blocks are just another object type, like strings and streams, and both if and foreach are regular commands which will accept a block as an argument. In fact, it would be theoretically possible to write an if statement this way (not sure if I'll allow setting variables to blocks):
set thenPart {
echo "X == 4"
}
if (eq $x 4) $thenPart
The block execution will exist in a context, which will control whether a new stack frame will be used. Here the if statement will simply use the existing frame, but a block used in a new function can push a new frame, with a new set of variables:
proc myMethod { |x|
echo $x
}
myMethod "Hello"
--> "Hello
Also note the use of |x| at the start of the block. This is used to declare bindable variables, such as function arguments or for-loop variables. This will be defined as part of the grammar, and be a property of the block.
Anyway, that’s the current idea.