Parser Stuff: Strings
A while back I was working on a programming language idea, and while I haven't made any progress on it in ages, I really liked the string design that I came up with. I don't know that any of the ideas are original, but I haven't seen anything exactly like it, so I figured I'd throw the idea out into the ether in case anyone else happens to do something similar in their own personal language that they definitely shouldn't be making.
(I'll note upfront: this design uses dollar signs ($) and backticks (`). Many languages do the same (like JavaScript!), but these characters aren't easy to type on every international keyboard layout, so in some locales they may be more awkward to type out. My language design was pretty much just for me and my standard US keyboard, so I didn't take this into account.)
Basic Strings
There's nothing fancy about these, they're just like most other languages' basic strings:
"This is a string"
"This is a string with a newline at the end: \n"
"Quotes? \"Escape them\""
"Oops forgot to end this one, it's a compiler error
Just a pair of double quotes with everything between being non-quotes (or escaped quotes), contained within a single line.
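Here's a minimal sketch, in Python, of how a lexer could scan one of these. The function name `scan_basic_string` and the exact escape table are assumptions for illustration (the examples above only show \n and \"), so treat it as one possible reading of the rules rather than the real implementation.

```python
def scan_basic_string(src: str, start: int = 0) -> tuple[str, int]:
    """Scan a double-quoted string that begins at src[start].

    Returns (decoded value, index just past the closing quote).
    Raises SyntaxError if the string is malformed or never closed.
    """
    # Assumed escape table; only \n and \" appear in the examples.
    escapes = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
    if start >= len(src) or src[start] != '"':
        raise SyntaxError("expected opening double quote")
    out = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\n":
            break  # basic strings must end on the line they start on
        if ch == "\\":
            if i + 1 >= len(src) or src[i + 1] not in escapes:
                raise SyntaxError("unknown or incomplete escape sequence")
            out.append(escapes[src[i + 1]])
            i += 2
            continue
        out.append(ch)
        i += 1
    raise SyntaxError("unterminated string")
```

Hitting a newline before the closing quote is what makes the "Oops forgot to end this one" case an error instead of silently swallowing the rest of the file.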
Raw Strings
Raw strings in my language design are kind of a blend between C++11-style raw strings and C#11 raw strings (the elevens!), using a different delimiting character variation than I've seen elsewhere. In this case, the simplest one starts and ends with a pair of backticks:
``This is a single raw string
it is multiple lines long``
It can contain any character sequence except the delimiting sequence (again, at simplest a pair of backticks: ``).
But what if your string needs to contain a consecutive pair of backticks? This is where the C++11 raw string inspiration comes in: you can put any string of characters between the ticks (excluding ticks, obviously, or newlines), and then the start and end have to match.
`uniqueString`This is a single string
it contains backticks without terminating: `` ... see?
This is the last line and ends here:`uniqueString`
This one starts with `uniqueString`, and so the only thing that will terminate it is that same sequence: `uniqueString` (with the tick marks around it).
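The delimiter matching can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `scan_raw_string` is a made-up name, and it returns the untouched text between the delimiters without any further trimming.

```python
def scan_raw_string(src: str, start: int = 0) -> tuple[str, int]:
    """Scan a raw string whose opening delimiter begins at src[start].

    The delimiter is a backtick, an optional tag, and another backtick.
    The body runs until the next exact occurrence of that same delimiter.
    Returns (body text, index just past the closing delimiter).
    """
    if start >= len(src) or src[start] != "`":
        raise SyntaxError("expected opening backtick")
    close_tick = src.find("`", start + 1)
    if close_tick == -1:
        raise SyntaxError("unterminated raw string delimiter")
    tag = src[start + 1:close_tick]  # empty for the plain `` form
    if "\n" in tag:
        raise SyntaxError("delimiter tag may not contain a newline")
    delimiter = "`" + tag + "`"
    body_start = close_tick + 1
    end = src.find(delimiter, body_start)
    if end == -1:
        raise SyntaxError("unterminated raw string")
    return src[body_start:end], end + len(delimiter)
```

Because the scanner only ever searches for the full delimiter, a bare `` inside a tagged string is just more text.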
To add to this, cribbing from C#11 it will:
- Trim the very first newline if there is one
- Also trim the last newline if there is one
- Unindent every line of it based on the indentation of the final quote sequence:
myString = ``
  This is actually the first line of the string, the newline was ignored
  {
    indented further
  }
  ``; // Note that this is indented 2 spaces
which turns into the following string (note that the shared two-space indentation is gone):
This is actually the first line of the string, the newline was ignored
{
  indented further
}
This makes it easier to generate code (or text files or whatever) that are properly indented, without having to make the indenting of the string in your code all weird.
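For the curious, the three trimming rules could look roughly like this in Python. `dedent_raw_body` is a hypothetical helper that takes the text between the delimiters, and raising an error for a line indented less than the closing delimiter is my assumption (that is how C# 11 treats it), not something the design above spells out.

```python
def dedent_raw_body(body: str) -> str:
    """Apply the newline-trimming and unindenting rules to a raw string body."""
    # Rule 1: drop a single newline right after the opening delimiter.
    if body.startswith("\n"):
        body = body[1:]
    # Whatever follows the last newline sits in front of the closing delimiter.
    head, newline, last_line = body.rpartition("\n")
    if not newline or last_line.strip(" \t"):
        # Single-line string, or the closing delimiter shares a line with
        # content: there is no indentation to strip.
        return body
    indent = last_line
    # Rule 2: `head` already excludes the final newline.
    # Rule 3: strip the closing delimiter's indentation from every line.
    lines = []
    for line in head.split("\n"):
        if line.startswith(indent):
            lines.append(line[len(indent):])
        elif not line.strip(" \t"):
            lines.append("")  # tolerate blank lines with less whitespace
        else:
            raise SyntaxError("line is indented less than the closing delimiter")
    return "\n".join(lines)
```

Feeding it a body whose closing delimiter is indented two spaces strips exactly two spaces from each line, so anything indented further keeps its relative indentation.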
Interpolated Strings
I'm additionally adding interpolated string support (which is a weird term), using a mix of C#'s and JavaScript's setups:
$"This string has a ${value} in it"
If a string starts with $, it's treated as an interpolated string. A string-convertible expression can be inserted in-place in the string within ${}.
But what if you need to have the character sequence ${ in your string? Add more dollar signs to the start, and you need that many dollar signs before a { to enter the Interpolation Zone:
$$"This string has a $${value} in it, ${but this isn't one}"
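A rough Python sketch of the dollar-counting rule: split the string body into literal text and expression pieces. The name `split_interpolated` is made up, the expression is assumed to end at the first }, and extra dollar signs in front of a marker are simply kept as text. All of those are simplifications of mine.

```python
def split_interpolated(body: str, dollars: int) -> list[tuple[str, str]]:
    """Split a string body into ("text", ...) and ("expr", ...) pieces.

    `dollars` is the number of $ signs in front of the opening quote.
    """
    marker = "$" * dollars + "{"
    pieces = []
    text = []
    i = 0
    while i < len(body):
        if body.startswith(marker, i):
            # Simplification: the expression ends at the first closing brace.
            close = body.find("}", i + len(marker))
            if close == -1:
                raise SyntaxError("unterminated interpolation")
            if text:
                pieces.append(("text", "".join(text)))
                text = []
            pieces.append(("expr", body[i + len(marker):close]))
            i = close + 1
        else:
            text.append(body[i])
            i += 1
    if text:
        pieces.append(("text", "".join(text)))
    return pieces
```

With two dollar signs, only $${ opens an expression, so a lone ${ stays in the text piece untouched.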
Raw strings can also be used as interpolated strings (making for some nice codegen), same rules apply:
$$``
Interpolated string with
multiple lines and a $${value} in it.
${this is not a value because only one $}
``
If value == 5 this would turn into the following string (upon formatting):
Interpolated string with
multiple lines and a 5 in it.
${this is not a value because only one $}
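To sanity-check that result, here's a tiny Python stand-in for the formatting step. It's a sketch under loud assumptions: `render` is a hypothetical helper, and it only looks up plain names in a dict instead of evaluating real expressions.

```python
import re


def render(body: str, dollars: int, values: dict) -> str:
    """Replace each interpolation marker with str() of the named value."""
    # Exactly `dollars` dollar signs, then a brace-delimited name.
    pattern = re.compile(re.escape("$" * dollars) + r"\{([^}]*)\}")
    # Assumption: the "expression" is just a variable name to look up.
    return pattern.sub(lambda m: str(values[m.group(1).strip()]), body)


body = (
    "Interpolated string with\n"
    "multiple lines and a $${value} in it.\n"
    "${this is not a value because only one $}"
)
print(render(body, 2, {"value": 5}))
```

The last line survives verbatim because nothing in it has two dollar signs directly in front of a {.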
I Just Think They're Neat
Anyway, I think this is a really nice combination of properties that make it easy to format strings nicely without being overly complicated to actually use (unlike C++'s raw strings, which I have to look up literally every time I need to use one). Need a hardcoded regex or path with backslashes? In most cases, just use a raw string with `` on either end:
``C:\Path\With\Single\Backslashes``
or
``^[\r\n \t]*Hi[\r\n \t]*$``
Hope this was at least mildly interesting to someone!