
Templating Without a Template.


You can generate html from perl with the things CGI.pm exports, but chances are that you might hire a designer and they'll hate you for it.

There are oodles of template engines on cpan; my personal favourite is HTML::Mason. Mason is more the context of this post than the purpose, though.

At some point in your life, you'll find yourself in mason land with a list of things, and the need to stick that in an array for your template:

sub get_records {
    my @records = ORM->search_for_related_things();
    # deliberately naive: nothing is escaped here
    $_->{link_tag} = sprintf '<a href="view_record.html?id=%s">%s</a>',
       $_->id, $_->title 
        for @records;
    return @records;
}

and then you can readily use that html in your template:

<nav><ul>
% for my $record ( $controller->get_records ) { 
    <li><% $record->{link_tag} %>
% }
</ul></nav>

Except you've just messed up escaping on both sets of interpolations above. The database shouldn't have entity encoded strings in it, nor should it have HTML tags of any sort, so these should be escaped.

There's also an injection for any id that contains a double quote character: users can break out of the href and inject javascript or even whole tags as they like.

What if we don't put HTML in the array?

HTML::Element is part of the HTML::Tree distribution, and is used, surprisingly, for modelling elements in an HTML Document. The handy part is that it knows how to escape magic values in attributes and the like:

me@compy386:~ perl -MHTML::Element  -E '
    my $a=HTML::Element->new(a=> href=> "view_record.html?id=14");
    $a->push_content( "This & that");
    say $a->as_HTML
    '

<a href="view_record.html?id=14">This &amp; that</a>

We just need to make url construction in there safe too...

me@compy386:~ perl -MURI -MHTML::Element -E '
    (my $b= URI->new("view_record.html"))->query_form(id=>14);
    my $a=HTML::Element->new(a=> href=> $b );
    $a->push_content( "This & that");
    say $a->as_HTML
    '

<a href="view_record.html?id=14">This &amp; that</a>

Seems ok, more objects representing our markup and less string concat'ing means we're less likely to get escaping wrong, let's try it on:

me@compy386:~ perl -MURI -MHTML::Element -E '
    (my $b= URI->new("view_record.html"))->query_form(id=>q/5" onload="alert(1)" "/);
    my $a=HTML::Element->new(a=> href=> $b );
    $a->push_content( "This & that");
    say $a->as_HTML
    '

We can see that URI did the helpful thing and escaped everything, so the href stayed inside the html tag:

<a href="view_record.html?id=5%22+onload%3D%22alert(1)%22+%22">This &amp; that</a>

That's kinda annoying though

To decide how annoying it really is to create a URI object and pass it to an HTML::Element object, it seems only fair to do it right in the other version...

You can't really do it right because you don't know the context that the {link_tag} will be used in, so we can just assume that the call site will correctly escape it, throwing out half of the bath water, and most of the baby:

use HTML::Entities qw(encode_entities);
use URI::Escape qw(uri_escape);

sub get_records {
    my @records = ORM->search_for_related_things();
    $_->{link_tag} = sprintf '<a href="view_record.html?id=%s">%s</a>',
       encode_entities(uri_escape($_->id)), encode_entities($_->title)
        for @records;
    return @records;
}

That's also the simplest case

Even though a link element is fairly straightforward, we can still see that it turns into a whole bundle of code if you do it by hand. If you're building anything more complicated than a link to some other place and a heading, you'll quickly find that you're trying to escape params from the request, data from your database and data from external APIs, all at different trust levels. You'll be doing it in all sorts of different contexts in your document. Do you remember the escaping rules for javascript strings in a JSON response? How are they different from the rules in an inline <script> tag? How do css expressions work again? Are they different in an html attribute? Life is tough.

Having an object model is handy

Having an object model that represents your data allows you to store much more information than simply passing strings about, and that will in turn give you a better idea of how to correctly use your data and how to avoid security issues caused by mixing contexts and allowing user input to cross trust boundaries.

And best of all, you don't have to do all the escaping by hand.
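Pulling the pieces above together, get_records could hand the template objects instead of strings. What follows is a minimal sketch, not the definitive way to do it: the records are plain hashes standing in for whatever the ORM returns, and the link_element helper name is made up for illustration. It needs URI and HTML::Element (from HTML::Tree) installed.

```perl
use strict;
use warnings;
use URI;
use HTML::Element;

# Hypothetical stand-in for ORM->search_for_related_things()
my @records = (
    { id => 14,                      title => 'This & that' },
    { id => '5" onload="alert(1)',   title => '<b>bold</b> claims' },
);

# Build the link as an object; nothing is escaped by hand.
sub link_element {
    my ($record) = @_;
    my $uri = URI->new('view_record.html');
    $uri->query_form( id => $record->{id} );    # URI escapes the query string
    my $link = HTML::Element->new( 'a', href => $uri->as_string );
    $link->push_content( $record->{title} );    # text is entity-encoded by as_HTML
    return $link;
}

print link_element($_)->as_HTML, "\n" for @records;
```

The call site only turns the element into markup at the last moment, so the decision about escaping is made once, in the place that knows the output context.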

Run-length encoding


I did a hacker-rank thing about run-length encoding, after hitting submit I realised there was more golfing to be done:

me@compy-386:~ echo "aaaabcdeeee" | perl -plE '
s/^(.)(\1*)/$o.= $1.($2&&length" $2");""/e while $_;
$_=$o
'
a4bcde4

It's nothing too magical:

  • -p prints each line after the -E expression has been evaluated for it, and -l handles line endings automatically
  • for each iteration, we match the first character in $_, and any more of the same character that follow it (\1*)
    • with /e, s/// will evaluate the replacement as an expression instead of treating it as a plain old string replacement
    • the expression appends the first match $1 (the first letter) and the run length to $o; the length is taken from $2 (the rest of the run) padded with one space, which accounts for the first letter
    • an empty $2 means the length isn't added because the challenge dictated that a single character is left alone ('a' instead of 'a1')
    • the "" is there so the matched text is replaced with nothing, moving us closer to $_ being empty
  • the while loop continues until $_ is empty
  • once $_ is empty and all the text is processed, $o is assigned to $_ so it's printed.

Todo:

  • remove $o, by using print, or with fancy use of /g.
  • remove the while, I'm sure it can be done.
  • remove "" from the replace.
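One possible way to tick off all three todo items, offered as a readable sketch rather than a golfed entry: let /g walk the runs and have /e build each replacement directly. The rle function name is made up for illustration.

```perl
use strict;
use warnings;

# Replace each run of a repeated character with the character plus the run
# length; single characters are left alone, as the challenge requires.
sub rle {
    my ($text) = @_;
    # $1 is the whole run, $2 is its first character
    $text =~ s/((.)\2*)/ length($1) > 1 ? $2 . length($1) : $2 /ge;
    return $text;
}

print rle("aaaabcdeeee"), "\n";    # prints a4bcde4
```

Because the run length comes from length($1) rather than from the truthiness of a capture, a run such as "00" is also counted.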

Standard perl-golf disclaimer

Please don't do this kind of thing in a production code base

Some notes on git-config



I'm sure everyone has a .gitconfig with some handy aliases like:

  alias.ff=pull --ff-only
  alias.rb=pull --rebase

If you open up your .gitconfig it'll look something like:

[alias]
    ; too lazy to type these all in full
    root = !pwd
    ff   = pull --ff-only
    rb   = pull --rebase
    stat = status

It looks like an ini file, really.

The cool thing about .ini is that everyone has their own freak-show extensions to the simple ini format, which is really not much more complex than what's above, but has been extended in different directions with each implementation.

git-config obviously has its own rules about what's allowable, and how things are stored.

You're not allowed underscores:

me@compy386:~ $ git config -f ./example --add foo.bar_baz 1
error: invalid key: foo.bar_baz

me@compy386:~ $ git config -f ./example --add foo_bar.bar 1
error: invalid key: foo_bar.bar

So, unless your language lets you have - in method names, or you like camelCase, you're going to have to mangle the names after reading your config.

Your settings need to be in a section

me@compy386:~ $ git config -f example --add bar 1
error: key does not contain a section: bar

Doing this makes the config file much easier to deal with, and leaves you without the quagmire of nonsense dealing with "keys with no section go into the _ section"

Sections can have sub-sections

If you want to have configs for multiple named things of the same type:

me@compy386:~ $ git config -f ./example --add foo.thething.bar-baz 1

me@compy386:~ $ cat ./example 
[foo "thething"]
    bar-baz = 1

me@compy386:~ $ git config --list -f ./example 
foo.thething.bar-baz=1

Yep, you can have 2 levels of keys, and you end up with [first "second"] in your config. Neat!

This is used for branches and remotes among other things:

.git/config
[branch "master"]
    remote = origin
    merge = refs/heads/master

Sections can have the same name as sub-sections

me@compy386:~ $ git config -f example --add foo.bar 1

me@compy386:~ $ git config -f example --add foo.bar.baz 1

me@compy386:~ $ cat example 
[foo]
    bar = 1
[foo "bar"]
    baz = 1

me@compy386:~ $ git config -l -f example 
foo.bar=1
foo.bar.baz=1

If you're parsing this directly into a data structure you can end up with some fairly upsetting situations, like foo.bar becoming a hashmap when you don't expect it.
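To make that concrete, here is a rough sketch of a naive parser that turns `git config -l` style lines into a nested hash and reports the keys it cannot place. The nest_config name and the collision list are assumptions for illustration, and splitting on every dot is itself naive, since sub-section names may contain dots.

```perl
use strict;
use warnings;

# Build a nested hash from "key=value" lines and collect the keys that
# cannot be stored because a plain value and a sub-section collide.
sub nest_config {
    my @lines = @_;
    my ( %config, @collisions );
    LINE: for my $line (@lines) {
        my ( $key, $value ) = split /=/, $line, 2;
        my @path = split /\./, $key;    # naive: sub-section names may contain dots
        my $leaf = pop @path;
        my $node = \%config;
        for my $part (@path) {
            if ( exists $node->{$part} && !ref $node->{$part} ) {
                push @collisions, $key;    # a plain value already lives here
                next LINE;
            }
            $node = $node->{$part} //= {};
        }
        if ( ref $node->{$leaf} ) {
            push @collisions, $key;        # a sub-section already lives here
            next LINE;
        }
        $node->{$leaf} = $value;
    }
    return ( \%config, \@collisions );
}

my ( $config, $collisions ) = nest_config( 'foo.bar=1', 'foo.bar.baz=1' );
print "could not nest: $_\n" for @$collisions;    # prints: could not nest: foo.bar.baz
```

Keeping the keys flat, exactly as git prints them, sidesteps the problem entirely.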

git-config - you might as well use it.

If you're building a tool that depends on git for a large portion of its job, you might as well use git-config too. It's a format that your users are likely already familiar with, and fits neatly into the ecosystem.

Quickly installing requirements for perl scripts



Sometimes you stumble across some perl that you want to run, but it's not neatly packaged as a cpan dist and doesn't have a nice list of modules to get it going.


Often it's an email or IRC conversation with a gist or pastebin link to some perl that someone is working on.

  • You can run the script a couple of times, installing missing depends as you go
  • You can ask the author
  • You can rely on your amazing tooling to get you the right versions of the modules.

Go get cpan-minus

It's the one after cpanplus, except it's lighter because it has fewer oddball features.

If you don't have cpanm you can bootstrap it from http://cpanmin.us with:

% curl -L https://cpanmin.us | perl - App::cpanminus

cpanm knows how to install cpanm.

me@compy386:~ cpanm $( 
    perl -nle '
    /use ([:\w]+) ([0-9.]+)/ and $d{$1} = $2  
    }{
    printf "%s@%s ", $_, $d{$_} for keys %d' -- shell-only  
)

On my machine it just prints this:

Object::Tiny::RW is up to date. (1.07)
AnyEvent::ReadLine::Gnu is up to date. (1.0)
AnyEvent is up to date. (7.11)

The one-liner produces this:

 AnyEvent@7.11 Object::Tiny::RW@1.07 AnyEvent::ReadLine::Gnu@1.0 

We just match use Letters::And::Colons space numbers and stash them.

If you replace cpanm with echo, you'll see that we print out the module names and versions in the form Object::Tiny::RW@1.07, the format cpanm likes.
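For anyone who would rather read the one-liner than squint at it, this is a longer sketch of the same idea. The cpanm_specs name and the inlined sample source are made up for illustration; like the one-liner, it only notices use lines that carry a version number.

```perl
use strict;
use warnings;

# Collect "Module@version" specs from `use Module 1.23` lines.
sub cpanm_specs {
    my ($source) = @_;
    my %version_for;
    while ( $source =~ /^\s*use\s+([:\w]+)\s+([0-9.]+)/mg ) {
        $version_for{$1} = $2;
    }
    return map { "$_\@$version_for{$_}" } sort keys %version_for;
}

# Sample input: the versioned use lines from the script in this post.
my $source = <<'END';
use warnings; use strict;
use Object::Tiny::RW 1.07 (
    use AnyEvent 7.11;
    use AnyEvent::ReadLine::Gnu 1.0;
END

print join( ' ', cpanm_specs($source) ), "\n";
# prints: AnyEvent@7.11 AnyEvent::ReadLine::Gnu@1.0 Object::Tiny::RW@1.07
```

Sorting the keys only makes the output stable; cpanm does not care about the order.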

A usable shell under AnyEvent



Readline is something everyone takes for granted, and it makes interacting with a program much more enjoyable than simply typing at a program that reads directly from STDIN. It gives you options for tab complete, line editing and user configurability that you would have to implement yourself otherwise.

If you've got a complex AnyEvent program running, with loads of stuff going on, an interactive shell can make your program more enjoyable for both you and your users.

Since shells have already taught us how background tasks work, we can provide a comfortable interface for users familiar with job control in bash.

You can get all this lovely functionality from AnyEvent::ReadLine::Gnu, just throw it in your event loop and you're good to go.

A simple shell with AnyEvent::ReadLine::Gnu

As a start you can drop this stuff in a file (that I've called shell-only):

package Computer::Program;
use warnings; use strict;

# Set up some rw accessors.
use Object::Tiny::RW 1.07 (
    shell       => # Our readline object 
    cv          => # Our condvar, which serves as our mainloop
    delay_timer => # The timer guard for our background task. 
);

sub run {
    my $self = shift;
    use AnyEvent 7.11;
    $self->cv( AnyEvent->condvar );

    # Users can type at this thing.
    use AnyEvent::ReadLine::Gnu 1.0;
    $self->shell( AnyEvent::ReadLine::Gnu->new(
        prompt => 'How can I help you? > ',
        on_line => sub {
            my ($line) = @_;
            $self->handle_command( $line )
        },
    ));

    $self->cv->recv; # run the loop. Real programs will C< EV::loop > or similar.
}

# these are the actual commands, in a "dispatch table":
my %commands; %commands = (
    'help' => sub {
        my ($self) = @_;
        $self->shell->print(        # this always needs \n
            sprintf "Try one of '%s'\n", join ', ', sort keys %commands
        );
    },
    'wait' => sub {
        my ($self, $delay) = @_;

        # If one is scheduled, let it run:
        if ($self->delay_timer()) {
            $self->shell->print( "You're already waiting. Please wait harder..\n");
            return;
        }

        # Store the timer guard away
        # (if you don't store the guard, it goes out of scope, and cancels the timer)
        $self->delay_timer(
            AnyEvent->timer(after => $delay, cb => sub {
                my $s=$delay == 1 ? '' : 's' ; # English.
                $self->shell->print( "It has been $delay second$s, sir.\n");
                $self->delay_timer(undef);
            })
        );
    },
    'exit' => sub { 
        my $self = shift;
        $self->cv->send 
    },
);

# parse text from the user with the magical C< split ' ' >
sub handle_command {
    my $self = shift;
    my ($requested, @args) = split ' ', shift;
    # fall back to help if the requested command is missing or bogus
    if (defined $requested and exists $commands{ $requested }) {
        $commands{ $requested }->( $self, @args )
    }
    else {
        $commands{ help }->( $self )
    }
}

# kick off the program, but only when run as perl $filename
__PACKAGE__->new->run() unless caller;
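The "magical" part of split ' ' is that a single-space string is a special case: leading whitespace is skipped and runs of whitespace count as one separator. A small standalone sketch of the difference, using a made-up input line:

```perl
use strict;
use warnings;

# split ' ' skips leading whitespace and splits on runs of whitespace
my @words = split ' ', "  wait   10 ";
# a literal single-space pattern keeps the empty fields between spaces
my @fields = split / /, "  wait   10 ";

printf "split ' ' gave %d fields: %s\n", scalar @words,  join '|', @words;
printf "split / / gave %d fields: %s\n", scalar @fields, join '|', @fields;

# an empty line yields no fields at all, so $requested ends up undefined
my ( $requested, @args ) = split ' ', '';
print "empty line: no command\n" unless defined $requested;
```

That last case is why handle_command checks defined $requested before looking in the dispatch table.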

A Quick note on dependency management

Since this is just a quick program for a blog post, and not a real project we'll just install the modules we need manually instead of packaging it properly:

 cpanm AnyEvent@7.11 Object::Tiny::RW@1.07 AnyEvent::ReadLine::Gnu@1.0

If you don't have cpanm you can bootstrap it from cpanmin.us with:

% curl -L https://cpanmin.us | perl - App::cpanminus

There isn't anything magical about these versions, they are just the latest ones at the time of writing.

Let's run this puppy:

It doesn't really do anything interesting at all, but let's fire it up:

me@compy386:~ perl shell-only 

It will politely prompt us for instructions:

How can I help you? > help
Try one of 'exit, help, wait'
How can I help you? > wait 1

After about a second passes:

It has been 1 second, sir.
How can I help you? > wait 10
How can I help you? > wait 1
You're already waiting. Please wait harder..

and then, roughly 9 seconds later:

It has been 10 seconds, sir.
How can I help you? > 

We can see that the timers keep running while the prompt is waiting for input, so we know that our event loop is turning in the background. That would be amazing news if it was doing something useful.

How can I help you? > exit
me@compy386:~ 

TL;DR: AnyEvent::ReadLine::Gnu is just fine

There are some notes in the docs suggesting that readline might block and stall your event loop, leading to your program freezing or deadlocking.

Those things are easy enough to deal with but they mostly end up with you doing your work in another process/thread.

Exercises for the reader

1. Multiple timers

If you're feeling frisky you could patch this script to support multiple concurrent timers.

How can I help you? > wait 10
How can I help you? > wait 15

9 or so seconds pass:

The first timer fired!
How can I help you? >

5 more seconds pass:

The second timer fired!
How can I help you? >

2. Implement task management via a jobs command

It will take extra bookkeeping to keep track of the timers that are running, when they started and when they're due to complete. Carefully stash that background information in a background_tasks attribute.

How can I help you? > jobs
    [1] wait 5     -  1 seconds left
    [2] wait 15    - 11 seconds left

Depending on the background task you could even ->send on the cv to stop the task.
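As a starting point, the bookkeeping alone could look something like the sketch below. It is deliberately detached from AnyEvent so it runs on its own: the add_task, finish_task and jobs_report names are hypothetical, the clock is passed in as a plain number, and the guard slot is where the real timer guard would be stored.

```perl
use strict;
use warnings;

my %background_tasks;    # job id => { command, due, guard }
my $next_job_id = 1;

# Remember a task; $guard would be the AnyEvent timer guard in the real shell.
sub add_task {
    my ( $command, $delay, $now, $guard ) = @_;
    my $id = $next_job_id++;
    $background_tasks{$id} = { command => $command, due => $now + $delay, guard => $guard };
    return $id;
}

# Call this from the timer callback once the task has fired.
sub finish_task { delete $background_tasks{ $_[0] } }

# One line per running task, in job id order.
sub jobs_report {
    my ($now) = @_;
    return map {
        my $task = $background_tasks{$_};
        sprintf "[%d] %-10s - %2d seconds left", $_, $task->{command}, $task->{due} - $now;
    } sort { $a <=> $b } keys %background_tasks;
}

add_task( 'wait 5',  5,  0 );
add_task( 'wait 15', 15, 0 );
print "    $_\n" for jobs_report(4);    # four seconds after both were started
```

In the real shell the clock would come from AnyEvent->now or time, and deleting a task would also drop its guard, which cancels the timer.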

3. Actually doing useful things in the background

The shell could trigger multiple jobs of some type, even to run a bunch of things at the same time with AnyEvent::HTTP or AnyEvent::Run. You could even control a farm with AnyEvent::MP.

Welcome


Welcome to your new blog.