
Self restarting AnyEvent program



Whether you're running 1800 production web servers or a single VPS, there will come a time when you need to do some automated log cleanup, update some configs, or monitor the state of some process or network connection.

So you write your little script, and dump it on the machine via puppet or salt. You add an entry to your crontab and never worry about it again.

Yeah. That's never the end. New requirements come up: the script needs to check new things, the config files move because of an OS upgrade, or the damn thing keeps emailing you warnings. We come to a point where we need to patch our script and deploy the changes, and make sure the old version stops and the new version is running. Over and over.

This is the point where it's handy to have your bundle of maintenance scripts running from an auto-updating git working copy on your boxes.

The aim here is to build a script that runs and does its work while keeping an eye on the state of the git repo it runs from.

Setup: clone the repo, install your depends

I'm going to assume you want to add this to an existing project:

me@compy386:~ git clone your-github-project /usr/local/yourthing    
me@compy386:~ cpanm -L /usr/local/yourthing/misc AnyEvent AnyEvent::HTTP

but if you want to just play with this without defacing one of your public projects:

me@compy386:~ 
    mkdir -p restarter/misc
    cd restarter
    git init 
    <paste code in a file called computer_program>
    git add .
    git commit -am "Let's get this party started"
    cpanm -L misc AnyEvent AnyEvent::HTTP

(maybe git ignore misc/*)

Supervise it, or just run it

Pop something like this in your supervisor.conf, conf.d/computer_program.conf or whatever:

[program:computer_program]
command=/usr/local/yourthing/computer_program --loglevel=%(ENV_LOGLEVEL)s 
# You'll also want to configure log rotation if you're doing this long term.

And fire it up

me@compy386:~ supervisorctl start computer_program
me@compy386:~ supervisorctl status

And

me@compy386:~ ps ax | grep computer[_]program
computer_program 4b825dc642cb6eb9a060e54bf8d69288fbee4904

If you're not convinced, you can just run it directly

me@compy386:~ /usr/local/yourthing/computer_program
... It'll just run.

See it at work

Since the point is to restart the script each time the git repo it's running from changes, the best way to test this is to just commit locally and watch the script notice the change.

Open up another terminal and commit a change to the script, or to anything else in the directory.

me@compy386:/usr/local/yourthing echo "# never mind" >> computer_program
me@compy386:/usr/local/yourthing git commit -am "Helpful docs"

In the other tab we get one of these:

2017-05-02 23:15:11.163089 +0200 trace main: restarter: first run, starting with /usr/local/yourthing at 4ffdd6cc2b36b6fd69553202b7db1c7288927521
2017-05-02 23:17:41.198546 +0200 trace main: restarter[4ffdd6cc2b36b6fd69553202b7db1c7288927521 cmp 5f0e2d68cc2fef915215490db7ad5f6a52c75a45]: not taking more jobs

Depending on what you set $CHECK_EVERY to, the script will notice the new commit within that many seconds and bail out; if it's supervised, it'll be restarted.

The script itself

#! /usr/bin/perl

#ABSTRACT: wireframe AnyEvent program that exits when code changes.

use warnings; use strict;
my $MY_NAME = $0;

use Cwd qw(cwd abs_path);
use File::Basename qw(dirname);
use File::Spec::Functions qw(catdir);

our $NEST;
use lib catdir( $NEST= dirname( abs_path($0) ), 'misc/lib/perl5');
# ...for use with cpanm -L misc AnyEvent

use AnyEvent;
use AnyEvent::Util qw/ run_cmd /;

my $computer_program = AnyEvent->condvar;

# simple self-restart setup 
my ( $CHECK_EVERY, $MURDER_AFTER) = ( 2.5*60, 150 );
my (
    $WRAP_IT_UP,   # flag, if true don't pick up any new jobs. (Timer for exit).
    $RUNNING_JOBS, # guards for currently running jobs
    $running_sha,  # string, sha1 of the last commit on this directory
    $restarter,    # guard, runs git log.
);

my $timer = AnyEvent->timer(
    after => 1,
    interval => $CHECK_EVERY,
    cb => sub {
        # this will kill the previous run by destroying the guard.
        $restarter = run_cmd [qw/ git log -1 --format=%H /, $NEST],
            '>' => sub {

            return unless @_; # If it doesn't run, just keep on keep'on.

            my $output = shift;
            chomp $output;

            if (not defined $running_sha) {
                $0 = join ' ', $MY_NAME, $running_sha=$output;
                AE::log trace=> "restarter: first run, starting with %s at %s", $NEST, $output;
                return;
            }

            my $we_good = (defined $running_sha and defined $output and $running_sha eq $output);
            AE::log trace=> "restarter[%s cmp %s]: %s",$running_sha, $output,
                                            $we_good ? 'seems fine' : 'not taking more jobs'
                                            ;
            # life is fine, keep going.
            return if $we_good;

            # nothing going on, so just bail out.
            if (not defined $RUNNING_JOBS or not @$RUNNING_JOBS) {
                AE::log info=> "restarter[%s cmp %s]: New software while idle. Bailing out to upgrade.",$running_sha, $output;
                return $computer_program->send;
            }

            my $murder_in=$MURDER_AFTER;
            AE::log trace=> "restarter[%s cmp %s]: New software while Busy. Murder in %ss",$running_sha, $output, $murder_in;

            # busy. murder the jobs once $MURDER_AFTER seconds have passed.
            $WRAP_IT_UP = AnyEvent->timer(after => $murder_in, cb => sub {
                AE::log info=> "restarter: %ss timer has passed, murdering %s jobs", $murder_in, 0+@{ $RUNNING_JOBS || [] };
                undef $_ for @{ $RUNNING_JOBS || [] };
                return $computer_program->send;
            });
        }
    }
);

# start the loop:
$computer_program->recv;
__END__

Exercises for the reader

When first working on a script like this, one can easily get in trouble: the need to commit a change in order to test the restart behaviour makes for an exciting life. Fortunately the script will also notice if you use git commit --amend to change the SHA of the current commit.

This won't be your usual work flow, but it makes testing much more fun.

Obviously you can only amend a commit if you haven't pushed it yet.

Have the script do some stuff

Use AnyEvent::HTTP to poll some web service, and check the response for some value.

Use AnyEvent::Log at info level (AE::log info => ...) to log when the value is present, so you can see your code working.

If you add the right callbacks to your request you should be able to see the event being cancelled when you commit a change to the script.
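One possible shape for this exercise, as a sketch rather than a finished job runner. The helper names body_has_value and start_poll, the 30 second timeout and the URL in the usage comment are made up for illustration; $WRAP_IT_UP and $RUNNING_JOBS are the variables from the script above, redeclared here only so the snippet stands alone.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP qw(http_get);
use Scalar::Util qw(refaddr weaken);

# In the real script these already exist; redeclared so the sketch stands alone.
my ( $WRAP_IT_UP, $RUNNING_JOBS );

# Hypothetical check: does the response body contain the value we want?
sub body_has_value {
    my ( $body, $wanted ) = @_;
    return ( defined $body and index( $body, $wanted ) >= 0 ) ? 1 : 0;
}

# Start one request unless a restart is pending. http_get returns a guard
# in non-void context; destroying the guard cancels the request.
sub start_poll {
    my ( $url, $wanted ) = @_;
    return if $WRAP_IT_UP;    # new software is waiting, take no new jobs

    my $guard;
    $guard = http_get $url, timeout => 30, sub {
        my ( $body, $hdr ) = @_;
        AE::log info => "poller: found %s at %s (status %s)",
            $wanted, $url, $hdr->{Status}
            if body_has_value( $body, $wanted );

        # this job is finished, so drop its guard from the list
        my $me = defined $guard ? refaddr($guard) : 0;
        @$RUNNING_JOBS = grep { defined $_ and refaddr($_) != $me }
            @{ $RUNNING_JOBS || [] };
    };
    push @$RUNNING_JOBS, $guard;

    # keep the only strong reference in @$RUNNING_JOBS, so that
    # "undef $_ for @$RUNNING_JOBS" really does cancel the request
    weaken $guard;
    return;
}

# Hypothetical usage, inside the script above:
#   my $poller = AnyEvent->timer( interval => 60,
#       cb => sub { start_poll( 'http://localhost:8080/status', 'ok' ) } );
```

The weaken call is the design choice worth noticing: without it the callback would hold its own guard alive, and murdering the job list would not cancel anything.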

Have your script check for changes on your remote

Since we're already polling git for changes, we could go one step further and poll for changes to the remote too.

This is relatively straightforward if you have a passwordless HTTP remote like the ones GitHub provides.

Here you'd want to use git remote update and git rev-parse origin/HEAD to check whether your working copy and/or script are at the latest SHA for your repo.
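A rough sketch of that check, using only core Perl. The helper names git_output, remote_has_moved and check_remote are invented here, and the calls block while git runs, so inside the AnyEvent script you'd want to run the same git commands through run_cmd instead. It also assumes a git new enough to understand -C.

```perl
use strict;
use warnings;

# Run a git command inside $dir and return its trimmed output,
# or undef if git could not be started or exited non-zero.
sub git_output {
    my ( $dir, @args ) = @_;
    open my $fh, '-|', 'git', '-C', $dir, @args or return;
    my $out = do { local $/; <$fh> };
    close $fh or return;
    return unless defined $out;
    $out =~ s/\s+\z//;
    return $out;
}

# True when the remote points at a commit we are not running yet.
sub remote_has_moved {
    my ( $running_sha, $remote_sha ) = @_;
    return 0 unless defined $running_sha and defined $remote_sha;
    return 0 unless $remote_sha =~ /\A[0-9a-f]{40,64}\z/;
    return $running_sha ne $remote_sha ? 1 : 0;
}

# Fetch, then compare what we run against what origin points at.
sub check_remote {
    my ($dir) = @_;
    defined git_output( $dir, 'remote', 'update' ) or return 0;
    my $running = git_output( $dir, 'rev-parse', 'HEAD' );
    my $remote  = git_output( $dir, 'rev-parse', 'origin/HEAD' );
    return remote_has_moved( $running, $remote );
}
```

Anything that goes wrong (no network, no origin/HEAD, a garbled answer) counts as "nothing changed", which keeps the script running the version it already has.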

Spawn an ssh-agent for private repos

If your git repo is secret, like on GitLab or paid GitHub accounts, you'll need to use a "deploy key" to fetch changes from the repo.

You can use run_cmd to start up another process running ssh-agent. Once you know the socket path it chooses, you can pass that on in the environment, as SSH_AUTH_SOCK, to your GIT_SSH for any commands that interact with your remotes.
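Here's a small sketch of the fiddly part: digging the socket path out of what ssh-agent -s prints when it starts, and handing it to git. The helper names parse_agent_env and git_with_agent and the key path in the usage comment are hypothetical; the variable names SSH_AUTH_SOCK and SSH_AGENT_PID are the ones ssh-agent itself prints.

```perl
use strict;
use warnings;

# ssh-agent -s prints shell code along the lines of:
#   SSH_AUTH_SOCK=/tmp/ssh-XXXX/agent.123; export SSH_AUTH_SOCK;
#   SSH_AGENT_PID=124; export SSH_AGENT_PID;
# Pull the two assignments out into a hash.
sub parse_agent_env {
    my ($text) = @_;
    my %env = $text =~ /^(SSH_AUTH_SOCK|SSH_AGENT_PID)=([^;\n]+);/mg;
    return %env;
}

# Run one git command with the agent's variables in its environment.
# Returns true if git exited with status 0.
sub git_with_agent {
    my ( $env, @git_args ) = @_;
    local @ENV{ keys %$env } = values %$env;
    return system( 'git', @git_args ) == 0;
}

# Hypothetical usage:
#   my %agent = parse_agent_env( scalar `ssh-agent -s` );
#   { local @ENV{ keys %agent } = values %agent;
#     system( 'ssh-add', '/usr/local/yourthing/misc/deploy_key' ); }
#   git_with_agent( \%agent, '-C', '/usr/local/yourthing', 'remote', 'update' );
```

The usage comment shells out with backticks for brevity; capturing the same output through run_cmd would keep the event loop free. Remember to kill the agent (its pid is in SSH_AGENT_PID) when the script exits.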

Wrap up

There are some other obvious options for getting your changes out:

  • Package your script, stick the package on a mirror, and restart with hooks
  • Have your config management drop the new copy and do the init/rc.d/systemd dance
  • Put your script in a base container, and rebuild your farm with every change

Each has its tradeoffs, and I'm a simple fellow. I like to push a change, and watch graphs to see my changes kick in. After all, if your change doesn't have an impact on the graphs, was it really a change?

Being able to push changes to a git repo and have them go live automatically takes a lot of the sting out of having to deploy daemons to large groups of production boxes. The more pessimistic reader will ask, "But if I push a broken version, won't the script stop doing the git-fetch?"

Yep. That'll happen. You'd likely want one script that just does the git pull, and nothing else, while the rest of your scripts just watch for the change and restart when they change.

🍆


Thanks for coming.

Templating Without a Template.


You can generate html from perl with the things CGI.pm exports, but chances are that you might hire a designer and they'll hate you for it.

There are oodles of template engines on CPAN; my personal favourite is HTML::Mason. Mason is more the context of this post than the purpose, though.

At some point in your life, you'll find yourself in mason land with a list of things, and the need to stick that in an array for your template:

sub get_records {
    my @records = ORM->search_for_related_things();
    $_->{link_tag} = sprintf '<a href="view_record.html?id=%s">%s</a>',
       $_->id, $_->title
        for @records;
    return @records;
}

and then you can readily use that html in your template:

<nav><ul>
% for my $record ( $controller->get_records ) { 
    <li><% $record->{link_tag} %>
% }
</ul></nav>

Except you've just messed up escaping on both sets of interpolations above. The database shouldn't have entity encoded strings in it, nor should it have HTML tags of any sort, so these should be escaped.

There's also an injection for any id that contains a double quote character: users can break out of the URL and inject JavaScript or even whole tags as they like.

What if we don't put HTML in the array?

HTML::Element is part of the HTML::Tree distribution, and is used, surprisingly, for modelling elements in an HTML Document. The handy part is that it knows how to escape magic values in attributes and the like:

me@compy386:~ perl -MHTML::Element  -E '
    my $a=HTML::Element->new(a=> href=> "view_record.html?id=14");
    $a->push_content( "This & that");
    say $a->as_HTML
    '

<a href="view_record.html?id=14">This &amp; that</a>

We just need to make url construction in there safe too...

me@compy386:~ perl -MURI -MHTML::Element -E '
    (my $b= URI->new("view_record.html"))->query_form(id=>14);
    my $a=HTML::Element->new(a=> href=> $b );
    $a->push_content( "This & that");
    say $a->as_HTML
    '

<a href="view_record.html?id=14">This &amp; that</a>

Seems OK. More objects representing our markup and less string concat'ing means we're less likely to get escaping wrong. Let's try it on:

me@compy386:~ perl -MURI -MHTML::Element -E '
    (my $b= URI->new("view_record.html"))->query_form(id=>q/5" onload="alert(1)" "/);
    my $a=HTML::Element->new(a=> href=> $b );
    $a->push_content( "This & that");
    say $a->as_HTML
    '

We can see that URI did the helpful thing and escaped everything, so the href stayed inside the HTML tag:

<a href="view_record.html?id=5%22+onload%3D%22alert(1)%22+%22">This &amp; that</a>
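Folding that back into something like get_records could look roughly like this. It's a sketch: link_for is a made-up helper and the record data in the usage comment is invented, with URI and HTML::Element doing the escaping exactly as in the one-liners.

```perl
use strict;
use warnings;
use URI;
use HTML::Element;

# Build the link out of objects; nothing is escaped by hand.
sub link_for {
    my ( $id, $title ) = @_;

    my $uri = URI->new('view_record.html');
    $uri->query_form( id => $id );    # URI escapes the query value

    my $link = HTML::Element->new( a => href => $uri->as_string );
    $link->push_content($title);      # text is entity-encoded by as_HTML
    return $link;
}

# Hypothetical usage: keep the element in the record and only turn it
# into a string at the point where it is printed.
#   $_->{link} = link_for( $_->id, $_->title ) for @records;
#   print $record->{link}->as_HTML;
```

Keeping the element rather than a string in the array means the template decides when it becomes markup, and can still add a class or wrap it in something else first.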

That's kinda annoying though

To decide fairly how annoying it is to create a URI object and pass it to an HTML::Element object, it seems only fair to do it right in the other version too...

You can't really do it right, because you don't know the context that the {link_tag} will be used in, so we can just assume that the call site will correctly escape it, throwing out half of the bath water, and most of the baby:

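use HTML::Entities qw(encode_entities);
use URI::Escape qw(uri_escape);

sub get_records {
    my @records = ORM->search_for_related_things();
    # escape the id for the URL, then both values for the HTML around them
    $_->{link_tag} = sprintf '<a href="view_record.html?id=%s">%s</a>',
       encode_entities(uri_escape($_->id)), encode_entities($_->title)
        for @records;
    return @records;
}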

That's also the simplest case

Even though a link element is fairly straightforward, we can still see that it turns into a whole bundle of code if you do it by hand. If you're building anything more complicated than a link to some other place and a heading, you'll quickly find that you're trying to escape params from the request, data from your database, external APIs and from all kinds of trust levels. You'll be doing it in all sorts of different contexts in your document. Do you remember the escaping rules for JavaScript strings in a JSON response? How are they different from the rules in an inline <script> tag? How do CSS expressions work again? Are they different in an HTML attribute? Life is tough.

Having an object model is handy

Having an object model that represents your data allows you to store much more information than simply passing strings about, and that will in turn give you a better idea of how to correctly use your data and how to avoid security issues caused by mixing contexts and allowing user input to cross trust boundaries.

And best of all, you don't have to do all the escaping by hand.

Run-length encoding


I did a HackerRank thing about run-length encoding. After hitting submit I realised there was more golfing to be done:

me@compy-386:~ echo "aaaabcdeeee" | perl -plE '
s/^(.)(\1*)/$o.= $1.($2&&length" $2");""/e while $_;
$_=$o
'
a4bcde4

It's nothing too magical:

  • -print each line after -Evaling the expression for each line of the file, with automatic handling of -line endings
  • for each iteration, we match the first character in $_, and any more of them that follow (\1*)
    • with /e, s/// will evaluate the replacement as an expression instead of treating it as a plain old string replacement
    • the expression appends the first match $1 (the first letter) and the length of the whole run to $o; length" $2" is one more than the length of $2 (the rest of the run), which accounts for the first letter
    • an empty $2 means the length isn't added because the challenge dictated that a single character is left alone ('a' instead of 'a1')
    • the "" is there so the matched text is replaced with nothing, moving us closer to $_ being empty
  • the while loop continues until $_ is empty
  • once $_ is empty and all the text is processed, $o is assigned to $_ so it's printed.
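For comparison, here's the same idea written out the long way, as a sketch with no golfing at all. The sub name rle is mine; it does the whole job in one s///ge pass instead of a while loop.

```perl
use strict;
use warnings;

# Replace each run of a repeated character with the character followed by
# the run length; a single character is left alone, as the challenge asked.
sub rle {
    my ($text) = @_;
    # $1 is the whole run, $2 is the character it is made of
    $text =~ s/((.)\2*)/ length($1) > 1 ? $2 . length($1) : $2 /ge;
    return $text;
}

print rle("aaaabcdeeee"), "\n";    # a4bcde4
```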

Todo:

  • remove $o, by using print, or with fancy use of /g.
  • remove the while, I'm sure it can be done.
  • remove "" from the replace.

Standard perl-golf disclaimer

Please don't do this kind of thing in a production code base

Some notes on git-config



I'm sure everyone has a .gitconfig with some handy aliases like:

  alias.ff=pull --ff-only
  alias.rb=pull --rebase

If you open up your .gitconfig it'll look something like:

[alias]
    ; too lazy to type these all in full
    root = !pwd
    ff   = pull --ff-only
    rb   = pull --rebase
    stat = status

It looks like an ini file, really.

The cool thing about .ini is that everyone has their own freak-show extensions to the simple ini format, which is really not much more complex than what's above, but has been extended in different directions with each implementation.

git-config obviously has its own rules about what's allowable, and how things are stored.

You're not allowed underscores:

me@compy386:~ $ git config -f ./example --add foo.bar_baz 1
error: invalid key: foo.bar_baz

me@compy386:~ $ git config -f ./example --add foo_bar.bar 1
error: invalid key: foo_bar.bar

So, unless your language lets you have - in method names, or you like camelCase, you're going to have to mangle the names after reading your config.
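The mangling itself is tiny. A sketch, with mangle_key as a made-up name; it lowercases as well, on the grounds that git treats variable names case-insensitively.

```perl
use strict;
use warnings;

# Turn a git-config variable name such as "bar-baz" into something that
# works as a Perl method name or bareword hash key: "bar_baz".
sub mangle_key {
    my ($name) = @_;
    ( my $mangled = lc $name ) =~ tr/-/_/;
    return $mangled;
}

print mangle_key("bar-baz"), "\n";    # bar_baz
```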

Your settings need to be in a section

me@compy386:~ $ git config -f example --add bar 1
error: key does not contain a section: bar

Doing this makes the config file much easier to deal with, and leaves you without the quagmire of nonsense dealing with "keys with no section go into the _ section"

Sections can have sub-sections

If you want to have configs for multiple named things of the same type:

me@compy386:~ $ git config -f ./example --add foo.thething.bar-baz 1

me@compy386:~ $ cat ./example 
[foo "thething"]
    bar-baz = 1

me@compy386:~ $ git config --list -f ./example 
foo.thething.bar-baz=1

Yep, you can have 2 levels of keys, and you end up with [first "second"] in your config. Neat!

This is used for branches and remotes among other things:

.git/config
[branch "master"]
    remote = origin
    merge = refs/heads/master

Sections can have the same name as sub-sections

me@compy386:~ $ git config -f example --add foo.bar 1
me@compy386:~ $ git config -f example --add foo.bar.baz 1

me@compy386:~ $ cat example 
[foo]
    bar = 1
[foo "bar"]
    baz = 1

me@compy386:~ $ git config -l -f example 
foo.bar=1
foo.bar.baz=1

If you're parsing this directly into a data structure you can end up with some fairly upsetting situations, like foo.bar becoming a hashmap when you don't expect it.
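One way to dodge that, sketched below: always keep three levels (section, sub-section, name) and file anything without a sub-section under the empty string. The sub name parse_config_list is invented, and it assumes plain git config --list output with one key=value per line; values with embedded newlines or sub-sections containing = would need the -z output instead.

```perl
use strict;
use warnings;

# Turn lines like "foo.thething.bar-baz=1" into
#   { section => { subsection => { name => [ values ] } } }
# Variables with no sub-section live under '', so foo.bar (a value) and
# [foo "bar"] (a sub-section) never fight over the same slot.
sub parse_config_list {
    my (@lines) = @_;
    my %config;
    for my $line (@lines) {
        chomp $line;
        my ( $key, $value ) = split /=/, $line, 2;
        my ( $section, $rest ) = split /\./, $key, 2;
        next unless defined $rest;    # git insists on a section anyway

        # the name is whatever follows the last dot, the rest is sub-section
        my ( $subsection, $name ) = $rest =~ /\A(?:(.*)\.)?([^.]+)\z/ or next;
        $subsection = '' unless defined $subsection;

        # values are lists because git allows a key to repeat
        push @{ $config{$section}{$subsection}{$name} }, $value;
    }
    return \%config;
}

# Hypothetical usage:
#   my $config = parse_config_list(`git config --list -f ./example`);
```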

git-config - you might as well use it.

If you're building a tool that depends on git for a large portion of its job, you might as well use git-config too. It's a format that your users are likely already familiar with, and fits neatly into the ecosystem.

Quickly installing requirements for perl scripts



Sometimes you stumble across some perl that you want to run, but it's not neatly packaged as a cpan dist and doesn't have a nice list of modules to get it going.


Often it's an email or IRC conversation with a gist or pastebin link to some perl someone is working on.

  • You can run the script a couple of times, installing missing depends as you go
  • You can ask the author
  • You can rely on your amazing tooling to get you the right versions of the modules.

Go get cpan-minus

It's the one after cpanplus, except it's lighter because it has fewer oddball features.

If you don't have cpanm you can bootstrap it from https://cpanmin.us with:

% curl -L https://cpanmin.us | perl - App::cpanminus

cpanm knows how to install cpanm.

Then point it at whatever the script (here a file called shell-only) says it needs:

me@compy386:~ cpanm $( 
    perl -nle '
    /use ([:\w]+) ([0-9.]+)/ and $d{$1} = $2  
    }{
    printf "%s@%s ", $_, $d{$_} for keys %d' -- shell-only  
)

On my machine it just prints this:

Object::Tiny::RW is up to date. (1.07)
AnyEvent::ReadLine::Gnu is up to date. (1.0)
AnyEvent is up to date. (7.11)

The one-liner produces this:

 AnyEvent@7.11 Object::Tiny::RW@1.07 AnyEvent::ReadLine::Gnu@1.0 

We just match use, then Letters::And::Colons, a space and a version number, and stash them.

If you replace cpanm with echo, you'll see that we print out the module names and versions in the form Object::Tiny::RW@1.07, the format cpanm likes.
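If the one-liner is a bit dense, the same idea stretched out into a small script might look like the sketch below. The sub name requirements is mine, and it is slightly stricter than the one-liner: it only looks at lines that start with use, so commented-out modules are skipped.

```perl
use strict;
use warnings;

# Collect "Module@version" strings from lines like "use AnyEvent 7.11;".
# Modules that are used without a version number are ignored.
sub requirements {
    my ($source) = @_;
    my %wanted;
    while ( $source =~ /^\s*use\s+([A-Za-z_][\w:]*)\s+v?(\d[\d._]*)/mg ) {
        $wanted{$1} = $2;
    }
    return map { "$_\@$wanted{$_}" } sort keys %wanted;
}

# When given file names, print the list in the form cpanm likes.
print join( ' ', requirements( join '', <> ) ), "\n" if @ARGV;
```

Saved under a name of your choosing, its output can go inside the same $( ) that the one-liner sits in above.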