An Optimization Primer: DBIx::Perlish

Anton Berezin <tobez@tobez.org>

Pisa, August 2010

The talk is about three things

Using Perl as a DSL for DB access

Can we replace SQL with Perl?

Basics of DBIx::Perlish

Perl code:

my $name = "tobez";
my $u = db_fetch {
my $t : users;
$t->name eq $name;

return $t->id, $t->password;
};
print "$name has an id $u->{id}\n";

SQL generated:

SELECT id, password FROM users where name = ?

What is going on? &-prototype

db_fetch {
my $t : users;
$t->name eq $name;

return $t->id, $t->password;
};

The db_fetch prototype:

sub db_fetch (&)

What is going on? Query sub.

db_fetch(sub {
my $t : users;
$t->name eq $name;

return $t->id, $t->password;
});

When db_fetch is called, it gets the query sub as its only parameter.

To exec, or not to exec?

sub something {
$_[0]->(1, 2, 3);
}
something(sub { print "@_\n" });

Prints

1 2 3

To exec, or not to exec?

sub something {
my ($sub, $flag) = @_;
if ($flag) {
$sub->(1, 2, 3);
}
}
something(sub { print "@_\n" }, 0);

Closure not executed, thus no output.

What else?

What other interesting things we can do with the sub, apart from executing it or not executing it?

Why, parse it, of course!

use B::Concise;

sub something {
my $sub = shift;
my $walker = B::Concise::compile($sub);
# my $walker = B::Concise::compile("-terse", $sub);
$walker->();
}

something(sub {
my $p = shift;
return $p * 2
});

Sub optree (concise)

d  <1> leavesub[1 ref] K/REFC,1 ->(end)
-     <@> lineseq KP ->d
1        <;> nextstate(main -5 t.pl:10) v ->2
6        <2> sassign vKS/2 ->7
4           <1> shift sK/1 ->5
3              <1> rv2av[t2] sKRM/1 ->4
2                 <$> gv(*_) s ->3
5           <0> padsv[$p:-5,-4] sRM*/LVINTRO ->6
7        <;> nextstate(main -4 t.pl:10) v ->8
c        <@> return K ->d
8           <0> pushmark s ->9
b           <2> multiply[t3] sK/2 ->c
9              <0> padsv[$p:-5,-4] s ->a
a              <$> const(IV 2) s ->b
sub {
my $p = shift;
return $p * 2
}

Sub optree (terse)

UNOP (0x2830c740) leavesub [1] 
    LISTOP (0x2830c680) lineseq 
        COP (0x28302180) nextstate 
        BINOP (0x2830c660) sassign 
            UNOP (0x2830c640) shift 
                UNOP (0x2830c620) rv2av [2] 
                    SVOP (0x2830c600) gv  GV (0x81039bc) *_ 
            OP (0x2830c5e0) padsv [1] 
        COP (0x283021c0) nextstate 
        LISTOP (0x2830c700) return 
            OP (0x2830c720) pushmark 
            BINOP (0x2830c6e0) multiply [3] 
                OP (0x2830c6a0) padsv [1] 
                SVOP (0x2830c6c0) const  IV (0x8183330) 2 
sub {
my $p = shift;
return $p * 2
}

Perl as a DSL

SQL meaning

# we are dealing with table "users"
my $t : users;

# WHERE users.name = 'tobez'
$t->name eq "tobez";

# SELECT users.id, users.password FROM users
return $t->id, $t->password;

Imagination is the limit.

Extend operators

my $t : elements;

# WHERE name in ('Thorium', 'Uranium', 'Plutonium')
$t->name <- ["Thorium", "Uranium", "Plutonium"];

return $t->n_p;

This is actually a less than then negate.

But we can interpret it as we want.

Extend interpolation

my $t : users;

return "$t->fname $t->lname";
SELECT fname || ' ' || lname FROM users

or, with MySQL

SELECT concat(fname, ' ', lname) FROM users

Hey, methods interpolate in a string!

Do "conditional compilation"

my $id = 42;
for my $cond (0,1) {
db_fetch {
my $t : users;
$t->id == $id if $cond;
};
}
SELECT * FROM users;                  -- $cond == 0
SELECT * FROM users WHERE $cond = 42; -- $cond == 1

Required skill level

It's all pattern recognition

?

A monkey with imagination A lion with teeth

Differences between objects

?

A lion with teeth A sleeping lion

It's easier with optrees

use B::Concise;
our $our;
my $my;

B::Concise::compile("-terse", sub {
return $our;
})->();

B::Concise::compile("-terse", sub {
return $my;
})->();

It's easier with optrees

UNOP (0x2830c480) leavesub [1] 
    LISTOP (0x2830c460) lineseq 
        COP (0x28302140) nextstate 
        LISTOP (0x2830c420) return 
            OP (0x2830c440) pushmark 
            UNOP (0x2830c400) null [15] 
                SVOP (0x2830c3e0) gvsv  GV (0x81170d8) *our 

UNOP (0x2830c6e0) leavesub [2] 
    LISTOP (0x2830c6c0) lineseq 
        COP (0x283021c0) nextstate 
        LISTOP (0x2830c680) return 
            OP (0x2830c6a0) pushmark 
            OP (0x2830c660) padsv [1] 

It's easier with optrees

use B::Concise;
our ($x, $y);

B::Concise::compile("-terse", sub {
$x + $y;
})->();

B::Concise::compile("-terse", sub {
$x - $y;
})->();

It's easier with optrees

UNOP (0x2830c500) leavesub [1] 
    LISTOP (0x2830c4e0) lineseq 
        COP (0x28302100) nextstate 
        BINOP (0x2830c4c0) add [3] 
            UNOP (0x2830c460) null [15] 
                SVOP (0x2830c440) gvsv  GV (0x81170d8) *x 
            UNOP (0x2830c4a0) null [15] 
                SVOP (0x2830c480) gvsv  GV (0x8117114) *y 

UNOP (0x2830c7a0) leavesub [1] 
    LISTOP (0x2830c780) lineseq 
        COP (0x28302180) nextstate 
        BINOP (0x2830c760) subtract [3] 
            UNOP (0x2830c700) null [15] 
                SVOP (0x2830c6e0) gvsv  GV (0x81170d8) *x 
            UNOP (0x2830c740) null [15] 
                SVOP (0x2830c720) gvsv  GV (0x8117114) *y 

The B module

use B;

my $x;

say ref B::svref_2object(\42); # B::IV
say ref B::svref_2object(\3.1415962); # B::NV
say ref B::svref_2object(\"hello"); # B::PV
say ref B::svref_2object(\$x); # B::NULL
say ref B::svref_2object([]); # B::AV
say ref B::svref_2object({}); # B::HV
say ref B::svref_2object(sub { 2 }); # B::CV

Full parsing

B::Concise::compile("-terse", sub { 2 })->();
UNOP (0x2830c3a0) leavesub [1] 
    LISTOP (0x2830c380) lineseq 
        COP (0x283020c0) nextstate 
        SVOP (0x2830c360) const  IV (0x81170b0) 2 

Getting to the optree

use B;

my $cv = B::svref_2object(sub { 2 });
my $root = $cv->ROOT;
say ref $root, ": ", $root->name;
my $seq = $root->first;
say " ", ref $seq, ": ", $seq->name;
my $count = $seq->children;
my $op = $seq->first;
say " ", ref $op, ": ", $op->name;
while (--$count) {
$op = $op->sibling;
say " ", ref $op, ": ", $op->name;
}

Almost as good job as B::Concise

UNOP (0x2830c3a0) leavesub [1] 
    LISTOP (0x2830c380) lineseq 
        COP (0x283020c0) nextstate 
        SVOP (0x2830c360) const  IV (0x81170b0) 2 
B::UNOP: leavesub
  B::LISTOP: lineseq
    B::COP: nextstate
    B::SVOP: const

But what about the value of the constant?

Getting to a constant

my $sv = $op->sv;
say ref $sv;
say ${$sv->object_2svref};
B::IV
2

What about variables?

my $x = 42;
B::Concise::compile("-terse", sub { $x })->();
UNOP (0x28504fa0) leavesub [2] 
    LISTOP (0x28504f60) lineseq 
        COP (0x284ed540) nextstate 
        OP (0x28504f20) padsv [1] 

Instead of SVOP "const" we have an OP "padsv".

Working with padlists

Suppose $cv holds B::CV and $op holds B::OP "padsv".

my $padlist = [$cv->PADLIST->ARRAY];
my $i = $op->targ;
my $v = $padlist->[1]->ARRAYelt($i);
my $ref = $v->object_2svref;
say $$ref;

This prints 42, as expected.

More complex values

my $x = { lang => { perl => [ 5, 6 ] } };
B::Concise::compile("-terse", sub { $x->{lang}{perl}[0] })->();
BINOP (0x28516ce0) aelem 
  UNOP (0x28516cc0) rv2av [4] 
    BINOP (0x28513520) helem 
      UNOP (0x28513500) rv2hv [3] 
        BINOP (0x285134a0) helem 
          UNOP (0x28513480) rv2hv [2] 
            OP (0x28513400) padsv [1] 
          SVOP (0x28513440) const  PV (0x284455f0) "lang" 
      SVOP (0x285134e0) const  PV (0x28445550) "perl" 
  SVOP (0x28516ca0) const  IV (0x2846d304) 0 

Not quite so easy

Not quite so easy

Remember the sub { 2 } example?

my $sv = $op->sv;
say ref $sv;
say ${$sv->object_2svref};
B::SPECIAL
Can't locate object method "object_2svref" via package "B::SPECIAL"

Threaded perl behaves differently!

Threaded perl constants

The constants are also stored on the padlist, so our previous handling of variables applies:

my $padlist = [$cv->PADLIST->ARRAY];
my $i = $op->targ;
my $v = $padlist->[1]->ARRAYelt($i);
my $ref = $v->object_2svref;
say $$ref;
2

The DBIx::Perlish parser

Slowness

Slowness

Speeding it up

Since a lot of time spent parsing, avoid parsing if possible.

Caching the results of the parsing is an obvious solution.

Choosing cache key

Changing variables

sub x
{
my $v = shift;
$v = 42 if $v < 10;
fake_db_fetch(sub { $v });
}
x(9); x(101); x(24);

fake_db_fetch

sub fake_db_fetch {
my $sub = shift;
my $cv = B::svref_2object($sub);
say "SRC LOC: ", $cv->FILE;
my $root = $cv->ROOT;
say "MEM LOC: ", $$root;
my $seq = $root->first;
my $count = $seq->children;
my $op = $seq->first;
while (--$count) {
$op = $op->sibling;
}
my $padlist = [$cv->PADLIST->ARRAY];
my $i = $op->targ;
my $v = $padlist->[1]->ARRAYelt($i);
say "B-val : $v";
my $ref = $v->object_2svref;
say "Val: $$ref";
}

experiment result

SRC LOC: ./t.pl
MEM LOC: 677144544
B-val : B::IV=SCALAR(0x2843ea50)
Val: 42
SRC LOC: ./t.pl
MEM LOC: 677144544
B-val : B::IV=SCALAR(0x2843e94c)
Val: 101
SRC LOC: ./t.pl
MEM LOC: 677144544
B-val : B::IV=SCALAR(0x2843e910)
Val: 24

Problem if cached

Solution to value problem

This introduces incompatible change

Previously:

for my $x (undef, 42) {
db_fetch {
my $t : users;
$t->id == $x;
};
}
SELECT * FROM users WHERE id is NULL;
SELECT * FROM users WHERE id = 42;

Now: no special treatment of undef

db_fetch {
my $t: users;
!defined $t->id;
};

Conditional compilation problem

my $id = 45;
for my $cond (0,1) {
db_fetch {
my $t : users;
$t->id == $id if $cond;
};
}
SELECT * FROM users;                  -- $cond == 0
SELECT * FROM users WHERE $cond = 42; -- $cond == 1

Cache key is the same!

Conditional compilation solution

Problem: variable table names

my $tname = "users";
db_fetch {
my $t : table = $tname;
return $t->id;
};

Solution: variable table names

Benchmarks

              Rate minimal_NC    minimal
minimal_NC  1535/s         --       -86%
minimal    11028/s       618%         --

             Rate complex_NC    complex
complex_NC  217/s         --       -97%
complex    7528/s      3367%         --

          Rate vars_NC    vars
vars_NC  213/s      --    -97%
vars    6399/s   2907%      --

            Rate vars10_NC    vars10
vars10_NC  205/s        --      -97%
vars10    6991/s     3303%        --

Other optimizations

Links

Thank you! ☺