Thursday, January 26, 2017

PL/Perl and Large PostgreSQL Databases

One of the topics discussed in the large database talk is the way we used PL/Perl to solve some data variety problems in terms of extracting data from structured text documents.

It is certainly possible to use other languages to do the same, but PL/Perl has an edge in a number of important ways.  PL/Perl is light-weight, flexible and fills this particular need better than any other language I have worked with.

While one of the considerations has often been knowledge of Perl in the team, PL/Perl has a number of specific reasons to recommend it:

  1. It is light-weight compared to PL/Java and many other languages
  2. It excels at processing text in general ways.
  3. It has extremely mature regular expression support
These features combine to create a procedural language for PostgreSQL which is particularly good at extracting data from structured text documents in the scientific space.  Structured text files are very common and being able to extract, for example, a publication date or other information from the file is very helpful.

Moreover when you mark your functions as immutable, you can index the output, and this is helpful when you want ordered records starting at a certain point.

So for example, suppose we want to be able to query on plasmid lines in UNIPROT documents but we have not set this up before we loaded the table.  We could easily create a PL/Perl function like:

CREATE OR REPLACE FUNCTION plasmid_lines(uniprot text) 
RETURNS text[]
LANGUAGE PLPERL IMMUTABLE AS
$$
use strict;
use warnings;
my ($uniprot) = @_;
my @lines = grep { /^OG\s+Plasmid/ } split /\n/ $uniprot;
return [ map {  my $l = $_; $l =~ s/^OG\s+Plasmid\s*//; $l } @lines ];
$$;


You could  then create a GIN index on the array elements:

CREATE INDEX uniprot_doc_plasmids ON uniprot_docs USING gin (plasmid_lines(doc));

Neat!

4 comments:

  1. ERP Software SolutionsERP Software Solutions gives the correct view of business information.

    ReplyDelete
  2. Hey! I use microsoft erp software at my company and it's never been easier to manage it! It helps you save money, time and a lot of effort. It is also cheaper than some other softwares I've worked with before this one. I would definitely recommend it to any business person!

    ReplyDelete
  3. Its a wonderful post and very helpful, thanks for all this information. You are including better information regarding this topic in an effective way.Thank you so much

    Installment loans
    Payday loans
    Title loans
    Cash Advances

    ReplyDelete
  4. Thank You For sharing the Post. It was very informative and useful.
    Best Oracle Training in Bangalore

    ReplyDelete