Saturday, July 30, 2016

Notes on Security, Separation of Concerns and CVE-2016-1238 (Full Disclosure)

A cardinal rule of software security is that, when faced with a problem, make sure you fully understand it before implementing a fix.  A cardinal rule of general library design is that heuristic approaches to deciding whether something is a problem really should be disfavored.  These lessons were driven home when I ended up spending a lot of time debugging problems caused by a recent Debian fix for the CVE in the title.  Sure everyone messes up sometimes, and so this isn't really condemnation of Debian as a project but their handling of this particular CVE is a pretty good example of what not to do.

In this article I am going to discuss actual exploits.  Full disclosure has been a part of the LedgerSMB security culture since we started and discussion of exploits in this case provide administrators with real chances to secure their systems, as well as distro maintainers, Perl developers etc to write more secure software.  Recommendations will be further given at the end regarding improving the security of Perl as a programming language.

Part of the problem in this case is that the CVE is poorly scoped but CVE's are often poorly scoped and it is important for developers to work with security researchers to understand a problem, understand the implications of different approaches to fix it and so forth.  It is very easy to get in a view that "this must be fixed right now" but all too often (as here) a shallow fix does not completely resolve an issue and causes more problems than it resolves.

The Problem (with exploits)


Perl's system of module inclusion (and other operations) looks for Perl modules in the current directory after exhausting other directories.  Technically this is optional but most UNIX and Linux distributions have this behavior.  On the whole it is bad practice as well, which is why it is not by the default behavior of shells like bash.  But a lot of software depends on this (with some legitimate use cases) and so changing it is problematic.

Perl programs are also often complex and optional dependencies which may or may not exist on a system.  If those do not otherwise exist in the system Perl directories but exist in the current working directory, then these may be loaded from the current working directory.  Note that this is not actually limited to the current working directory, and Perl could load the files from all kinds of places, users-specified or not.

So when one is running a Perl program in a world-writeable location, there is the opportunity for another user to put code there that may be picked up by the Perl interpreter and executed.  While the CVE is limited to implicit inclusion of the current working directory, the problem is actually quite a bit broader than that.  Include paths can be specified on the command line and if any of them are world-writeable, then variations of the same attacks are possible.

Some programs, of course, are intended to run arbitrary Perl code.  The test harness programs are good examples of this.  Special attention will be given to the ways test harness programs can be exploited here.

These features come together to create opportunities for exploits in multi-user systems which administrators need to be aware of and take immediate steps to prevent.  In my view there are a few important and needed features in Perl as well.

A simple exploit:

Create the following files in a safe directory:

t/01-security.t, contents:

use Test::More;
require bar;
eval { require foo };

plan skip_all => 'nothing to do';


lib/bar.pm contents:

use 5.010;
warn "this is ok";


./foo.pm, contents:

use 5.010;
warn "Haxored!";


now run:

prove -Ilib t/01-security.t

Now, what happens here is that the optional requirement of foo.pm in the test script gets resolved to the one that happens to be in your current working directory.  If that directory were world writeable, then someone could add that file and it would be run when you run your test cases.

Now, it turns out that this is not a vulnerability with prove.  Because prove runs Perl in a separate process and parses the output, eliminating the resolution inside prove itself has no effect.  What this means is that the directory where you run something like prove can really matter and if you happen to be in a world writeable directory when you run it (or other Perl programs) you run the risk of including unintended code supplied by other users.  Not good.  None of the proposed fixes address the full scope of this problem either.  Note that if any directory in @INC is world-writeable, a security problem exists.  And because these can be specified in the Perl command line, this is far more of a root problem than the mere inclusion of the current working directory.

Security Considerations for System Administrators and Software Developers


All exploits of this sort can be prevented even without the recommendations being followed in the proper fixes section.  System administrators should:


  1. Make sure that all directories in @INC other than the current working directory are properly secured against write access (this is a no brainer, but is worth repeating)
  2. Programs such as test harnesses which execute arbitrary Perl code should ONLY be run in properly secured directories, and the only time prove should ever be run as root is when installing (as root) modules from cpan.
  3. Scripts intended to be run in untrusted directories should be audited and one should ensure (and add if it is missing) the following line:  no lib '.';

Software developers should:

  1. think carefully about where a script might be executed.  If it is intended to be run on directories of user-supplied files, then include the  no lib '.';  (This does not apply to test harnesses and other programs which execute arbitrary Perl programs).
  2. Module maintainers should probably avoid using optional config modules to do configuration.  These optional configuration modules provide standard points of attack.  Use non-executable configuration files instead or modules which contain interfaces for a programmer to change the configuration.


What is wrong with the proposed fixes


The approach recommended by the reporter of the problem is to make modules exclude an implicit current working directory when loading optional dependencies.  This, as I will show, raises very serious separation of concerns problems and in Debian's case includes one serious bug which is not obvious from outside.  Moreover it doesn't address the problems caused by running test harnesses and the like in untrusted directories.  So one gets a *serious* problem with very little real security benefit.

If you are reading through the diff linked to above, you will note it is basically boilerplate that localizes @INC and removes the last entry if it is equal to a single dot.  This breaks base.pm badly because inheritance in Perl no longer follows @INC the way use does.  Without this patch the following are almost equivalent with the exception that the latter also runs the module's import() routine:

use base 'Myclass';

and

use Myclass;
use base 'Myclass';

But with this patch, the latter works and the former does not unless you specify -MMyclass in the Perl command line.  This occurs because someone is thinking technically about an issue without comprehending that this isn't an optional dependency in base and therefore the problem doesn't apply there.  But the problem is not quickly evident in the diff, nor is it evident when trying to fix this in this way on the module level.  It breaks the software contract badly, and does so for no benefit.

As a general rule, modules should be expected to act sanely when it comes to their own internal structure, but playing with @INC violates basic separation of concerns and a system that cannot be readily understood cannot be readily secured (which is why this is a real issue in the first place -- nobody understands all the optional dependencies of all the dependencies of every script on their system).

Recommendations for proper fixes


There are several things that Perl and Linux distros can do to provide proper fixes to this sort of problem.  These approaches do not violate the issues of separation of concerns.  The first and most important is to provide an option to globally remove '.' from @INC on a per-system basis.  This is one of the things that Debian did right in their reaction to this.  Providing tools for administrators to secure their systems is a good thing.

A second thing is that Perl already has a number of enhanced modes for dealing with security concerns and adding another would be a good idea.  In fact, this could probably be done as a pragma which would:

  1. On load, check to see that directories in @INC are not world-writeable -- if they are, remove them from @INC and warn, and
  2. when using lib, check to see if the directory is world-writeable and if it is die hard.
But making this a module's responsibility to care what @INC says?  That's a recipe for problems and not of solutions both security and otherwise.

Friday, July 29, 2016

What is coming in LedgerSMB 1.5

LedgerSMB 1.5 rc1 is around the corner.   I figure it is time for a very short list of major improvements:


  1.  A one-page app design which provides better responsiveness and testability
  2. Speaking of testability, we now have selenium-based bdd-tests as well (and a test coverage starting to approach reasonable -- remember, we started with no tests).
  3. We have spun off our database access framework as PGObject on CPAN.  This is an anti-ORM, basically a service locator framework for stored proceures.
  4. Quantity discounts
  5. Many stored procedures moved from PL/PGSQL to plain SQL for better type checking
  6. Full support for templates transparently stored in the PostgreSQL database



Sunday, March 20, 2016

When PostgreSQL Doesn't Scale Well Enough

The largest database I have ever worked on will eventually, it looks like, be moved off PostgreSQL.  The reason is that PostgreSQL doesn't scale well enough.  I am writing here however because the limitations are so extreme that it ought to give plenty of ammunition for those who think databases don't scale.

The current database size is 10TB and doubling every year.  The main portions of the application have no natural partition criteria.  The largest table currently is 5TB and the fastest growing portion of the application.

10TB is quite manageable.  20TB will still be manageable.  By 40TB we will need a bigger server.  But in 5 years we will be at 320 TB and so the future does not look very good for staying with PostgreSQL.

I looked at Postgres-XL and that would be useful if we had good partitioning criteria but that is not the case here.

But how many cases are there like this?  Not too many.

EDIT:  It seems I was misunderstood.  This is not complaining that PostgreSQL doesn't scale well  It is about a case that is outside of all reasonable limits.

Part of the reason for writing this is that I hear people complain that the RDBMS model breaks down at 1TB which is hogwash.  We are facing problems as we look towards 100TB.  Additionally I think that PostgreSQL would handle 100TB fine in many other cases, but not in ours.  PostgreSQL at 10, 20 or 50TB is quite usable even in cases where big tables have no adequate partitioning limit (needed to avoid running out of page counters), and at 100TB in most other cases I would expect it to be a great database system.  But the sorts of problems we will hit by 100TB will be compounded by the exponential growth of the data (figure within 8 years we expect to be at 1.3PB).  So the only solution really is to move to a big data platform.

Sunday, February 21, 2016

A couple annoyances (and solutions) regarding partitioned tables

In one of my projects we had an issue where a large table that was under huge transactional load was having trouble with autovacuum not keeping up.  The problem was that the table sometimes held over half a billion records, added and deleted millions of records a day, and that since most of these occurred at the heads of various indexes, autovacuum was just not fast enough.

So we decided to partition the table into around 50 pieces in order to allow autovacuum to achieve a bit better parallelism in managing the data.  This helped to some extent.  But partitioning is a rare solution for rare problems and comes with unexpected costs.   Interestingly most of our problems have been ORM-related.  Here are some we ran into and their solutions (spoiler:  at the end of the day, effectively, we stopped using an ORM on these tables).  At the end of the day, throughput on these tables was increased around 10-fold, and db load cut by about 90%.

Annoyance 1:  Redirection and ORM transparency


The first problem we had was getting DBIx::Class to work with the partitioned table.  The solution was to add another view in between which did the redirection of inserts, updates, and deletes.  This also allowed us to go through the ORM for inserts (we still do) without the cross-locking issues below being a problem.

Annoyance 2:  Cross-locking and exclusion constraints


A second major problem is that autovacuum can only free up space when it gets an exclusive lock and if any queries are going through the parent table, then you get constraint exclusion coming into play.  The problem here is that constraint exclusion takes out a relatively non-invasive lock on every table at planning time which means you cannot even plan to select a row from one partition if another partition is locked, if you are going through the parent table.

The obvious solution here is not to go through the parent table, but the ORM doesn't support that so we had to drop to SQL.  It also took us about 6 months to find and fix.

Annoyance 3:  Constraint exclusion doesn't always do what you expect it to!


One day we had a very slow running straight-forward query that should have been able to resolve quickly on an index scan on one of the partitions. However, because the constraint criteria was being brought in via a subquery, it was not available at plan time, so it was falling back on a sequential scan through another large partition.  Ouch......  Found the query and fixed it.

Annoyance 4:  Solving some performance problems puts more stress on the next bottleneck


The result of the initial success was increased db concurrency, which was great until it became clear our selection of rows to process and delete was leading to lots of indexes having huge numbers of dead tuples at their heads.  This meant that selecting rows actually became slower than before.  So we had to go back and engineer a new selection algorithm to avoid this problem....

Unrelated Annoyance:  Long running transactions causing autovacuum headaches


An interesting unrelated issue we had was the fact that at the time, we had transactions that would sometimes remain open for a week.  While the partitions directly affected were small, the problem is that autovacuum cannot clear tuples that are invalidated since the oldest transaction started, so higher processing throughput partitions were adversely affected.  After significant effort, we got the worst offenders corrected and now the longest running transactions take just over a day.  This is usually sufficient depending on the load of the system (but sometimes the duration spikes to 18 hours).

Was the partitioning worth it?  Definitely!  However it was a bit of a long road to get there

Wednesday, February 10, 2016

Why Commons Should Not Have Ideological Litmus Tests

This will likely be my last post on this topic.  I would like to revive this blog on technical rather than ideological issues but there seems like a real effort to force ideology in some cases.  I don't address this in terms of specific rights, but in terms of community function and I have a few more things to say on this topic before I return to purely technical questions.

I am also going to say at the outset that LedgerSMB adopted the Ubuntu Code of Conduct very early (thanks to the suggestion of Joshua Drake) and this was a very good choice for our community.  The code of conduct provides a reminder for contributors, users, participants, and leadership alike to be civil and responsible in our dealings around the commons we create.  Our experience is we have had a very good and civil community with contributions from every walk of life and a wide range of political and cultural viewpoints.  I see  this as an unqualified success.

Lately I have seen an increasing effort to codify a sort of political orthodoxy around open source participation.  The rationale is usually about trying to make people feel safe in a community, but these are usually culture war issues so invariably the goal is to exclude those with specific political viewpoints (most of the world) from full participation, or at least silence them in public life.  I see this as extremely dangerous.

On the Economic Nature of Open Source


Open source software is economically very different from the sorts of software developed by large software houses.  The dynamics are different in terms of the sort of investment taken on, and the returns are different.  This is particularly true for community projects like PostgreSQL and LedgerSMB, but it is true to a lesser extent even for corporate projects like MySQL.  The economic implications thus are very different.

With proprietary software, the software houses build the software and absorb the costs for doing so, and then later find ways to monetize that effort.  In open source, that is one strategy among many but software is built as a community and in some sense collectively owned (see more on the question of ownerership below).

So with proprietary software, you may have limited ownership over the software, and this will be particularly limited when it comes to the use in economic production (software licenses, particularly for server software, are often written to demand additional fees for many connections etc).

Like the fields and pastures before enclosure, open source software is an economic commons we can all use in economic production.  We can all take the common software and apply it to our communities, creating value in those areas we value.  And we don't all have to share the same values to do it.  But it often feeds our families and more.

But acting as a community has certain requirements.  We have to treat eachother with humanity generally.  That doesn't mean we have to agree on everything but it does mean that some degree of civility must be maintained and cultivated by those who have taken on that power in open source projects.

On the Nature of Economic Production, Ownership and Power (Functionally Defined)


I am going to start by defining some terms here because I am using these terms in functional rather than formal ways.

Economic Production:  Like all organisms we survive by transforming our environment and making it more conducive to our ability to live and thrive.  In the interpersonal setting, we would call this economic production.  Note that understood in this way, this is a very broad definition and includes everything from cooking dinner for one's family to helping people work together.  Some of this may be difficult to value but it can (what is the difference between eating out and eating at home?  How much does a manager contribute to success through coordination?).

Ownership:  Defining ownership in functional rather than formal terms is interesting.  It basically means the right to use and direct usage of something.  Seen in this way, ownership is rarely all or nothing.  Economic ownership is the right to utilize a resource in economic production.  The extent to which one is restricted in economic production using a piece of software the less one owns it, so CAL requirements in commercial software and anti-TIVOization clauses in the GPL v3 are both restrictions on functional ownership.

Economic Power:  Economic power is the power to direct or restrict economic production.  Since economic production is required for human life, economic power is power over life itself.  In an economic order dominated by corporations, corporations control every aspect of our lives.  In places where the state has taken over from the corporations, the state takes over this as well.  But such power is rarely complete because not all economic production can be centrally controlled.

I am going to come back to these below because my hesitation on kicking people out of the community due to ideological disagreements (no matter how wrong one side may seem to be) have to do with this fear of abuse of economic power.


On Meritocracy (and what should replace it)


Meritocracy is an idea popularized by Eric Raymond, that power in a community should be given to technical merit.  In short, one should judge the code, not the person.  The idea has obvious appeal and is on the surface hyper-inclusive.  We don't have to care about anything regarding each other other than quality of code.  There is room for everyone.

More recently there has been push-back in some corners against the idea of meritocracy.  This push-back comes from a number of places, but what they have in common is questioning how inclusive it really is.

The most popular concern is that meritocracy suggests that we should tolerate people who actively make the community less welcoming, particularly for underrepresented groups. and therefore meritocracy becomes a cover for excluding the same groups who are otherwise excluded in other social dimensions, that the means of exclusion differs but who is excluded might not.

There is something to be said for the above concern, but advocates have often suggested that any nexus between community and hostile ideas is sufficient to raise a problem and therefore when an Italian Catholic expresses a view of gender based on his religion on Twitter, people not even involved in the project seek his removal from it on the grounds that the ideas are toxic.  For reasons that will become clear, that is vast overreach, and a legitimate complaint is thus made toxic by the actions of those who promote it.  And similarly toxic are the efforts by some to use social category to insist that their code should be included just to show a welcoming atmosphere.

A larger problem with meritocracy though is the way it sets up open source communities to be unbalanced, ruled by technical merit and thus not able to attract the other sorts of contributions needed to make most software successful.  In a community where technical merit is the measure by which we are judged, non-technical contributions are systematically devalued and undervalued.  How many open source communities produce software which is poorly documented and without a lot of attention to user interface?  If you devalue the efforts at documentation and UI design, how will you produce software which really meets people's needs?  If you don't value the business analysts and technical writers, how will you create economic opportunities for them in your community?  If you don't value them, how will you leverage their presence to deliver value to your own customers?  You can't if your values are skewed.

The successor to meritocracy should be economic communitarianism, i.e. the recognition that what is good for the community is economically good for all its members.  Rather than technical merit, the measure of a contribution and a contributor ought to be the value that a contribution brings the community.    Some of those will be highly technical but some will not.  Sometimes a very ordinary contribution that anyone could offer will turn the tide because only one person was brave enough to do it, or had the vision to see it as necessary.  Just because those are not technical does not mean that they are not valuable or should not be deeply respected.  I would argue that in many ways the most successful open source communities are the ones which have effectively interpreted meritocracy loosely as economic communitarianism.

On Feeling Safe in the Community


Let's face it  People need to feel safe and secure in the community regarding their physical safety and economic interests.  Is there any disagreement on this point?  If there is, please comment below.  But the community cannot be responsible for how someone feels, only in making sure that people are objectively physically and economically secure within it.  If someone feels unsafe in attending conferences, community members can help address security concerns and if someone severely misbehaves in community space, then that has to be dealt with for the good of everyone.

I don't think the proponents of ideological safety measures have really thought things through entirely.  The world is a big place and it doesn't afford people ideological safety unless they don't go out and work with people they disagree with.  As soon as you go across an international border, disagreements will spring up everywhere and if you aren't comfortable with this then interacting on global projects is probably not for you.

Worse, when it comes to conduct outside of community circles, those in power in the community cannot really act constructively most of the time.  We don't have intimate knowledge and even if we do, our viewpoints have to be larger than the current conflict.

On "Cultural Relativism:" A welcoming community for all?


One of the points I have heard over and over in discussions regarding community codes of conduct is that welcoming people regardless of viewpoint (particularly on issues like abortion, sexuality, etc) is cultural relativism and thus not acceptable.  I guess the question is not acceptable to whom?  And do we really want an ideological orthodoxy on every culture war topic to be a part of an open source project?  Most people I have met do not want this.

But the overall question I have for people who push culture war codes of conduct is "when you say a welcoming community for all, do you really mean it?  Or do you just mean for everyone you agree with?  What if the majority changes their minds?"

In the end, as I will show below, trying to enforce an ideological orthodoxy in this way does not bring marginal groups into the community but necessary forces a choice of which marginal groups to further exclude.  I don't think that is a good choice and I will go on record and say it is a choice I will steadfastly refuse to make.

A Hypothetical


Ideology is shaped by culture, and ideology of sexuality is shaped by family structures, so consequently where family structures are different, views on sexuality will be also.

So suppose someone on a community email list includes a pro-same-sex marriage email signature, something like:

"Marriage is an institution for the benefit of the spouses, not [to] bind parents to their children" -- Ted Olson, arguing for a right to same-sex marraige before the United States Supreme Court.

So a socially conservative software developer from southern India complaints to the core committee saying that this is an attack on his culture, saying that traditional Indian marriages are not real marriages.  Now, I assume most people would agree that it would be best for the core committee not to insist that the email signature be changed for someone to continue to participate.  So with such a decision, suppose the complainant changes his signature instead to read:

"If mutual consent makes a sexual act moral, whether within marriage or without, and, by parity of reasoning, even between members of the same sex, the whole basis of sexual morality is gone and nothing but misery and defect awaits the youth of the country… " -- Mohandas Gandhi

Now the first person decries the signature as homophobic and demands the Indian fellow be thrown off the email list.  And the community, if it has decided to follow the effort at ideological safety has to resolve the issue.  Which group to exclude?  The sexual minority?  Or the group marginalized through a history of being on the business end of colonialism?  And if one chooses the latter, then what does that say about the state of the world?  Should Indians, Malaysians, Catholics, etc. band together to fork a competing project?  Is that worth it as a cost?  Doesn't that hurt everyone?

On Excluding People from the Commons


In my experience, excluding people from the commons carries with it massive cost, and this is a good thing because it keeps economic power from being abused.  I have watched the impact first hand.  LedgerSMB would not even exist if this weren't an issue with SQL-Ledger.  That we are now the only real living fork of SQL-Ledger and far more active than the project we forked from is a testament to the cost.

Of course in that case the issue was economic competition and a developer who did not want to leverage community development to build his own business.  I was periodically excluded from SQL-Ledger mailing lists etc for building community documentation (he sold documentation).  Finally the fork happened beccause he wouldn't take security reports seriously.  And this is one of the reasons why I would like to push for an inclusive community.

But I also experienced economic ramifications from being excluded.  It was harder to find customers (again, the reason for exclusion was economic competition so that was the point).  In essence, I am deeply aware of the implications of kicking people out.

I have seen on email lists and tracker tickets the comparison of the goal of excluding people with problematic ideologies with McCarthyism.  The goal of McCarthyism was indeed similar, to make sure that if you had the wrong ideas you would be unable to continue a professional career.  I have had relatives who suffered because they defended the legal rights of the Communist Party during that time.  I am aware of cases where the government tried to take away their professional career (unsuccessfully).

Management of community is political and the cost of excluding someone is also political.  We already exist in some ways on the margins of the software industry.  Exclude too many people and you create your own nemesis.  That's what happened to SQL-Ledger and why LedgerSMB is successful today.

Notes on former FreeBSDGirl


One blog entry that comes from the other side of this issue is Randi Harper's piece on why she no longer will go to FreeBSD conferences and participate on IRC channels.   I am not familiar with the facts surrounding her complaints and frankly I don't have time to be so what the nature of her harassment complaint is, I will not be the judge.

There is however another side to the issue that is outside what she evidently has experience with, and that is the role of software maintainers in addressing the sorts of complaints she made.  Consequently I want to address that side and then discuss her main points at the bottom.

One thing to remember is that when people make accusations of bullying, harassment, etc. the people in charge are also the people with the least actual knowledge of what is going on.  Expecting justice from those in power in cases like this will lead, far more often than not, to feelings of betrayal.  This is not because of bad intentions but because of lack of knowledge.  This was one thing I learned navigating schoolyard bullies when I was growing up and we as project maintainers are in an even lower knowledge role than school administrators are.  Bullies are furthermore usually experts at navigating the system and take advantage of those who are not as politically adept, so the more enforcement you throw at the problem, the worse it gets.

So there is an idea that those in charge will stop people from treating eachother badly.  That has to stop because it isn't really possible (as reasonable as it sounds).  What we can do is keep the peace in community settings and that is about it.  One needs bottom up solutions, not top down ones.

So if someone came to me as a maintainer of a project alleging harassment on Twitter and demanding that an active committer be removed, that demand would probably go nowhere.  If political statements were mentioned, the response would be "do we want a political orthodoxy?"  Yet LedgerSMB has avoided these problems largely because, I think, we are a community of small businesses and therefore are used to working through disagreements and maybe because we are used to seeing these sorts of things as political.

Her main points though are worth reading and pondering.  In some areas she is perfectly right and in some areas dangerously wrong.

Randi is most right in noting that personal friction cannot be handled like a technical problem.  It is a political problem and needs to be handled as such.  I don't think official processes are the primary protection here, and planning doesn't get you very far, but things do need to be handled delicately.

Secondly, there is a difference between telling someone to stay quiet and telling someone not to be shouting publicly.   I think it is worth noting that if mediation is going to work then one cannot have people trying to undermine that in public, but people do need friends and family for support and so it is important to avoid the impression that one is insisting on total confidentiality.

Randi is also correct that how one deals with conflict is a key gauge of how healthy an open source community is.  Insisting that people be banished because of politically offensive viewpoints however does not strike me as healthy or constructive.  Insisting that people behave themselves in community spaces does.  In very rare cases it may be necessary to mediate cases that involve behavior outside that, but insisting on strict enforcement of some sort of a codified policy will not bring peace or prosperity.

More controversially I will point out that there is a point that Randi makes implicitly that is worth making explicit here, namely that there is a general tone-deafness to women's actual experiences in open source.  I think this is very valid.  I can remember a former colleague in LedgerSMB making a number of complaints about how women were treated in open source.  Her complaints included both unwanted sexual attention ("desperate geeks") and more actionably the fact that she was repeatedly asked how to attract more women to open source (she responded once on an IRC channel with "do you know how annoying that is?").  She ultimately moved on to other projects following a change in employment that moved LedgerSMB outside the scope of duties,  but one obvious lesson that those of us in open source can take from this is just to listen to complaints.  Many of these are not ones that policies can solve (you really want a policy aimed at telling people not to ask what needs to be done to attract more women to open source?) but if we listen, we can learn something.

One serious danger in the current push for more expansive codes of conduct is that it puts those who have the least knowledge in the greatest responsibility.  My view is that expansive codes of conduct, vesting greater power with maintainers over areas of political advocacy outside community fora will lead to greater, not less conflict.  So I am not keen in her proposed remedies.

How Codes of Conducts Should be Used


The final point I want to bring up here is how codes of conduct should be used.  These are not things which should be seen as pseudo-legal or process-oriented documents.  If you go this way, people will abuse the system.  It is better in my experience to vest responsibility with the maintainers in keeping the peace, not dispensing out justice, and to have codes of conduct aimed at the former, not the latter.  Justice is a thorny issue, one philosophers around the world have been arguing about for millennia with no clear resolution.

A major problem is the simple fact that perception and reality don't always coincide.  I was reminded of this controversy while reading an article in The Local about the New Years Eve sexual assaults, about work by a feminist scholar in Sweden to point out that actually men are more at risk from bodily violence than women are, and that men suffer disproportionately from crime but are the least likely to modify behavior to avoid being victimized.  The article is worth reading in light of the current issues.

So I think if one expects justice from a code of conduct, one expects too much.  If one expects fairness from a code of conduct, one expects too much.  If one expects peace and prosperity for all, then that may be attainable but that is not compatible with the idea that one has a right not to be confronted by people with dangerous ideologies.

Codes of conducts, used right, provide software maintainers with a valuable tool for keeping the peace.  Used wrong, they lead open source projects into ruin.  In the end, we have to be careful to be ideologically and culturally inclusive and that means that people cannot guarantee that they are safe from ideas they find threatening.

Tuesday, January 26, 2016

On Contributor Codes of Conduct and Social Justice


The PostgreSQL, Ruby, and PHP communities have all been considering codes of conduct for contributors. The LedgerSMB community already uses the Ubuntu Code of Conduct.  Because this addresses many projects, I am syndicating this further than where there are current issues.  This is not a technical post and it covers a wide range of very divisive issues for a very diverse audience.  I can only hope that the nuance I am trying to communicate comes across.

Brief History


A proximal cause seems to be an event referred to as "Opalgate" where an Italian individual who claimed to be a part of the Opal project made some unrelated tweets in an exchange about the politics of education and the question of how gender should be presented, and some people got offended and demanded his resignation (at least that is my reading of the Twitter exchange but I have been outside the US long enough to lose the context in which it would likely be read by an American in the US).  The details are linked below, but the core questions involve how much major contributors to projects need to keep from saying anything at all about divisive issues seems to be a recurring topic.  Moreover it is a legitimate one.

Like some of my blog posts, this goes into touchy territory.  I am discussing things which require a great deal of nuance.  Chances are, regardless of where you sit on some of these issues, you will be offended by things I say, but there are worse things than to be offended (one of them is never to be challenged by different viewpoints).

I write here as someone who has lived in a number of very different cultures and who can see perspectives on many of these issues which are not present in American political discourse  For this reason, I think it is important for me to share the concerns I see because otherwise open source software maintainers often don't have a perspective outside of Western countries, or even outside the US.

Of course as open source software maintainers we want everyone to feel safe and valued as members of the community.  But cultural tensions and ways of life do crop up and taking a position on these as a community will always do more harm than good.

Background Reading regarding Opalgate and the question of so-called "social justice warriors" in open source


It may seem strange to put a list of links for background reading near the start of an article, but I want to make sure that such material is available up front.  People can read about Opalgate here and the ongoing debate between various parties about it.  It's important background reading but somewhat peripheral to the overall problems involved.  It may or may not be the best example of the difficulties in running cross-cultural projects but it does highlight the difficulties that come in addressing diverse community bases, those which may have deep philosophical disagreements about things which people take very personally.

In the interest of full disclosure, I too worry that there is too much eagerness to liberate children from concepts of gender and too little thought about how this can and will be abused, and what the life costs for the children actually will be.  I believe that we must be human and humane to all but I am concerned that the US is going down a path that strikes me as anything but that in the long run.  That doesn't mean that the concerns of the trans community in the US should be ignored, but that doesn't mean they should be paramount either.  As communities we need to come together to solve problems not fight culture wars.

Twitter is not a medium which is conducive to thoughtful exchange so I also have to cut some slack.  Probably not the wisest medium to discuss controversial topics.  But people around the world have deep differences in views on major controversies.  My wife, for example, is far more opposed to abortion than I am, and having come to a deeper understanding of her culture, I don't disagree that in her cultural context, it is more harmful.  But that brings me to another problem, that many issues are contextual and we cannot see how others really are impacted by such changes, particularly when forced from the outside.

But my view doesn't matter everywhere.  It matters in my family, my discussions with people I know, and so forth.  But most of the world is not my responsibility nor should it be.  These are not entirely easy issues and there should be room for disagreement.

Is Open Source Political?


Caroline Ada Ehmke's basic argument is that open source is inherently political, that it seeks a positive change in the world, and therefore it should ally itself with others sharing the same drive to make the world a better place.  I think this viewpoint is misguided but because it is only half-wrong.

Aristotle noted that all human relationships are necessarily political.  The three he chose as primary in Politics is illustrative:  master and slave (we could update to boss and worker); husband and wife; and king and subject.  To Aristotle, the human being alone is incomplete.  We are our relationships and our politics follows from them.  While there has been an effort to separate the personal and the political in modern times, Feminist historians have kept this tradition alive and well.  A notion of the political grounded in humans as social animals is fundamentally more conducive to justice than cold, mechanical, highly engineered social machinery.  Moreover Aristotle notes that all communities are built on some concept of the good, that humans only want things that seem good to them and therefore we can assume that all groups seek a better world, but we don't always know which ones deliver, and that is the problem. 

Open source begins not with an ideology but with a conviction.  Not everybody shares the same conviction.  Not everyone participates in open source for the same reason.    But everyone has a reason, some conviction that what they are doing is good.  There is enough commonality for us all to work together, but that commonality is not as strong as one may think.

In a previous post on this blog I argued for a very different understanding of software freedom than Richard Stallman supposes, for example.  While he holds a liberal enumerated liberties view, I hold a traditionalist work-ownership view.  Naturally that leads to different things we look for in an ideal license.

And the diversity in viewpoint does not stop there.  Some come to open source because they believe that open source is a better way of writing software.  Some because they believe that open source software delivers benefits in use.  But regardless of our disagreement we share the understanding that open source software brings community and individual benefits.

In two ways then is open source software political:
  1. Communities require governance and this is inherently political, and
  2. To the extent there is a goal to transform the software industry to one of open source that is political.
The first as we will see is a major problem.  Open source communities are diverse in a way few Americans can fully comprehend (we like to think everyone is like us and there is one right way, the American Way whether that is in industry -- the right -- or formulations of rights -- the left).  Thus most discussions end up being Western-normative (and in particular American-normative) and disregard perspectives from places like India, Malaysia, Indonesia, and so forth.

However it is worth coming back to the point that what brings us together is an economic vision.  Yes, that is intrinsically and highly political, but it also has consequences for other causes and therefore it is worth being skeptical of alliances with groups in other directions.  What would an open source-based economy look like? What would the businesses look like? Will they be the corporations of today or the perpetual family businesses and trades of yesteryear?  And if the latter, what is the implication for the family?  Many of these questions (just like questions of same-sex marriage) depend in large part on the current social institutions in a culture -- the implications of an industrial, corporate, weak family society adopting something like same-sex marriage are very different than in an agrarian or family-business, strong family society.  My view is that these will likely have different answers in different places.

Thus when an open source community takes a position on, for example, gay rights in the name of providing a welcoming community, they make the community openly hostile to a very large portion of the world and I think that is not what we want.  Moreover such a decision is usually a product of white, Western privilege and effectively marginalize those in so-called developing countries who want to see their countries economically develop in a very different direction than the US has.  Worse, this is not an unintended side effect but the whole point.

A brief detour into the argument over white privilege


A discussion of so-called white privilege is needed I think for three groups reading this:
  • Non-Americans who will have trouble understanding the idea as it applies to American society (like all social ideas, it does not apply to all societies or even where it does apply, it may not in the same way).
  • White Americans who seem to have trouble understanding what people of color in the US mean when they use the term.
  • Activists who want to use the idea as a political weapon to enforce a sort of orthodoxy

I mentioned Western-Normative above.  It is worth pointing out that this forms a part of a larger set of structures that define what is normal or central in a culture, and what is abnormal, marginalized (or perhaps liminal).  It is further worth noting that the perception as to these models is more acute to those who are not treated as the paragons of normality.  In the US, the paragon of normality is the white, straight male.  But unspoken here is that it is the white, straight, urban, wealthy American male (or maybe European, they are white too).  Everyone else (women, people of color, Africans, Asians, etc) should strive to be like these paragons of success (I, myself, having lived most of my life now either outside the US or in the rural parts, am most certainly not included in this paragon of normality model, but nevertheless it took years of marriage to someone from a different culture and race to begin to be able to partially see a different perspective).

Now, it doesn't follow that white straight males live up to this image (which is one reason why white privilege theory has proven controversial among the arguably privileged) even where there is wealth, one is brought up in nice neighborhood in the city, etc.  But that isn't really the point.  The point is that society holds these things to be *normal* and everything else to be only normal to the extent it is like this model.  It would be better and more accurate to call this a model of normality rather than privilege and to state at the outset that we cannot really walk a mile in the shoes of people from across many social borders (culture included).

White privilege is real, as is male privilege (in some areas, particularly employment), urban privilege, American privilege, Western privilege, even female privilege (in some areas, particularly family law).

Issues exist in a sticky web of culture, and no culture is perfect


These issues of privilege aren't necessarily wrong in context:  it seems unlikely that the workplace can be made less male-normative without men sharing equally in the duties and rights of childrearing, but enforcing that cuts against the goal by some feminists of liberating women from men (and also exists in tension with things like same-sex marriage and gender-nonessentialism).  Insisting that men get the same amount of parental leave as women cuts one direction, but insisting that single women get free IVF cuts the other (both of these are either the case in Sweden or efforts are being made to make them the case).  In other words, addressing male privilege requires a transformation of the economic and family order together, in such a way that having children becomes an economic investment rather than an economic burden.  But then that has implications for the idea of gay rights as we understand the concept in the West because if having and raising children becomes normative then one is providing a sort of parental privilege, and gender equality becomes based on heteronormativity.

But the ultimate white privilege is to deny it is a factor when one uses one's own perception that other cultures are homophobic or transphobic to justify one's own racist paternalism.  No need to understand why.  We are white.  We know what is right.  We just need to educate them so they can join the ranks of the elite culturally white enlightened liberals as well.  Most of the world, however, disagrees, and as maintainers of open source projects we have to somehow keep the peace. (Note I use the term liberal as it is used in the history of ideas --  in the West it is no less prevalent on the mainstream right than on the mainstream left, though the application may be different.)

Since many of these issues necessarily exist in tension with eachother, there is no such thing as a perfect culture.  It isn't even clear the West does better than Southeast Asia on the whole (in fact I would say the SE Asia does better than the West on the whole).  But all culture is an effort at these tradeoffs, and it is not the job of open source communities to push Western changes on the rest of the world.

What is Social Justice?  Two Theories and a Problem


If open source is inherently political then social justice must in some way matter to open source.  Naturally we must understand what social justice is and how it applies.  Certainly a sense of being treated fairly by the community is essential for contributors from all walks of life.  The cult of meritocracy is an effort at social justice within the community.  As some argue it is not entirely without problems (see below) but as a technical community it is a start.

Western concepts of justice today tend to stress individuality, responsibility, and autonomy.  The idea is that justice is something that exists between individuals, and maybe between individuals and the state.  And while contemporary Western social justice theorists on the left try to relate the parts to the whole of society, it isn't clear that there is room for any parts other than the isolated individual and the state in their theories.  If one starts with the view that humans are born free but everywhere in chains (Rousseau), then the job of the state is to liberate people from eachother, and that leaves no room for any other parts.

The individualist view of justice, when seen as primary, breaks down in a number of important ways.  The most important is that it provides no real way of understanding parts and how they can be related to the whole.  Thus, the state becomes both stronger and isolates people more from eachother, and predictability becomes more important than human judgement.  Separatism cannot be tolerated, and assimilationism becomes the rallying cry when it comes to how the central model of normality should deal with those outside.  In other words the only way that this approach can deal with those on the margins is to destroy their culture and assimilate the individuals remaining.  Resistance must be made futile (Opalgate can be seen as such an effort).  For this reason, this view of justice is incompatible with real cultural pluralism.   This is not a question of the political spectrum in the US or Europe.  It is a fundamental cultural assumption in the much of the West.  Interestingly, the insistence that the personal is political means that intellectual feminism already exists in tension with this cold, mechanical view of justice.

Another view of justice can be found in Thomas Aquinas's view that in addition to justice between individuals, there is a need to recognize that just as individuals are parts in relation to the whole, so are other organs within society.  In other words, justice is a function of power, and justice is in part about just  design and proper distribution of power and responsibility.  In this regard, Aquinas built on the thought experiments of Plato's Republic and the Politics of Aristotle.  In this regard, key questions of social justice include the structure of an open source community and the relationship between the parts of the community (how users and developers interact and share power and responsibility), the relationship between open source projects and so forth.

In the end though, there was a reason why Socrates eventually rejected every formulation of justice he pondered.  Justice itself is complex and to formulate it removes a critical component of it, namely human judgement when weighing harms which are not directly comparable.  I think it is therefore quite necessary for people to remain humble about the topic and to realize that nobody sees all the pieces, and that we as humans learn more from disagreement than from agreement.  Therefore every one of us is ignorant to some extent on the nature of justice and so disagreements are healthy.

Open Source Projects and So-Called Social Justice Warriors


Coraline Ehmke, in her post on ticket on Opalgate asked:

Is this what the other maintainers want to be reflected in the project? Will any transgender developers feel comfortable contributing?"

This is  good question but another question needs to be asked as well.  Given that a lot of people live in societies with very different family and social structures, should people feel comfortable using software if the maintainers of the project have come out as openly hostile to the traditional family structures in a culture?  Does not a community that is welcoming of all need to avoid the impulse to delegitimize social institutions in other cultures, ones where one necessarily lacks an understanding into how it plays into questions of economic support and power?  If open source is already political do we want to ally ourselves with groups that could alienate important portions of our user base by insisting that they change their way of life?

It is important that we maintain a community that is welcoming to all, but that means we have to work with people we disagree with.  A mere difference of opinion should never be sufficient to trigger a problem with the code of conduct and expressing an opinion outside community resources should never be sufficient to consider the community unduly unwelcoming.  A key component of the community is whether people can work together with people when they disagree, and forcing agreement or even silencing opposition is the opposite of social justice when it comes to a large-reaching global project.

Should open source communities eject social justice warriors as ESR suggests?  Not if they are willing to work comfortably with people despite disagreements on hot button issues.  Should we welcome them?  If they are willing to work with people comfortably despite disagreements on hot button issues.  Should we require civility?  Yes.  Should we as communities take stances on hot button issues internationally?  Absolutely not.  What about as individuals?  Don't we have a civic duty to engage in our own communities as we feel best?  And if both those are true, must we not be tolerant of a wide range of differences in opinion, even those we find deeply and horribly wrong?

Wednesday, September 17, 2014

PGObject Cookbook Part 2.1: Serialization and Deserialization of Numeric Fields

Preface


This article demonstrates the simplest cases regarding autoserialization and deserialization to the database of objects in PGObject.   It also demonstrates a minimal subset of the problems that three valued logic introduces and the most general solutions to those problems.  The next article in this series will address more specific solutions and more complex scenarios.

The Problems


Often times we want to have database fields automatically turned into object types which are useful to an application.  The example here turns SQL numeric fields into Perl Math::Bigfloat objects. However the transformation isn't perfect and if not carefully done can be lossy.  Most applications types don't support database nulls properly and therefore a NULL making a round trip may end up with an unexpected value if we aren't careful.  Therefore we have to create our type in a way which can make round trips in a proper, lossless way.

NULLs introduce another subtle problem with such mappings, in that object methods are usually not prepared to handle them properly.  One solution here is to try to follow the basic functional programming approach and copy on write.  This prevents a lot of problems.  Most Math::BigFloat operations do not mutate the objects so we are relatively safe there, but we still have to be careful.

The simplest way to address this is to build into one's approach a basic sensitivity into three value logic.  However, this poses a number of problems, in that one can accidentally assign a value which can have other values which can impact things elsewhere.

A key principle on all our types is that they should handle a null round trip properly for the data type, i.e. a null from the db should be turned into a null on database insert.  We generally allow programmers to check the types for nulls, but don't explicitly handle them with three value logic in the application (that's the programmer's job).

The Example Module and Repository


This article follows the code of PGObject::Type::BigFloat..  The code is licensed under the two-clause BSD license as is the rest of the PGObject framework.  You can read the code to see the boilerplate.  I won't be including it in here.  I will though note that this extends the Math::BigFloat library which provides arbitrary precision arithmetic for PostgreSQL and is a good match for LedgerSMB's numeric types.

NULL handling


To solve the problem of null inputs we extend the hashref slightly with a key _pgobject_undef and allow this to be set or checked by applications with a function "is_undef."  This is fairly trivial:

sub is_undef {
    my ($self, $set) = @_;
    $self->{_pgobject_undef} = $set if defined $set;
    return $self->{_pgobject_undef};
}

How PGObject Serializes


When a stored procedure is called, the mapper class calls PGObject::call_procedure with an enumerated set of arguments.  A query is generated to call the procedure, and each argument is checked for a "to_db" method.  That method, if it exists, is called and the output used instead of the argument provided.  This allows an object to specify how it is serialized.

The to_db method may return either a literal value or a hashref with two keys, type and value.  If the latter, the value is used as the value literal and the type is the cast type (i.e. it generates ?::type for the placeholder and binds the value to it).  This hash approach is automatically used when bytea arguments are found.

The code used by PGObject::Type::BigFloat is simple:

sub to_db {
    my $self = shift @_;
    return undef if $self->is_undef;
    return $self->bstr;
}

Any type of course can specify a to_db method for serialization purposes.

How and When PGObject Deserializes


Unlike serialization, deserialization from the database can't happen automatically without the developer specifying which database types correspond to which application classes, because multiple types could serialize into the same application classes.  We might even want different portions of an application (for example in a database migration tool) to handle these differently.

For this reason, PGObject has what is called a "type registry" which specifies which types are deserialized and as what.  The type registry is optionally segmented into several "registries" but most uses will in fact simply use the default registry and assume the whole application wants to use the same mappings.  If a registry is not specified the default subregistry is used and that is consistent throughout the framework.

Registering a type is fairly straight forward but mostly amounts to boilerplate code in both the type handler and using scripts.  For this type handler:

sub register{
    my $self = shift @_;
    croak "Can't pass reference to register \n".
          "Hint: use the class instead of the object" if ref $self;
    my %args = @_;
    my $registry = $args{registry};
    $registry ||= 'default';
    my $types = $args{types};
    $types = ['float4', 'float8', 'numeric'] unless defined $types and @$types;
    for my $type (@$types){
        my $ret =
            PGObject->register_type(registry => $registry, pg_type => $type,
                                  perl_class => $self);
        return $ret unless $ret;
    }
    return 1;
}

Then we can just call this in another script as:

PGObject::Type::BigFloat->register;

Or we can specify a subset of types or different types, or the like.

The deserialization logic is handled by a method called 'from_db' which takes in the database literal and returns the blessed object.  In this case:

sub from_db {
    my ($self, $value) = @_;
    my $obj = "$self"->new($value);
    $obj->is_undef(1) if ! defined $value;
    return $obj;
}

This supports subclassing, which is in fact the major use case.

Use Cases


This module is used as the database interface for numeric types in the LedgerSMB 1.5 codebase.  We subclass this module and add support for localized input and output (with different decimal and thousands separators).  This gives us a data type which can present itself to the user as one format and to the database as another.  The module could be further subclassed to make nulls contageous (which in this module they are not) and the like.

Caveats


PGObject::Type::BigFloat does not currently handle making the null handling contageous and this module as such probably never will, as this is part of our philosophy of handing control to the programmer.  Those who do want contageous nulls can override additional methods from Math::BigFloat to provide such in subclasses.

A single null can go from the db into the application and return to the db and be serialized as a null, but a running total of nulls will be saved in the db as a 0.  To this point, that behavior is probably correct.  More specific handling of nulls in the application, however, is passed to the developer which can check the is_undef method.

Next In Series:  Advanced Serialization and Deserialization:  Dates, Times, and JSON