Fold-up: a source transformation program would be useful for debugging and
understanding programs
Consider this long, complicated function I took from PostgreSQL.
/* from includes */
#define LP_USED 0x01 /* this line pointer is being used */
#define LP_DELETE 0x02 /* item is to be deleted */
#define OverwritePageMode 0x10
/* from .c */
OffsetNumber
PageAddItem(Page page,
Item item,
Size size,
OffsetNumber offsetNumber,
ItemIdFlags flags)
{
PageHeader phdr = (PageHeader) page;
Size alignedSize;
int lower;
int upper;
ItemId itemId;
OffsetNumber limit;
bool needshuffle = false;
bool overwritemode = (flags & OverwritePageMode) != 0;
flags &= ~OverwritePageMode;
/*
* Be wary about corrupted page pointers
*/
if (phdr->pd_lower < SizeOfPageHeaderData ||
phdr->pd_lower > phdr->pd_upper ||
phdr->pd_upper > phdr->pd_special ||
phdr->pd_special > BLCKSZ)
elog(PANIC, "PageAddItem: corrupted page pointers: lower = %u, upper = %u, special = %u",
phdr->pd_lower, phdr->pd_upper, phdr->pd_special);
/*
* Select offsetNumber to place the new item at
*/
limit = OffsetNumberNext(PageGetMaxOffsetNumber(page));
/* was offsetNumber passed in? */
if (OffsetNumberIsValid(offsetNumber))
{
/* yes, check it */
if (overwritemode)
{
if (offsetNumber < limit)
{
itemId = PageGetItemId(phdr, offsetNumber);
if (((*itemId).lp_flags & LP_USED) ||
((*itemId).lp_len != 0))
{
elog(WARNING, "PageAddItem: tried overwrite of used ItemId");
return InvalidOffsetNumber;
}
}
}
else
{
if (offsetNumber < limit)
needshuffle = true; /* need to move existing linp's */
}
}
else
{
/* offsetNumber was not passed in, so find a free slot */
/* look for "recyclable" (unused & deallocated) ItemId */
for (offsetNumber = 1; offsetNumber < limit; offsetNumber++)
{
itemId = PageGetItemId(phdr, offsetNumber);
if ((((*itemId).lp_flags & LP_USED) == 0) &&
((*itemId).lp_len == 0))
break;
}
/* if no free slot, we'll put it at limit (1st open slot) */
}
if (offsetNumber > limit)
{
elog(WARNING, "PageAddItem: specified offset after maxoff");
return InvalidOffsetNumber;
}
/*
* Compute new lower and upper pointers for page, see if it'll fit.
*
* Note: do arithmetic as signed ints, to avoid mistakes if, say,
* alignedSize > pd_upper.
*/
if (offsetNumber == limit || needshuffle)
lower = phdr->pd_lower + sizeof(ItemIdData);
else
lower = phdr->pd_lower;
alignedSize = MAXALIGN(size);
upper = (int) phdr->pd_upper - (int) alignedSize;
if (lower > upper)
return InvalidOffsetNumber;
/*
* OK to insert the item. First, shuffle the existing pointers if
* needed.
*/
itemId = PageGetItemId(phdr, offsetNumber);
if (needshuffle)
memmove(itemId + 1, itemId,
(limit - offsetNumber) * sizeof(ItemIdData));
/* set the item pointer */
(*itemId).lp_off = upper;
(*itemId).lp_len = size;
(*itemId).lp_flags = flags;
/* copy the item's data onto the page */
memcpy((char *) page + upper, item, size);
/* adjust page header */
phdr->pd_lower = (LocationIndex) lower;
phdr->pd_upper = (LocationIndex) upper;
return offsetNumber;
}
( Portions Copyright (c) 1996-2002, PostgreSQL Global Development Group;
Portions Copyright (c) 1994, Regents of the University of California )
Now, this function can be run in a few different ways (eg: with or without
"overwrite mode" - whatever that is). When a programmer is confronted with
a function like this, she probably tries to understand it by looking at the
main cases. I propose a program, foldup to help by allowing the programmer
to specify a case she is interested in, and removing code that is not
relevant to that case. In the above example, the foldup might be invoked
like this:
foldup bufpage.c --assert '(flags & OverwritePageMode) == 0' \
--assert 'PageAddItem() != InvalidOffsetNumber'
This would "simplify" the source code as much as possible by applying to
assumptions: that overwrite mode isn't being used, and calls to the function
succeed (i.e. don't fail with error code "InvalidOffsetNumber").
And the output might appear like this:
OffsetNumber
PageAddItem(Page page,
Item item,
Size size,
OffsetNumber offsetNumber,
ItemIdFlags flags)
{
PageHeader phdr = (PageHeader) page;
Size alignedSize;
int lower;
int upper;
ItemId itemId;
OffsetNumber limit;
bool needshuffle = false;
/*
* Be wary about corrupted page pointers
*/
if (phdr->pd_lower < SizeOfPageHeaderData ||
phdr->pd_lower > phdr->pd_upper ||
phdr->pd_upper > phdr->pd_special ||
phdr->pd_special > BLCKSZ)
elog(PANIC, "PageAddItem: corrupted page pointers: lower = %u, upper = %u, special = %u",
phdr->pd_lower, phdr->pd_upper, phdr->pd_special);
/*
* Select offsetNumber to place the new item at
*/
limit = OffsetNumberNext(PageGetMaxOffsetNumber(page));
/* was offsetNumber passed in? */
if (OffsetNumberIsValid(offsetNumber))
{
if (offsetNumber < limit)
needshuffle = true; /* need to move existing linp's */
}
else
{
/* offsetNumber was not passed in, so find a free slot */
/* look for "recyclable" (unused & deallocated) ItemId */
for (offsetNumber = 1; offsetNumber < limit; offsetNumber++)
{
itemId = PageGetItemId(phdr, offsetNumber);
if ((((*itemId).lp_flags & LP_USED) == 0) &&
((*itemId).lp_len == 0))
break;
}
/* if no free slot, we'll put it at limit (1st open slot) */
}
/*
* Compute new lower and upper pointers for page, see if it'll fit.
*
* Note: do arithmetic as signed ints, to avoid mistakes if, say,
* alignedSize > pd_upper.
*/
if (offsetNumber == limit || needshuffle)
lower = phdr->pd_lower + sizeof(ItemIdData);
else
lower = phdr->pd_lower;
alignedSize = MAXALIGN(size);
upper = (int) phdr->pd_upper - (int) alignedSize;
/*
* OK to insert the item. First, shuffle the existing pointers if
* needed.
*/
itemId = PageGetItemId(phdr, offsetNumber);
if (needshuffle)
memmove(itemId + 1, itemId,
(limit - offsetNumber) * sizeof(ItemIdData));
/* set the item pointer */
(*itemId).lp_off = upper;
(*itemId).lp_len = size;
(*itemId).lp_flags = flags;
/* copy the item's data onto the page */
memcpy((char *) page + upper, item, size);
/* adjust page header */
phdr->pd_lower = (LocationIndex) lower;
phdr->pd_upper = (LocationIndex) upper;
return offsetNumber;
}
I reckon this later version is a bit easier to understand now. The next
assertion I'd apply might be "offsetNumber == 10".
Some thoughts on this idea:
- Error handling and assertions often obfuscates the main purpose of the code.
foldup should be good for helping programmers remove these (mostly)
irrelevant details.
- Integrating into IDEs/vim/etc: perhaps a "foldup" window listing
assertions?
- Implementation: "simplifying" (aka optimizing) is a well researched area
of computer science. This is mostly constant folding which is relatively
easy to implement. However, we would like to preserve comment structure, etc.,
which would make implementation somewhat challenging.
- How should comments be handled? If they appear inside the body
of an unreachable if statement, they should probably be removed entirely.
What about the line before or after a removed comment? Hard to say!
- Reverse transformations: perhaps it would be helpful to edit
a program in "simplified mode", and transform it back afterwards?
Is this a good idea? (or might it lead to fragile code, overlooking
important cases?)
- Useful for debugging? Easing code inspection of special cases might
be useful.
- Is this solving the wrong problem? Should code ever get to a state
where foldup is useful?
It appears that this is equivalent to
The Formal Transformation Approach to Source Code Analysis and Manipulation,
by M. P. Ward. The idea is a generalization of slicing, which is discussed
towards the end of the paper.
Back to my ideas page