Drupal

Shai Sachs's picture

Tracking and passing ACES with Prosepoint

Note: Boston Green Drinks will be discussing the provisions of the Boxer/Kerry bill, and its impact on the environment, with Ben Wright, the Global Warming Advocate at Environment Massachusetts, on Nov. 3 at 7 pm.  I hope to see you there!

Last week, Senators Barbara Boxer and John Kerry introduced the Clean Energy Jobs and American Power Act in the US Senate.  This is the Senate's version of the American Clean Energy and Security Act (ACES), also known as the Waxman/Markey bill, which passed the House in June.  The legislation would, among other things, set a nationwide renewable energy portfolio and create a cap-and-trade system to regulate greenhouse gas emissions.

Although I write a lot about ideas for renewable energy and energy efficiency which can be pursued by individuals, organizations, and businesses with minimal governmental support, the fact is that major governmental action is needed to stop catastrophic climate change.  The problem is simply too huge to leave to the private sector.  Moreover, current government policies encourage the unsustainable practices, including dirty fuel energy production, excessive driving at low mileage, and deforestation, which are the main factors in catastrophic climate change; the government should reverse this destructive path.  With regards to Congressional action specifically, climate change will have massive impacts on interstate commerce and national security, and it is therefore Congress's constitutional responsibility to take action on this issue.

The immediate task before anyone who wants to avoid catastrophic climate change is to push the Senate to pass the Boxer/Kerry bill, and to push the Congress to merge the two bills and to pass the final legislation.  Ideally, the final passage would be accomplished before the UN Climate Change Conference in Copenhagen, set for Dec. 10.

If you've paid any attention to the debate on health care reform, you're no doubt aware that passing major legislation in Congress, and especially in the Senate, is not a piece of cake.  There are dozens if not hundreds of hurdles, amendments, votes, procedural questions, and sundry sub-plots to track.  There are massive, and in some cases very wealthy, interests who want to have a say.  There are lines in the sand, and it's often quite difficult to tell when they have been crossed, who is acting in good faith, what's just a feint, and what is going on behind the scenes.

With regards to the Boxer/Kerry bill, it appears that there will be at least a few major issues with the bill:

  • Will the bill, in its final form, allow the EPA to continue regulating greenhouse gases?
  • Will the bill encourage green collar job creation, thereby spreading renewable energy prosperity broadly?
  • What numerical targets will be set for renewable energy creation and emission reductions, and how aggressive will the timetables be?
  • Will the bill rely on carbon capture, or support natural gas as an intermediary solution, in emission reductions?
  • When will the bill pass, and how will the Senate bill be merged with the House bill?

Clearly, the debate on this bill will be just as complicated, if not more so, than the debate on health care reform, and a great deal of work needs to be done to track the progress on these issues, and to hold elected officials accountable for their actions.

Luckily, there are plentiful tools available to organize this massive and complex stream of information, and there is already a decent-sized body of practice in tracking legislation and holding leaders accountable, particularly within the progressive blogosphere and other social media forums.  But it seems to me that blogs, wikis, social media streams, and Twitter feeds, for all their power, are not really sophisticated enough to capture the complexity of the climate change debate.  All of these sources have the main weakness that they are mainly suited to highlighting one thing at a time, whereas the climate change debate requires us to keep track of multiple things at once - the five questions I listed above, and perhaps a few others.  It would be nice to have a site where a visitor could get a quick glimpse of the status and recmmended action for each of these issues.

The best infrastructure for creating such a complex site, in my opinion, is Drupal, the content management system used in almost every Lightbulb First project.  As it happens, there is an excellent distribution of Drupal targeted specifically at supporting online newspaper sites, called ProsePoint.

I'd like to see someone create and maintain a ProsePoint site whose main focus is to track stories on the Boxer/Kerry bill, and to organize civic action to push legislators to vote the right way on the bill.  Now, Prosepoint is mainly intended as a tool to manage news sites, not to oragnize civic action, but it's not hard to imagine extending the basic software a bit to incorporate activism.  One approach would be to add a "Suggested action" field to the Store content type, and then to extend the theme to highlight actions alongside the story body in some way.  Another approach would be to create a separate "Action item" content type, and to allow those content items to be attached to stories.  This latter approach might work better, as it would enable the creation of different types of actions - e.g. signing a petition, calling Senators, donating to an activist group or to a Senator who helped push the legislation along, etc.  It would also allow the site editors to highlight important actions over the course of several stories.

What I like about ProsePoint is its native support for multiple channels, which is vitally important for the climate change debate.  Channels are found on almost every news site these days; the channels at the top of the ProsePoint demo site include "National", "World", and "Business".  To track the questions I listed above for the Boxer/Kerry bill, it would be easy enough to create channels with titles like "EPA authority", "Green collar", "Renewable energy standard", "Carbon capture", and "Legislative calendar" (although I'm sure that someone with more experience managing online news sites could come up with something a bit snappier.)

Moreover, the software includes personalization capabilities, which appear to be focused around allowing visitors to easily access the stories most interesting to them through a "My Magazine" page.  Unfortunately the documentation for these features is not too clear, but these features could be a good foundation for giving visitors the tools to focus on the stories and actions most interesting to them, and also to recruit their friends and personal networks to support the elements of the bill they are most passionate about.

The biggest challenge in running such a site would be time and money.  If (and this is a rather big "if") the bill is indeed passed by Dec. 10, then the time frame for launching, stocking, and marketing a major new site focused solely on the Boxer/Kerry bill is indeed quite short.  And don't forget that we need to account for the time to incorporate or find an existing organizational home, untangle legal issues, put together a decent design, etc.

On top of that, such a site would require at minimum a part-time contributor, and probably more like one or two full-time staff.  Even for a couple of months, at a decent salary for each staff person, the costs for the site could run into the $30-40,000 range.  Unless the site manages an exceptional amount of traffic, online advertising certainly won't cover those costs.  Instead, I would suggest corporate or organizational sponsorship.  There are certainly no shortage of public interest groups for whom passage of the bill would be priceless, and there are now several major corporations lobbying for aggressive passage of the bill.  Beyond these big-name brands, there are probably dozens if not hundreds of nascent renewable energy and energy efficiency companies for whom the bill is a major boon, and who should be willing to kick in a little money to promote passage of the bill.

Even if it's not feasible to launch such a site in time to catch the major part of the debate on the Boxer/Kerry bill, it's still worth thinking about the idea of for-profit news/action sites that rely on sponsorships.  The Boxer/Kerry bill will not be the last piece of legislation dealing with catastrophic climate change.  At a minimum, the Senate will have to ratify the Copenhagen treaty once it's negotiated, and Congress will probably revisit the issue at some point in the next few years.  Moreover, states and cities are sure to continue to take action on any number of green issues, including green job initiatives, building codes, transportation policies and food system reforms.  Preparing for these debates now will only strengthen the hand of ordinary folks who want to protect the environment, and will help create the structural reform necessary to avoid catastrophic climate change.

Shai Sachs's picture

Lightbulb First releases three new Drupal modules

Here at Lightbulb First, we've been busily working away on the forthcoming release of Welcoming Websites Wizard, as well as a few other exciting projects.

There are plenty of important requirements to account for, and in some cases, we're learning that the things we need to do aren't yet available in Drupal 6. So we're pushing ahead and contributing our work back to the community.

As a result, we've released a few new Drupal modules recently:

  • Auto Promote, a module which allows some users to write content which is automatically promoted to a site's home page.
  • Date Repeat Node Generator, a module which allows site administrators to use the excellent Date module's repeating events API to create many events at one time.
  • EZ Menu, which makes it a snap to add new pages and content listings to a site menu.

We're excited about these contributions, and we hope they're helpful to others in the Drupal community!

Shai Sachs's picture

Drupalcon Day 1: Knight Foundation

Drupalcon 2008 kicked off this morning in the Boston Convention and Exhibition Center. Shelby and I have been busily running around, meeting with friends we've made throughout the Drupal world and learning a lot of interesting things.

Today I'm sitting in the Knight Foundation panel (and, of course, I can't get the wireless network for some reason, so this will be a bit of a link-free post). I've been fascinated by Knight for a while now, because it is a great example of a non-profit organization seeding new experiments in the world of for-profit journalism.

Lisa Williams just finished speaking about her experience with H20Town, a wonderful Watertown, MA-based placeblog, and subsequently with Placeblogger.com, an aggregator for other placeblogs throughout the world. Her larger point, though, was that increasingly the problems of the world are those of aggregation and generalization - figuring out that a localized problem is in fact a more systemic and generalized. She said that that's a technology problem, not a journalism or politics problem.

Ben Melancon is now speaking; he's the developer of the related content module, which allows users to find internal connections between different URLs on any site. This module should work well in combination with Pivots. The angle on journalism is that he think that related content and pivots will allow people to draw more connections between editorial and news sections. (I'm personally a bit skeptical that newspaper sites will willingly deploy that kind of software, however.)

There's some discussion also of how the Knight Foundation's support of the Summer of Code will happen, and what kind of code will emerge. That's a different beast from the foundation's grant challenge, as it involves an organized program to pay college students to contribute code to Drupal, but it's certainly an interesting project.

On the whole, it sounds like Knight really wants journalism innovators to get together on the Drupal Knight group, put together a grant to embody their idea, and actually make it happen. Sounds pretty tempting! This kind of pro-innovation approach is what we want to emphasize at Lightbulb First, and I applaud Knight's leadership in this area.

Shai Sachs's picture

See you at Drupalcon!

Next week, Shelby and I will be attending Drupalcon, the foremost international conference for Drupal users, developers, designers, and consumers. The program looks fascinating, and we can't wait to see what's brewing in the Drupal world. I'm also looking forward eagerly to the Drupal Code Sprint, taking place at that architectural marvel known as the MIT Stata Center.

If you'll be in Boston for Drupalcon, contact us, and we'll plan to get together. It should be a great time!

Shai Sachs's picture

Upgrading from Drupal 4.7 to 5.x, and from Flexinode to CCK, for the First Parish in Cambridge

Over the past few months, it's become clear that the website for the First Parish in Cambridge, Unitarian Universalist, badly needs a couple of upgrades.  One is an upgrade from Drupal 4.7 to Drupal 5.x.  Another is a modification of the way content types are implemented on the site, from Flexinode to a combination of the Content Construction Kit (CCK) and Views.

I've worked on this project on a pro-bono basis for Welcoming Websites.  It's been a long, somewhat difficult, journey, but now that it's almost complete, I thought I'd share my thoughts and tips on how best to proceed.  I imagine that many other folks whose Drupal sites are aging might be going through the same thing, and I hope these tips prove useful to them.

The first thing I did was create a separate copy of the website.  This is a basic but quite important step, because the upgrade is a fairly length process, and it's unlikely that you'll be able to complete it in a time span that's so short that you can reasonably expect to keep your site down for its entire duration.  What this means is that you'll almost certainly need to migrate some content from your live site to the new Drupal site before re-launching with Drupal 5.  That's a shame, but it also gives you additional motivation to finish the project as quickly as possible.

Once the new site has been created, there's a basic question to answer: should you upgrade to Drupal 5 first and then upgrade to CCK, or upgrade to CCK first and then upgrade to Drupal 5?  I recommend upgrading to Drupal 5 first, and leaving your content types alone.  The reason is that Drupal 5's CCK module, and some of the other modules you might be using like Event, are likely to be considerably more robust in Drupal 5 than they are in Drupal 4.7.  In our case in particular, we tried upgrading from Flexinode to CCK first, and discovered trouble when we couldn't apply Event-related settings to CCK content types in Drupal 4.7 (although that is possible in Drupal 5.)

Upgrading to Drupal 5 is a bit of a beastly process, but thankfully a fairly clean one.  I found the instructions in UPGRADE.txt perfectly suitable and, thanks to the fact that we didn't patch the core engine or modules too much, and didn't have a whole lot of modules in the old 4.7 site, relatively painless.

I wish I could say the same for the Flexinode to CCK upgrade, but I can't.  Naturally I began by downloading the excellent Flexiconvert module, but that module only takes you so far.  There were  a few other steps I had to take once the module was installed.

  1. Dependent modules.  I needed to install Email, Date, and Filefield modules in order to make the upgrade successful.
  2. Creating new content types.  I had to create new CCK-based content types to mirror the old Flexinode-based types.  This was a bit awkward, since it meant I would temporarily have two content types with the same name in the system.  I chose to name the types with "CCK" at the end, and to use a "_cck" suffix in the type handle, in order to prevent any accidental confusion during the conversion process.  There was also the minor pain in the neck of setting up each field anew.  In the future, a module like CoCKTaiL might help minimize this trouble, but it wasn't a big deal.
  3. Creating date fields.  In some cases, date fields in the old site had not been properly created in the first place, so I had to create new ones and port data into them, before running the conversion.
  4. Removing body fields and creating them anew.  Flexinode doesn't automatically include a body field with each content type, but CCK does.  Consequently, all of our old content types had body-like fields manually created, and those fields had to be mapped to a non-body CCK field.  That meant that I had to create CCK text fields rather than just use the built-in CCK body field.  That's a little annoying at first blush, but on deeper reflection, it's not so bad, since I was able to have a bit more user-friendly control over the display and weight of body fields in nodes and teasers than I would have otherwise had.

The next thing to do was run the actual conversion.  Happily, this went smoothly in most cases, except for a few content items where I got SQL errors of the following sort:

user warning: Unknown column 'field_description_value' in 'field list' query: UPDATE node_content_group_cck SET field_description_value = 'asdf' WHERE nid = 297 in /home/lightbul/public_html/clients/fpcamb_47/includes/database.mysql.inc on line 120.

The problem was usually caused by Flexiconvert's assumption that every CCK textfield has a _value column (this assumption is incorrect, I believe, for Plain text fields, and correct for Filtered text fields.)  Fortunately, the problem is easily solved by re-running the errant query and eliminating the "_value" suffix.

Once Flexiconvert is done doing its magic, the next step is to "de-Flexinode" your site.  De-Flexinoding can mean a bunch of different things:

  1. Removing all Flexinode content types.  Fortunately, it doesn't mean removing all old Flexinode content; Flexiconvert does that for you.
  2. Replacing all your old flexinode/table and flexinode/list pages with the equivalent Views tables.  I'd recommend only doing this for the Flexinode pages which are actively linked somewhere in your menu system, url aliases, and body text.  You can, happily, discover these URLs with the following SQL commands:

    SELECT * FROM menu WHERE path like '%flexinode%';
    SELECT * FROM url_alias WHERE src like '%flexinode%';
    SELECT * FROM node_revisions WHERE body like '%flexinode%';

    Please note that you might also need to run queries on your CCK text and/or link fields to make sure you don't have stray flexinode links in there.  Once you've identified your active flexinode URLs, create one View for each, and replace the URL in the menu item, URL alias, or node body as appropriate.  Building the new views can be a tedious and painfully slow process, but look at it this way - at least you have much more control on your multiple-node pages now!

  3. Re-establishing connections between vocabularies and your CCK types.  Unfortunately, Flexiconvert doesn't do it for you.
  4. Poke around your "display fields" settings in your CCK types to make sure that field label, teaser, and full-body settings are set appropriately.  If you've been fine-tuning your theme with node-flexinode-xyz.tpl.php files, you'll need to convert those too.

At this point, it's wise to take a look at your whole site and make sure that everything is looking right.  Of course, train a keen eye on the new views that you've created, and the teaser and full-node displays of at least one content item per new content type.  Check out the node-creation form for each of your new content types, and make sure that it's reasonably easy for your site admins to understand.

And that should be it!  At this point, you should migrate all the new or updated content from your old 4.7 site to your new 5.x site.  Backup and archive the old 4.7 site, swap in the files and database for the new 5.x site, and you should be done.

The site upgrade is sort of a thankless process since, if you do it right, your end users and site admins will have no idea that you did it at all.  That's the bad news.  The good news is that you're now ready to take advantage of all the advantages that CCK/Views overs on top of Flexinode.  You'll also be able to start playing around with the shiny new Drupal 5 modules, many of which are much more feature-rich and robust than their 4.7 cousins.

In our case, there's a lot we want to do once the site upgrade is done (ETA: this coming Saturday.)  We'll start by adding podcasting on top of the sermon archive, using a rich archive of MP3s several stalwart church members have built up.  We will certainly be looking at Panels as a more robust way of creating two-column pages.  We'll also want to use Organic Groups to provide mini-home pages to each committee and group in the congregation.  We might try to move the front-page Flash-based photo slideshow into a new Javascript-based, CCK/imagefield/jCarousel-powered slideshow.  And in the long term, we will take a look at rental payments and perhaps even store functionality using the eCommerce module.

As you can see, the Drupal 4.7/5.x and Flexinode/CCK upgrade are not particularly pleasant.  But it's certainly well worth it!

Shai Sachs's picture

Lightbulb First to attend Boston Drupal Conference

Today Shelby and I finalized our registrations for Drupalcon 2008, which will take place at the Boston Convention Center. The program looks stellar, and we're very excited to meet fellow Drupalists from across the world.

Since Lightbulb First was established in March 2007, we've been gradually stepping up our involvement in the Drupal community. We've contributed patches to core Drupal modules, posted to our blog about how we solve problems in Drupal, and have begun attending the Boston Drupal meetings. This conference should be an excellent continuation of that involvement.

We're lucky that the conference is being held so close to home - we're based in Cambridge, MA, a stone's throw away from the convention center. More than that, we certainly appreciate the auspicious timing - the conference wraps up on our anniversary (March 7). This will be a great conference! Stay tuned for more thoughts and reports from the conference later this year.

Notes on Greater Boston Drupal Meetup

Earlier this week, Shai and I checked out the Boston Drupal Meetup, at the Berkman Center for Internet and Society at Harvard Law School. Drupal enthusiasts gathered to give short talks about how they are working on and using Drupal.

We heard presentations from Andy Chase, who is working on the re-release of the User Autorole module, and from Moshe Weitzman, who is creating the new Devel_Themes module.

The Devel_Theme module will be available for Drupal 6, and will make it easier for developers to identify the functions, variables and template files impacting the look of a given page. In addition, the module will include a "duration" measurement that tells how much processing time was taken up by each component.

Local developers also gave talks, explaining how they use Drupal to create sites for educational clients, religious organizations, and a well-known entertainment celebrity.

We had a great time. It was exciting to see what other people are doing with Drupal in our area, and to learn about some of the new features available in the next release of Drupal. We certainly enjoy being part of this vibrant and creative community!

Shai Sachs's picture

Deleting a lot of nodes at a time in Drupal for First Parish in Cambridge

Update, 19 Dec 2007: Today I discovered an exciting new Drupal module, Views bulk operations, which should make this task much easier and less error-prone. The module is still in beta, but it looks quite promising! Thanks very much to the folks at Dev Bee for bringing this module to my attention.

If you've spent any appreciable amount of time maintaining Drupal sites, you've probably had an unfortunate snafu that required some behind-the-scenes plumbing work that can't be done using the traditional Drupal tools. You've probably been extremely frustrated with the paucity of tools available for doing things like deleting a lot of nodes at once to clean up the mess left by an unruly contributed module, and hoping for some kind of quick fix. That was the situation I found myself in today, and while I have some thoughts about deleting a lot of nodes in Drupal, I unfortunately don't have a quick fix.

One of my volunteer projects is helping out with the website for the First Parish in Cambridge, Unitarian Universalist. That's my wife's church, and I drop by there myself every few months for special occasions.

Like many churches, First Parish Cambridge has certain events which occur periodically - weekly services, routine group meetings, choir rehearsals, etc. When I set up the site originally together with the rest of the volunteer team, we figured that the event repeat module would do the trick. Well, it did - far, far too well. For some reason, we recently discovered that the event repeat module had created something like 64,000 events on the church calendar, about 55,000 of which were in June 2008. Talk about an active congregation!

We haven't yet pinned down the exact cause of the snafu, but it looks like it has something to do with the fact that many of the periodic events were set up with a June 2008 end date. Presumably, there's some bug in event repeat which loops for a while on the end date under the wrong circumstances.

At any rate, tonight my task was to clear out the thousands and thousans of extra events that have piled up in the system, in order to make upgrading to Drupal 5.3 a little less painful. In technical terms, I decided to delete every event whose start date was June 2008 or later (there were several thousands of events stretching all the way to 2012, in addition to the 55,000 in June 2008.)

The first thing I did was backup the database and folder structure. That's always important when undertaking these kinds of mammoth database reorganizations.

My first approach was straightforward enough. I created a bit of code which queried the database for events with start dates past June 2008, and deleted them using Drupal's node_delete function. Then I slipped that code into a simple page, set the filter to PHP, and let it run. (By the way - may I caution anyone whose reading this to take this step only with the utmost caution. Leaving a bit of code in your Drupal system like this untended is extraordinarily dangerous. Delete the page as soon as you're done with it!) Here was the code for the page:

set_time_limit(0);

$res = db_query("select n.nid from node n inner join event e on n.nid = e.nid
where from_unixtime(e.event_start) >= '2008-06-01' limit 1000");

$output = '';
while ($arr = db_fetch_array($res)) {
  $nid = (int) $arr['nid'];
  if ($nid <= 0) { continue; }
  node_delete($nid);
  $output .= "deleted node $nid
\n"; } print $output;

I added the limit clause in the SQL in order to avoid potentially choking my database server with what I imagined to be a database-intensive node_delete call. Unfortunately, the limit was too generous by an order of 5 or so. The database choked after about 200-300 nodes. With 64,000 nodes to delete, I'd have to run this same script 320 times to get everything cleared out of the system! That wouldn't be a lot of fun, and it'd almost certainly incur the wrath of my hosting provider.

My next approach was to see if I could delete a lot of nodes at once. After all, a statement like:

DELETE FROM node WHERE nid >= 10000 AND nid <= 30000

is much faster than 20,000 iterations of a statement like:

DELETE FROM node WHERE nid = 10000

Unfortunately, Drupal just doesn't give you an option to effectively execute the first kind of statement; there's no "mass delete" operation in the node API.

To recreate my own one-off "mass delete" operation, I did the following.

  1. Determined the set of function calls induced by a single node_delete. Like many node operations, node_delete consists of a couple of simple SQL statements, adorned with calls to node_invoke and node_invoke_nodeapi. These two functions are themselves pretty simple: they find every module which contains a function whose name indicates that it should be invoked, and then they call those functions. In the case of node_invoke, deducing exactly which function names will be used is a little tricky. I cheated by inserting "print" statements before each function invocation and looking at the trace that resulted when I created a page whose body was
    node_delete($some_nid);

    Of course, $some_nid was chosen to be the nid for a node which I had already identified as a node I wanted to delete.

  2. Converted these function calls into SQL DELETE statements. Once I knew the names of functions invoked during a node_delete, I traced through each function, looking for SQL delete statements, as well as any Drupal functions which might indicate that some other change needed to be made to the Drupal system. Fortunately, the events I was working with didn't have attached files, and didn't have other kinds of complicating factors that might have required the execution of PHP code on a per-node basis. As a result, I was able to compile a list of simple DELETE statements which, taken together, would recreate the deletion of a single node; there was no need for extra computation (though circumstances may vary in your own Drupal installation.) By the way, note that I could have extracted the exact same SQL statements more simply with the Devel module. However, I simply didn't have the time to install it.
  3. Identified large ranges of nodes to delete This is a critical piece, since my MySQL database doesn't support sub-selects, and in any event sub-selects are horribly inefficient. I wanted to write SQL queries like
    DELETE FROM node WHERE nid >= 10000 and nid <= 30000

    in order to delete a lot of nodes quickly. There was a little black magic here, as I had to essentially guess what the data looks like, but the nature of the problem (event repeat iterating thousands of times upon a single node creation) gave me a hint that there would be long sequences of nodes which all were events past June 2008. What I ended up doing was grouping the list of nodes by the thousands of node ids, and within each group of a thousand, by node type. The SQL query I wrote was:

    SELECT floor(nid/1000), type count(*) FROM node
    GROUP BY floor(nid/1000), type

    That helped me identify large ranges of nodes which were almost entirely made of events - ranges as large as 10,000 nodes at a stretch.

  4. Verified that these ranges contained only offending nodes (i.e., nodes with start date >= 2008-06-01) When deleting large amounts of data, this kind of verification is absolutely critical! SQL makes deleting a lot of data disastrously easy. Even though I had everything backed up, I didn't want to get halfway through my deletions only to find that I had made some crucial mistake early on. My solution was to double-verify. First, I verified that there were no nodes which were events, and no events with start dates before June 1, 2008, in my range. Then, I verified that all of the nodes in the group were events with start dates after June 1, 2008. These two tests are logically equivalent, and if I perfectly reproduced the tests each time, I should always get the same answer. However, I was worried about typing mistakes and forgetful recreations of templated SQL queries; double-verification guaranteed that I'd have to make the same mistake twice before deleting a lot of good data.

    Each test was actually a pair of SQL statements which were executed simultaneously; both statements were expected to give the same result. If they didn't, something was wrong.

    This was the first test; it was expected to return 0 for both queries:

    select count(*) from event
    where from_unixtime(event_start) < '2008-06-01'
    and nid >= 10000 and nid <= 30000;
    
    select count(*) from node n
    where type != 'flexinode-10'
    and n.nid >= 10000 and n.nid <= 10000;

    This was the second test; it was expected to return a number roughly (but not necessarily exactly) equivalent to the size of the range, and both queries were expected to return the same number:

    select count(*) from node n inner join event e on n.nid = e.nid
    where n.type='flexinode-10' and from_unixtime(e.event_start) >= '2008-06-01'
    and n.nid >= 10000 and n.nid <= 30000;
    
    select count(*) from node n
    where n.nid >= 10000 and n.nid <= 30000;
  5. Executed the SQL DELETE statements against these ranges This was the easy (and fun) part. Once I determined that a range of node ids like, say, 10,000 - 30,000 was safe for deletion, I executed the SQL statements I identified in step 2 against that range. I just replaced queries like
    DELETE FROM node WHERE nid=123

    with

    DELETE FROM node WHERE nid >= 10000 and nid <= 30000

    It's very satisfying to delete 20,0000 rows with a single command, somehow
    .

  6. Sanity-checked the new database Before overwriting the live site database with the database I had hacked apart offline, I wanted to make sure that I hadn't accidentally deleted any good data. I took a look at the number of events for each month before June 2008, for both the live and development databases. (Incidentally, Excel is extremely useful for this! Just copy the two SQL result sets side-by-side into a spreadsheet, use Tools -> Text to columns align the MySQL data into cells, and do row-by-row comparisons with Fill down.) Everything turned out the same, so I went ahead and transfered the database back to the live site.

This database intervention was actually just the first step in upgrading the site. I wanted to get the size of the database reduced before running a Drupal update script or a Flexinode-to-CCK converter (yep, this old site was built without CCK. Whoops.) I succeeded, incidentally - the SQL transcript for the old database was 140 MB, and the transcript for the new database was "only" 30 MB. Mission accomplished!

Syndicate content