How to maintain a popular Python library for most of your life without burning out

Illustration 7 from Alice in Wonderland, "Alice in pool of tears"

A talk by Leonard Richardson

Delivered May 17, 2024 at the PyCon Maintainers Summit


Hi, my name is Leonard Richardson. I'm a science fiction writer, but I'm here today as the creator of the screen-scraping library Beautiful Soup, the 69th most downloaded module on PyPI.

A picture of me in 2004, a few weeks before inventing Beautiful Soup, next to a picture of me in 2023, much older.

I've been more or less the sole developer and maintainer since I created the project 20 years ago. When I was young, I was happy to write open source software all day, every day, with or without compensation. But over the years, my interest in volunteer open source work has declined until it's about the size of this one project.

A graph comparing my subjective enthusiasm for volunteer open source work (gradually declining with periods of burnout dropping it to zero) against a hypothesized, more gradual decline if there were no burnout.

I am very happy whenever people use Beautiful Soup to make their lives better, but I've learned I can't let my life revolve around it. Over the years, I’ve taken steps to keep Beautiful Soup a reasonably scoped project, a project that one person can maintain no matter how much use it sees. And even so, there have been two long periods of burnout in my life, when I had no interest in writing software at all.

This is a talk about how I've kept going on this track for twenty years, with special focus on the periods of burnout. But I want to stress that even if there were no burnout, my interest in doing volunteer open source work has naturally declined as I aged, got married, and picked up other interests. Even outside these really bad periods, I continually have to decide what level of commitment I can offer this project.

A bit about burnout

“A constant, low-level stream of stressors that are out of your control.”
“Stress is not bad for you; being stuck is bad for you.”
—Nagoski and Nagoski, Burnout: The Secret to Unlocking the Stress Cycle

What about stressors that are under your control?
Such as the architecture of a software library you wrote and maintain?

I thought this talk was going to be relatively simple. In Nagoski and Nagoski's book Burnout they say that burnout is caused by a constant stream of stressors beyond your control. Supporting a project used by millions of people definitely results in a constant stream of stressors beyond your control. But I do have more or less complete control over the design of Beautiful Soup and what goes into the code base.

In my mind, each of my two burnout incidents was associated with a major architectural change to Beautiful Soup that took a huge weight of maintenance off my shoulders. In the talk I thought I'd be giving, I would draw a clear line between these architectural changes, the psychological benefit they brought me, and the end of my periods of burnout.

But the talk I'm actually going to give is somewhat different.

The worst time: 2008-2012

June 2008: Beautiful Soup 3.0.7 (SGMLParser)

December 2008: Beautiful Soup 3.1.0 (html.parser)

November 2010: Beautiful Soup 3.2.0 (back to SGMLParser!)

March 2012: Beautiful Soup 4.0.1 (multi-parser architecture, the right solution)

I'm going to dig into my first period of burnout a bit because this was really the worst period in the history of Beautiful Soup development. Originally, Beautiful Soup used Python 2's SGMLParser class as the basis for its HTML parser. In Python 3, this class was replaced by a different class with a similar interface, called HTMLParser. In 2008 I released a version of Beautiful Soup that was Python 3-ready, swapping out SGMLParser for HTMLParser.

HTMLParser is a very nice piece of software now, but in those days it routinely crashed or lost data when parsing poorly formatted HTML. My preparations for the move to Python 3 basically destroyed the entire value proposition of Beautiful Soup. It was really, really bad.

I saw there was an architectural change that could fix the problem: making it possible to swap in any of the third-party HTML or XML parsers that had appeared since Beautiful Soup was invented. But it was a significant amount of work that I wasn't able to do, because I was in the middle of massive burnout. It took me two years to take the basic step of admitting defeat and releasing a “new” version of Beautiful Soup that went back to SGMLParser and didn't work on Python 3.

But in the end, I took control, gave Beautiful Soup a multi-parser architecture, and emerged from the other side of what I now think of as my first burnout period.

That's what I thought had happened, going into this talk. And all of those things did happen, but they didn't happen in that order.

The same subjective graph of my enthusiasm over time, but with major Beautiful Soup architectural changes highlighted. Contrary to what I thought while preparing this talk, they either trail or lead the actual periods of burnout.

When I was preparing this talk, I put together a timeline, and I saw that the first period of burnout was actually caused by my full time job, as burnout often is. The second period of burnout was caused by my job plus the pandemic. The demands of maintaining Beautiful Soup, though very real and stressful, were never more than a contributing factor to my periods of burnout.

I only had the energy to make the multi-parser architectural change after I left my job in 2011 and began recovering from the burnout it had caused. The second architectural change, which I won't go into for reasons of time, actually preceded my second period of burnout. But I distinctly remembered these changes as being an important part of my recovery. In the first case, the change may really have helped; in the second case, that was impossible. The need to feel a sense of control over the important things in my life was so strong that I confabulated a story where taking control pulled me out of periods of burnout.

Now, these architectural changes were huge improvements. They make my job as maintainer much easier. They made it easier to come back to Beautiful Soup after a period of burnout, and they make it easier to deal with the day-to-day stresses of maintainership. They’ve made it possible for me to continue maintaining a piece of software I care about for half of my life, despite periods of burnout.

So without making any grandiose promises for what these ideas can do for you, here are the things I've learned from 20 years of solo maintainership.

Basically, it is all about drawing boundaries. This is clearest in a dependency graph or architectural diagram. Beautiful Soup has gone from a monolithic library in a single Python file that you download from my website, to a package on PyPI that has a number of optional dependencies, and well-defined APIs separating the core code and the dependency code.

Over the years I got a sense of the unique value proposition of Beautiful Soup, and I farmed out everything else to an external dependency or to the Python standard library (which has a much better HTML parser now than it did in 2008).

This doesn't necessarily mean less code for me to maintain: changing Beautiful Soup to support multiple parsers made the code about 30% larger. But it draws a clear boundary around the code whose quality I have committed to ensure.
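A minimal sketch of the kind of boundary described above, using only the standard library. It is not Beautiful Soup's actual code: `TreeBuilder`, `StdlibTreeBuilder`, `BUILDERS` and `parse` are hypothetical names chosen for illustration. The core only talks to an abstract builder interface and looks backends up by name, so a parser adapter can be added, replaced or dropped without touching the core.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser


class TreeBuilder(ABC):
    """Hypothetical interface separating the core from any one parser."""

    @abstractmethod
    def feed(self, markup: str) -> list[tuple[str, str]]:
        """Return a flat list of (event, value) pairs for the core to consume."""


class _EventCollector(HTMLParser):
    """Records start tags, end tags and text as simple events."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


class StdlibTreeBuilder(TreeBuilder):
    """Adapter for the standard library's html.parser, which is always available."""

    def feed(self, markup: str) -> list[tuple[str, str]]:
        collector = _EventCollector()
        collector.feed(markup)
        collector.close()  # Flush any buffered text.
        return collector.events


# Hypothetical registry of available backends. Adapters for optional
# third-party parsers would be registered here only if their imports succeed.
BUILDERS: dict[str, type[TreeBuilder]] = {"html.parser": StdlibTreeBuilder}


def parse(markup: str, features: str = "html.parser") -> list[tuple[str, str]]:
    """Core entry point: look up a backend by name and delegate parsing to it."""
    try:
        builder_class = BUILDERS[features]
    except KeyError:
        raise ValueError(f"No parser registered for {features!r}") from None
    return builder_class().feed(markup)
```

Calling `parse("<p>Hi</p>")` returns `[("start", "p"), ("data", "Hi"), ("end", "p")]`. The design choice is that the core never imports a parser directly; everything parser-specific sits on the far side of the `TreeBuilder` boundary.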

My boundaries after twenty years

I've learned to run the Beautiful Soup project within certain boundaries as well. For instance, I probably won't respond to your support request unless it looks like you've encountered a bug I need to reproduce. I just can't do this otherwise. There are a million of you and only one of me.

I put lots of checks inside my code to issue warnings about what seem to be common usage mistakes, such as trying to parse a filename or URL as an HTML document. Some people have complained about these, but you can filter the warnings if they're not relevant to your use, and they eliminate entire classes of support requests, so they're worth it from my perspective.
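A rough sketch of such a check, assuming simple heuristics: a lone string starting with `http://` or `https://`, or a lone string ending in `.html`, `.htm` or `.xml`. The function `check_markup` and the warning class `LooksLikeLocatorWarning` are hypothetical, not Beautiful Soup's API. Giving the warning its own category is what makes it easy for users to filter.

```python
import os
import warnings


class LooksLikeLocatorWarning(UserWarning):
    """Hypothetical warning category for markup that resembles a URL or filename."""


def check_markup(markup: str) -> None:
    """Warn if a short, single-line string looks like a locator rather than HTML."""
    text = markup.strip()
    if "<" in text or "\n" in text:
        return  # Contains tags or spans lines: probably real markup.
    if text.startswith(("http://", "https://")):
        warnings.warn(
            f"{text!r} looks like a URL, not HTML. Fetch the document first.",
            LooksLikeLocatorWarning,
            stacklevel=2,
        )
    elif os.path.splitext(text)[1].lower() in {".html", ".htm", ".xml"}:
        warnings.warn(
            f"{text!r} looks like a filename. Open the file and pass its contents.",
            LooksLikeLocatorWarning,
            stacklevel=2,
        )
```

A user who really does mean to parse such a string can silence just this category with `warnings.filterwarnings("ignore", category=LooksLikeLocatorWarning)`, leaving every other warning visible.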

I've created a comprehensive unit test suite, both for the regular reasons and because I know I will sometimes go for a year or longer without looking at the code base, and I don't want to make mistakes when I come back.

I prefer bug reports to pull requests, because pull requests often add fully-formed features to Beautiful Soup that I won't want to maintain. Each one represents a lot of someone's work that I'm probably not going to use. If a feature request starts out in the issue tracker, we can talk it out and figure out what part of the idea really needs to go into the core project. Then I might ask you for a pull request as a proof of concept.

One side effect of all of this is, most of my direct interaction with users is telling them no. So I take extra care to be courteous in my interactions. These people are trying to live their own lives and they're spending the time to contact me about something. I need to respect that, while also preserving the boundaries that keep me on the project in the first place.

Beautiful Soup is the vector of discovery for a lot of bugs in the lxml and html5lib parsers, because it is a uniquely strange user of those packages. When a user uncovers one of these bugs, I always rewrite the example to eliminate any Beautiful Soup code, and report it as a bug against the dependency package. But I don't try to fix the bug myself.

And finally, in the core, beyond all the “no,” we have Beautiful Soup itself, the sum total of all of my yesses. I'm comfortable committing to a high level of quality for the core package, because I've architected it down to a size I'm comfortable with. I know if I die, or burn out again, or decide to walk away from the project for good, the latest release at any given time should basically work for a good while.

Two final announcements

“20th Anniversary of Beautiful Soup” at PyCon Open Spaces

Sunday, May 19th, 1:45-2:30 PM

Test the Beautiful Soup 4.13 beta

pip install beautifulsoup4==4.13.0b2

Be ready for DeprecationWarnings!

I want to close with two requests for those of you who are users of Beautiful Soup. First, I invite you to the 20th anniversary celebration at the Open Spaces on Sunday afternoon, where the mood will be significantly more upbeat than this talk.

Second, there is a beta of an upcoming release that includes a large number of DeprecationWarnings and changes that probably won't affect backwards compatibility. In this release I added type hints to the code base, and the type hints exposed a lot of very subtle bugs, so I'm trying to be extra careful with this one. Please try out this beta version of Beautiful Soup and report any issues to the mailing list.
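One way to make sure those DeprecationWarnings don't scroll past unnoticed is to capture them explicitly while running your own code against the beta. The sketch below uses only the standard `warnings` module; `old_style_call` and `find_deprecations` are hypothetical stand-ins, not part of Beautiful Soup. From the command line, `python -W error::DeprecationWarning your_script.py` has a similar effect by turning each DeprecationWarning into an exception.

```python
import warnings


def old_style_call() -> str:
    """Hypothetical stand-in for a deprecated API in the library under test."""
    warnings.warn(
        "old_style_call() is deprecated; use new_style_call() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return "result"


def find_deprecations(func) -> list[str]:
    """Run func and return the messages of any DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        # Python hides DeprecationWarning outside __main__ by default; show all.
        warnings.simplefilter("always", DeprecationWarning)
        func()
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


if __name__ == "__main__":
    for message in find_deprecations(old_style_call):
        print("Deprecated usage found:", message)
```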

And if you'd like to talk to me about burnout, please get in touch sometime at PyCon. Thank you.