Architectural Deep-Fakes in the Age of Automation

March 3, 2024

In the summer break after graduating, I bought an absurdly heavy camera and set off with my closest architecture-obsessed friends for what became an epic road trip across Europe.

We wanted to find the works that we’d pored over in books, and we packed a lot in. When we arrived at each building, it often felt like we were meeting a hero and we wanted to take in every detail. I have an obsessive streak, and so naturally I went about photographing as much as I could: shamelessly contorting myself into weird positions so I could capture the angles I hadn’t seen before, recording material textures, joinery, even door handles.

The group began competing to shoot the definitive photograph of each project, thus earning bragging rights in the car for the next epic leg of our journey. More than once, my friends threatened to leave me behind as they sat waiting in the car for me to get “just one more”…

This magpie-like behaviour is widespread among humans: we covet objects and forms, and we have an innate desire to carry a part of them away with us, a treasure in the form of images. This is the reason we now have such an abundance of data.

My photographs were all titled with the architect’s name and project location and were uploaded onto Flickr under a Creative Commons license for sharing with others.

In doing so, they entered the “public realm”. The many hundreds of images I shot were later harvested and, along with billions of others, used to train what we now call diffusion models, such as those behind Midjourney and Stable Diffusion.

Intellectual Property in Architecture

Until now, we’ve considered the essence of Intellectual Property in Architecture to reside safely in the “Design Documents”, which we produce in the form of drawings and specifications and which we license to our clients for one-time use.

As architects, we seldom own the published imagery of our buildings; in fact, the copyright of the images usually belongs to the photographer whom we pay to visit the site and photograph our work. These photographs don’t infringe our design IP, and in the UK there is “Freedom of Panorama”, meaning anyone can walk past a building in the public realm, point a camera and take a photo. So no harm, right?

Well, the word “image” has its origins in the Latin word “imitari”, meaning to copy or imitate (thanks Ninnie Yeo for this one). An image can only ever be an imitation of the subject; it cannot contain the essence. But what if aggregating many, many public images of a single project could document more about the project than we can see, allowing strangers to recreate or repurpose our designs with incredible ease?

Far from being inert, this wide collection of images of a project may come to be seen as containing something of the core design IP within it.

Diffusion models seem to know a lot about the fundamental design characteristics of some well-known projects; it’s not just the 2D veneer we are used to. They can reproduce the character, form, proportion, materiality, composition and expressive detailing of a design. This is a neat trick, but it puts us in uncharted territory when it comes to IP. We are snatching something of the soul of a project and reproducing a hollow shadow of it.

Here’s an example:

It looks like the Gherkin, but something is amiss: the light is different and the background is distinctly Manhattanish. The facade is also wrong in ways that would definitely make the architect cringe (sorry Foster + Partners). But how are we able to perform this magic trick? Buildings like Foster’s 30 St Mary Axe (the Gherkin) form part of the ‘iconic’ oeuvre of the early 2000s, and as such they have been widely photographed by the public and by professional photographers whose work forms part of the large datasets used to train AI models, such as LAION-5B. That’s “B” as in five billion images.

The sheer number of images of this single building in the training data is so large that one can recreate an uncanny likeness of its form, structure, fenestration patterns, material palette and colours with just a text-based request. From playing with this idea across various tools, I can see that Midjourney V6 is by far the best of the AI models at this particular trick, because it can imagine views of the Gherkin from almost any angle. We can multiply them on the skyline without using Photoshop, or, to take the idea to its extreme, imagine a surreal streetscape made up entirely of gherkin-like structures. In each case the view changes, but the diffusion model’s ability to reproduce the geometry and facade from a new angle or under new lighting conditions is startling:

Also not the Gherkin, 2 & 3. Image reproductions of Norman Foster’s 30 St Mary Axe (aka the Gherkin), produced by Arka Works using Midjourney (Generative AI)

The examples are full of little geometric errors that, upon deeper inspection, mark them out as clear frauds: they are architectural deep-fakes. In the Manhattan image, the contextual storey heights are definitely not correct; perhaps this is because Midjourney struggles to blend the differing scales of NYC and London, its much shorter cousin.

While incredibly vivid in style, Midjourney produces flat images, which means that aside from early-stage concept work, they aren’t yet ‘useful’ for offering full design control over volume and connected spaces, although some are running early experiments in this realm (see experiments transforming 2D images into 3D objects in Blender). The ability to call upon a very specific architectural style reference or motif matters because Midjourney enables the splicing of inputs. If designers so desire, they can fairly easily export the essence of the Gherkin’s fenestration onto a novel concept for an entirely different project. It does not require great effort or insight; it just requires the idea and some prompt craft.

Here is an example of what I mean: a conceptual design idea for a large sports stadium, wrapped in cladding that follows the distinctive diagrid arrangement of 30 St Mary Axe. I have crudely spliced the two ideas together through prompting to create a singular outcome:

Also not the Gherkin, 4. An image produced by mixing the facade of Norman Foster’s 30 St Mary Axe (aka the Gherkin) with a design concept for a football stadium, produced by Arka Works using Midjourney (Generative AI)
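For the curious, here is a minimal sketch of how such a splice might be scripted. Midjourney has no public API, so the open-source diffusers library and Stable Diffusion stand in for it here; the model ID, prompt wording and file name are illustrative assumptions, not the exact recipe used above:

```python
# An illustrative prompt splice: one reference supplies the facade language,
# the other the building type. Stable Diffusion (via diffusers) stands in
# for Midjourney; model ID, prompt text and paths are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

facade_reference = "diagrid glass facade in the manner of 30 St Mary Axe"
building_concept = "aerial concept render of a large football stadium"

# The splice itself is just string composition plus some prompt craft.
image = pipe(f"{building_concept}, clad in a {facade_reference}").images[0]
image.save("spliced_stadium.png")
```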

Plagiarism Writ Large?

This process is so vivid and low-effort that it feels inherently “like cheating”, and designers used to labouring over their drawings, inching towards inspiration, will likely agree.

Yet, enter almost any architecture studio during the concept phase of a project and you will see design reviews with facade and plan precedents being pinned up alongside the project for reference and inspiration. We know intrinsically that an invisible line exists between being inspired by something and outright copying it.

That line is a subjective threshold that must always be crossed in order to realise a true design transformation; too little change and an idea is deemed derivative.

All the best architects I’ve worked alongside know how to dance with this line, and they design with a personal archive of treasured icons and concepts that they can call upon, transforming and combining these familiar ideas into new settings and configurations at will. To the designer, such conceptual connections make intuitive sense. From the outside this process can seem nonsensical, that is until a fresh drawing emerges and stands on its own: a novel concept, arriving from a special blend of ingredients that, for whatever reason, just clicks.

Human Creativity

Last week I got into a fairly deep conversation with Albert Hill (from The Modern House) about the nature of creativity through the lens of AI.

He pointed me toward Samuel Taylor Coleridge’s theories of imagination, the highest order of which he called the “Secondary Imagination”: a completely novel form of invention, born out of unique insight and inspiration.

Coleridge believed that poetry and great art came from this realm. But he also talked about another kind of imagination he labelled “fancy”. Fancy is a much more additive phenomenon that combines ideas like bricks, one on top of another. Fancy is like copying, but with a twist.

I’m not sure I agree that a completely novel and unbiased imagination can truly exist; we are, after all, the product of our lived experiences, unavoidably conditioned by the things we’ve seen, felt and remembered. Nevertheless, the Romantic poet was onto something. Sometimes “fancy” will do the job and enable the next step, but it’s not going to truly nourish a project if we are not also striving for a much larger dose of Secondary Imagination when we go about our work; our minds need to be open to pure improvisation too. I think there is a real risk that Generative AI will keep ideas more and more in the realm of Coleridge’s fancy.

After all, the convenience and speed offered by tools like Midjourney will likely position them as the default concept mode for growing numbers of designers and, in the process, negate the deeper and harder work. Like design fast food.

This will become more and more of a challenge as graduates who’ve learned to think about design through heavy use of Midjourney enter the profession.

We have already seen a similar phenomenon since architectural journalism moved from longer-form industry journals and heavy monographs to image blogs like Dezeen, which champion the seductive image above all else; that image becomes the mark by which we judge quality.

Consider, when we compile a mood board of ideas using Pinterest, how much agency are we really exerting? The algorithm has already harvested the preferences of millions of architects before us and knows exactly what to serve us next. Oh, you like that handmade bathroom tile? Pinterest has the perfect light fitting to go with it.

This Pinterestification of more and more of our design choices causes a creeping trend towards a singular convergence of taste. Now translate that idea to Midjourney and the dial is turned up yet further; at a certain point you may wonder who is doing the thinking.

Fair Use

Let’s look at another example, this time Fallingwater, a case study project we learn about in the first year of architecture school and one that I only really appreciated years after qualifying. The copyright for the photo on the left is owned by the Laurel Highlands Visitors Bureau; the image on the right was created in Midjourney. So who owns it?

Two images of Fallingwater. The first (on the left) a photograph owned by the Laurel Highlands Visitors Bureau; the second an entirely new image generated in Midjourney with word prompts alone.

The synthetic version can only exist because thousands of people have taken photographs of this stunning house leaning out over the waterfall. They could not have known that in a small way, their work would later be harvested for this new and unexpected purpose. Even if they could, how might we credit them: 0.00001% to each author? To further muddy the falling water, the house sits on private land with no Freedom of Panorama: the owner specifically prohibits photographs for commercial use.

Is money changing hands to make this new image? I pay a license fee to Midjourney, so is this fake image an infringement of the IP of Frank Lloyd Wright’s estate? Does the creation of such an image cause harm to the owner of the land or anyone else in the chain, or is it Fair Use?

That is a lot of unanswered questions in a row, so let’s try to answer them. As long as I am not brazenly copying Fallingwater with my next design, causing a loss, falsely claiming authenticity, or generating serious revenue that would otherwise have flowed to the other people in this long chain, then I will go ahead and presume that this sits within the bounds of Fair Use. We might still all agree that it’s probably in bad taste.

This article is intended to demonstrate what architectural deep-fakery looks like, and hopefully no one is offended or confused. Is measurable harm or loss of earnings caused by such techniques? I would say no, but the legal boundaries of Fair Use have surely changed irrevocably since the invention of Generative AI, and while we wait for case law to catch up and readjust the goalposts, people are jumping in and experimenting.

Here’s what we do know so far: current case law holds that images created in Midjourney have authors, but the images themselves are not copyrightable. A recent US court case found that AI works are not human creations and that, without substantial human transformation, they cannot be subject to copyright protection. So I can produce images of Gherkins spawning across London, but if someone else puts them on a T-shirt, I would have no claim to that revenue, and neither would Foster + Partners.

Some practical suggestions

If you are dipping your toes into Generative AI in practice and you are concerned about the Copyright and IP implications, here are five simple suggestions to consider:

1. If you object to your practice’s imagery being used in training data, you can request delisting from future training here: https://haveibeentrained.com/. If you are a widely photographed practice, you may be closing the stable door after the horse has bolted, but you will at least know what is out there in the public realm, and clicking “Do Not Train” may feel therapeutic.

2. If your team are using imagery or design references from outside your own studio’s portfolio when working in Midjourney or Stable Diffusion, insist on substantial transformation of the original idea as a baseline.

3. Make sure you label things for what they are: if something is made with the assistance of Generative AI, it is good practice to say so alongside it.

4. If you are using diffusion models for early concept design work, set some boundaries around Fair Use. For example, you may choose to prohibit the use of architects’ names, direct project references, or copyrighted images in your prompts (a minimal sketch of such a check follows this list).

5. If you want to leverage your own IP and have a huge library of images you love, you may benefit from creating private image models. Rather than using general models, consider training and hosting your own custom Stable Diffusion checkpoint models on-site, using LoRA or Dreambooth techniques for training. Your data remains private and only you have access to the resulting models (a second sketch below shows what inference against such a model might look like).
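On suggestion 4, here is a minimal sketch of what an in-house prompt check could look like; the blocked-terms list is entirely hypothetical and would be set by your own studio’s policy:

```python
# A minimal sketch of an in-house prompt policy check (illustrative only).
# The blocked-terms list below is a hypothetical example of studio policy.
BLOCKED_TERMS = [
    "foster + partners",
    "30 st mary axe",
    "fallingwater",
]

def check_prompt(prompt: str) -> list[str]:
    """Return any prohibited references found in a draft prompt."""
    lowered = prompt.lower()
    return [term for term in BLOCKED_TERMS if term in lowered]

violations = check_prompt("football stadium clad like 30 St Mary Axe")
if violations:
    print("Prompt rejected; remove references to:", ", ".join(violations))
```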
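And on suggestion 5, a minimal inference sketch, assuming you have already trained a LoRA adapter on your own image library (for example with the diffusers Dreambooth-LoRA training scripts); the base model ID, adapter path and prompt are illustrative:

```python
# A minimal inference sketch against a privately trained LoRA adapter.
# Assumes the adapter already exists (e.g. trained with the diffusers
# Dreambooth-LoRA scripts); model ID, paths and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./our-studio-lora")  # weights never leave your hardware

image = pipe("concept facade study in our studio's house style").images[0]
image.save("concept.png")
```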
