Educating the eye?

Kress and Van Leeuwen’s Reading Images: The Grammar of Visual Design (1996)

Charles Forceville

Vrije Universiteit Amsterdam / Rijksuniversiteit Leiden (OSL), The Netherlands




This review article of Kress and Van Leeuwen’s (KvL) Reading Images: The Grammar of Visual Design (1996) begins by giving a summary of its main issues, and highlights its innovative and bold proposals. In the following sections, some weaknesses and controversial aspects of the book are discussed. Both are seen as following from the semiotic and ideological approach adopted by the authors. Specifically, these affect the proposals for the classification and interpretation of images, and the degree to which the concepts delineated are generalizable. In the later sections, tentative suggestions are made as to how KvL’s approach is relevant to the currently emerging ‘cognitivist’ paradigm.

Keywords: categorization; cognitivism; genre; ideological criticism; interpretation of images; metaphor; relevance theory; semiotics; word & image relations

1 Introduction

Although contemporary society is flooded with images, we have few studies that provide practical suggestions for the analysis of images and word & image ‘texts’ as distinct from more or less theoretical reflections on that topic (e.g. Arnheim, 1969; Aumont, 1997; Mitchell, 1986; Sonesson, 1988; Thompson, 1996).

Gunther Kress and Theo Van Leeuwen make a courageous attempt to help fill the glaring gap with a book ambitiously titled Reading Images: The Grammar of Visual Design, a revised version of their earlier Reading Images (1990). In this review article I will first present an outline of the book’s contents. Given the wide range of topics Kress and Van Leeuwen (henceforth KvL) address, this outline cannot be complete, but it briefly touches upon the study’s main issues and gives samples of their approach. Subsequently I discuss some issues in more detail, indicating where KvL’s ideas in my view require qualification. To come clean straight away: I find the book exciting, thought-provoking and readable, but I have serious misgivings about a number of methodological issues, and some hesitation about the ideological framework. These significantly affect the ‘tool-kit’ character of the book, which thus is not quite the unproblematic textbook its title promises. In the later sections of this article I will suggest how KvL’s views might be embedded in a cognitivist approach – an approach which I believe will ultimately yield a more inclusive theory of the image than KvL’s semiotically and ideologically oriented one.

2 Survey of the contents of Reading Images

In the introduction the authors explain that theirs is a ‘social semiotics’ inspired by Hallidayan grammar. They ‘intend to provide inventories of the major compositional structures which have become established as conventions in the course of the history of visual semiotics, and to analyse how they are used to produce meaning by contemporary image-makers’ (p. 1). Because of both the ubiquity of images and the communicative and/or manipulative purposes of their makers, KvL claim, it is important that people should be trained how to interpret images. The authors are careful to point out that their grammar is not a universal one but purports to account only for images in western society, and acknowledge moreover that even here there may be regional and social variation. They further presuppose that there is no fundamental difference, but rather a continuum, between creative, artistic uses of pictures and communicative, ordinary uses. But for each type of image they argue that the inclusion or exclusion of details, and the manner of execution, can have ideological implications, and emphasize their concern with this aspect of pictures, seeing their work as ‘a tool for practical as well as critical applications in a range of fields’ (p. 14).

The first chapter elaborates a number of issues raised in the introduction. Barthes’s (1986/1964) famous concept of text ‘anchoring’ or ‘relaying’ the images it accompanies is discussed and criticized. KvL think that Barthes concentrates too much on the interdependence of word & image; by contrast they see the visual component of a text as ‘an independently organized and structured message – connected with the verbal text, but in no way dependent on it: and similarly the other way around’ (p. 17). Consequently, they take the view that ‘language and visual communication both realize the same more fundamental and far-reaching systems of meaning that constitute our cultures, but that each does so by means of its own specific forms, and independently’ (p. 17), although ‘not everything that can be realized in language can also be realized by means of images, or vice versa’ (p. 17). To suggest how the visual affects the way we make sense of the world from a very early age, the chapter extensively discusses two illustrations in children’s books. The authors persuasively argue that while the non-linear nature of the page in one of them, featuring a central drawing and four smaller ones in the corners, seems to turn it into an open-ended text, the number of possible readings is in fact quite restricted, and almost inescapably incorporates certain binary oppositions. The growing role of pictures in contemporary children’s books (and in virtually every other type of text), KvL claim, changes the way information is presented and has implications for what is presented. The chapter ends with a brief discussion of the three functions any semiotic mode has to fulfil in order to serve its communicational and representational purposes.

These three ‘metafunctions’, adapted from Hallidayan grammar, are not limited to a specific medium. The ‘ideational metafunction’ pertains to the ways in which semiotic systems can refer to objects in the outside world, and the relations between these objects, the ‘interpersonal metafunction’ deals with the relations between sender and receiver of the sign, and the ‘textual metafunction’ accounts for the options available to ensure that signs form complexes of signs, that is, ‘texts’. The three metafunctions structure Chapters 2 to 6 of the book.

In Chapter 2, KvL introduce the notion of ‘vector’ as the pictorial equivalent of the action verb. Real or virtual lines between human elements in a picture function in ways similar to verbs describing relations between what in Hallidayan grammar are actors and goals. Since actions presuppose human or human-like agency, vectorial patterns are called ‘narrative’, six major types being identified. These types are to be contrasted with conceptual pictures, which represent participants ‘in terms of their class, structure or meaning, in other words, in terms of their generalized and more or less stable and timeless essence’ (p. 56). KvL point out that the distinction applies not only to naturalistic pictures but also to diagrams.

Three main types of conceptual representations are identified in Chapter 3. The first is constituted by classificational processes, which relate participants to one another in a taxonomy on the basis of some feature they share. KvL nicely illustrate that classificational taxonomies have a tendency to equate all elements depicted on the same level in terms of one dimension, and that this may disguise crucial inequalities. The physical orientation of taxonomies (top–down, bottom–up, left–right) is discussed and some implications are suggested. Even the way in which diagrammatic lines in classificational structures are drawn is not neutral, KvL suggest: straight and curved lines evoke connotations of cold rationality and organicity respectively. The second type of conceptual process is labelled ‘analytical’ and pertains to the depiction of part–whole structures. The third type is ‘symbolic processes’. These pertain to the relation between some element in a picture and what it symbolizes.

Chapter 4 shifts to the interaction between pictures and their viewers. A first important difference arises from whether or not participants in pictures look directly at the viewer. In the former case the participant appeals to the viewer, in a so-called ‘demand’ picture; in the latter case the participant is the object rather than the subject of the look. These latter are ‘offer’ pictures.1 KvL claim that in an Australian primary school textbook the Aboriginal people are typically depicted as ‘offers’, and hence as ‘objects of contemplation’ (p. 126), while white immigrants are rendered as ‘demands’. But the authors acknowledge that sometimes (e.g. in film and newsreading) it is simply genre conventions which dictate the choice between offers and demands. Similar valuations may adhere to the distance of the depicted participants, objects or events from the camera. Close-up, medium shot and long shot suggest increasing social distance, but again, certain frame sizes have become conventionalized in certain types of depiction. In a later section, KvL propose a bold correlation between involvement with the depicted participants and the horizontal angle, which can be frontal or oblique. Discussing two photographs of Aborigines, they conclude: ‘The frontal angle says, as it were: “what you see here is part of our world, something we are involved with.” The oblique angle says: “what you see here is not part of our world; it is their world, something we are not involved with.” The producers of these two photographs have, perhaps unconsciously, aligned themselves with the white teachers and their teaching tools, but not with the Aborigines’ (p. 143, emphasis, as elsewhere, in original) – and the viewer has no choice but to share their perspective.

Pictures reflect different claims to verisimilitude. Chapter 5 deals with this degree of a picture’s commitment to the ‘truthfulness’ to reality, a dimension of pictures KvL call ‘modality’.2 In language, (epistemic) modality is expressed by such auxiliary verbs as may, will and must, and adjectives such as possible, probable and certain. KvL distinguish between eight dimensions that co-determine the degree of naturalness of a picture. It is pointed out that what constitutes naturalness, and hence also deviation from naturalness, via any of the eight dimensions, may vary across different realms of pictures.

‘The meaning of composition’ is addressed in Chapter 6. Pictures, including multimodal ‘texts’, give significant information through the ways in which their elements are arranged. Three aspects are distinguished: the ‘zone’ in which an element occurs (left/right, bottom/top, centre/margin); the ‘salience’ bestowed on it (via foregrounding/backgrounding, relative size, colour, etc.); and ‘framing’ devices such as vectors between participants. Particularly in the discussion of zones, KvL come up with bold proposals. They suggest that (in western society) the left is the region of the ‘given’ and the right the region of the ‘new’; and that the top is the region of the ‘ideal’, whereas the bottom depicts the ‘real’. Centre–margin spatial structures obviously stress the former at the expense of the latter. Left–right and top–down orientations often combine with centre–margin ones, KvL argue, for instance in so-called ‘triptychs’. The last section is devoted to a discussion of the greater flexibility in reading paths in pictorial or multimedial texts than in (more linear) verbal texts.

Meaning inheres not simply in what is depicted, but also in how the depiction is conveyed materially. This issue is the central concern of Chapter 7. KvL distinguish three major categories: inscription by hand, recording technologies and synthesizing technologies. Each of these ‘modes of inscription’ models its own relations between the producer and receiver of an image, while the distribution of the image, too, is affected. Types of brushstroke (vigorous, thin, pointillist, etc.) and surface materials (canvas, marble, wax, etc.) co-determine the overall impact and connotations that a representation is likely to realize.

In their last full chapter, KvL tentatively explore the area of the three-dimensional image, ranging from sculptures to children’s toys. While many of the concepts delineated with respect to the two-dimensional image are applicable to the three-dimensional as well, there are some obvious differences. For instance, there are often no fixed perspectives from which to look at sculptures, spatial orientations (left–right, top–bottom, centre–margin) being more flexible; and the physical setting in which they appear may differ from one museum or gallery to another. Furthermore, certain three-dimensional objects (e.g. Playmobil toys) encourage interactive behaviour.

KvL indeed cover an impressive range of aspects and types of pictures. The choice of breadth inevitably means that a price is paid in terms of depth, so that in some cases the analyses only whet the reader/viewer’s appetite for more detailed analysis and more examples, while in others there remains a gap between the analytical tools proposed and their practical use. But I discern also a number of serious problems pertaining to methodology and perspective in KvL’s approach. In the following sections I will outline and discuss my doubts and criticisms. I present them in the awareness that KvL are pioneering largely unexplored territory. My observations aim at marking some pitfalls and pointing out distortions inherent in the type of map KvL have drawn.


KvL are social semioticians, and it is typical of semiotic approaches that they present phenomena in terms of oppositions, often in grid patterns or tree diagrams. This, of course, is in itself a commendable way of defining similarities and differences. Charting a new field involves categorizing, if possible hierarchically, a hitherto undivided mass of data. KvL regularly summarize the pertinent distinctions they have found in a tree diagram, followed by a section, ‘Realizations’, in which they briefly describe the sub-categories (some of the either/or type, others of the and/and type) in the tree. The usefulness of these hierarchical categories will depend on their applicability to new pictures. While the authors describe and illustrate all their categories, not all of them are included in the ‘Realizations’ sections. Why not? The number of levels distinguished can amount to no fewer than six (p. 107, see Figure 1 in Appendix). We need all the help and examples we can get to be persuaded that the schema in Figure 1 is correct and applicable, but often we have to make do with a short description and a single example of each slot in the schema – and the (arbitrary?) absence of several subtypes in the ‘Realizations’ section is irritating. A more serious problem is that according to Figure 1 ‘inclusive spatial structures’ cannot be conductive, since ‘conductivity’ is one of the subdivisions of ‘exhaustive’, but not of ‘inclusive’. Exhaustive structures, KvL explain, depict all the elements their carriers comprise, whereas inclusive structures do not. The latter select only a few elements for depiction. Thus a technical drawing of a machine may (exhaustively) depict all of its parts, or it may (inclusively) highlight only some of them. Now ‘conductors’ are said to ‘indicate a potential for dynamic interaction between the Possessive Attributes they connect’ (p. 100). Examples of conductors are a pipeline, a road, a railway track, but they may also be of a more abstract kind. It is clear that an exhaustive technical drawing of a machine can be conductive, but I cannot see why an inclusive technical drawing of a machine may not be equally conductive. However, in the scheme this possibility is excluded.

Here is a comparable issue. In Chapter 5, KvL devote a section to markers of visual modality. They define eight markers: colour saturation, colour differentiation, colour modulation, contextualization, representation, depth, illumination and brightness. They say sensible things about each of these markers, but there is little discussion on how some of them (notably the first three) relate to one another, and how they can be used in the practical analysis of specific pictures. In the ensuing section, ‘coding orientation’, KvL argue that what constitutes the ‘highest modality’ (that is, what is considered ‘most normal’) depends on the kind of picture discussed. They distinguish the following four coding orientations: scientific/technological, sensory, abstract and naturalistic.

The abstract coding orientations are used by sociocultural elites – in ‘high’ art, in academic and scientific contexts, and so on. In such contexts modality is higher the more an image reduces the individual to the general, and the concrete to its essential qualities. The ability to produce and/or read texts grounded in this coding orientation is a mark of social distinction, of being an ‘educated person’ or a ‘serious artist’. (p. 170)

In diagram 5.5 (p. 171) the modality values of colour saturation are given for each of the four orientations. Highest modality in the abstract orientation is ‘black and white’. Whereas this may be an accurate description of scientific pictures and diagrams, I am by no means convinced that the same standard applies to ‘high art’. KvL seem to have some doubts, too, for after discussing a number of paintings they admit that ‘the examples in the previous section show that the modality values in art can be complex’ and a few lines later they go even further: ‘in many other kinds of images, too, “modality markers” do not move en bloc in a particular direction across the scales, say from the abstract to the sensory, but behave in relatively independent ways’ (p. 176). But then, of course, one wonders what is the relation between the different modality markers, and what conclusions one may draw on the basis of a certain marker having high or low modality.

This is not nit-picking: these difficulties point to a more basic problem, namely, the problem of categorization. Of course the delimitation of categories and the development of criteria to decide membership or non-membership of an item in a category are crucial to scholarship of any kind. But the problem is that categories are seldom clear-cut; many categories are fuzzy, and describe a continuum between extremes rather than a binary opposition with an either/or structure. At the end of a discussion about classificational processes, KvL themselves draw attention to this danger: ‘Our discussion above has, we hope, made it clear that we see these distinctions as tools with which to describe visual structures rather than that specific, concrete visuals can necessarily always be described exhaustively and uniquely in terms of any one of our categories’ (p. 88). However, this caution is usually absent when KvL present their own classifications; the typically semiotic either/or branches3 as well as the hierarchical structure hide the fact of fuzziness, suggest exhaustiveness and hint at stable, authoritative hierarchies. In this respect, KvL might have benefited from Rosch’s work on ‘prototype theory’, a notion that is central to Lakoff’s famous Women, Fire and Dangerous Things (1987). Lakoff rejects the notion that something is either absolutely in, or outside of, a category in favour of the notion that categories are radial structures, with more and less prototypical members. His book appears in KvL’s bibliography, so one would have expected the authors at least to discuss, and possibly even to accommodate, this very different view of categorization.

4 The interpretation of individual images

KvL illustrate and underpin the pertinence of the theoretical concepts they adduce by constant reference to the many pictures (176 black and white and 8 colour plates) in the book. This interaction between theory and practice is an excellent feature, because it makes the theoretical proposals concrete, lively and verifiable. Since KvL aim at providing a ‘grammar’ of western images, the concepts they develop should ideally be routinely applicable; more specifically we would expect that the demonstrations of the applicability of their concepts to the pictures in the book are convincing and unproblematic: after all, the authors were free in their choice of examples. And indeed in many cases the analyses of the pictures in the light of the concepts discussed are persuasive and illuminating. But alongside these there are a substantial number of pictures whose interpretations are debatable, or at the very least one-sided. And this means that in these cases whatever ‘grammar’ is operative in pictures is less intersubjectively shared than KvL suggest. That is, either KvL claim too much explanatory or even predictive power for their concepts, or the delineation of some ‘grammatical rules’ requires further refinement. In this section, I will focus on some problematic cases.

To begin with, KvL are at least once verifiably wrong in their description of a picture. Explaining that pictures with only one participant exemplify ‘non-transactional structure’, KvL state that

the action in a non-transactional structure has no ‘Goal’, is not ‘done to’ or ‘aimed at’ anyone or anything. The non-transactional action process is therefore analogous to the intransitive verb in language. … In the picture in figure 2.16 [a film frame] the principal Actor is formed by Ben Hur (Charlton Heston) and his chariot. … Ben Hur ‘races’, but he does not race anything or anyone, at least not so far as we can see in this picture. (pp. 61–2)

But this is simply not true. Anyone who has seen the film knows that Ben Hur in this climactic scene does race against a number of opponents (and more specifically against his former friend and present foe Messala). Curiously, figure 2.16 depicts no fewer than three other chariots racing in the stadium, so that even the visible evidence in this isolated film frame belies KvL’s analysis. That is, the frame exemplifies a transactional structure, structurally similar to that identified in figure 2.1 (p. 43), which KvL render, plausibly, as ‘the British stalk the Aborigines’. The problem recurs in the analysis of a sculpture by Kenneth Armitage, People in the Wind (1952), shown in figure 8.2 (p. 244). KvL conclude that the sculpture has ‘strong vectors, formed by the way the figures are bent forwards as they struggle against the wind. But ... the action is “non-transactional”. … The figures, it seems, “strain forwards”, but they do not “strain towards something” ’. Perhaps this is true if one verbalizes their action as ‘to strain towards something’, but with another verbalization there is no problem.

Figures 2.16, 8.1, and 8.2 could be rendered as ‘Ben Hur fights his opponents’, ‘Jacob fights the Angel’ (the latter a sculpture described as transactional) and ‘the people fight the wind’, respectively. That is, given these verbalizations, all three pictures qualify as being transactional. I dwell so long on this issue because it might reveal that KvL compare visual structures too much with surface language instead of with the mental processes of which both surface language and images are the perceptible manifestations.5 

In other cases, too, interpretations by KvL are controversial. Take the following:

Figure 3.38 shows an oil drilling installation in the Sahara desert. The de-emphasizing of detail, and hence the ‘mood’ of the picture, results from the extreme lighting conditions in which the setting sun plays the role of a low backlight. In this way the oil drilling installation becomes a symbol for the disappearance of the old Bedouin lifestyle. The accompanying text in fact ends with a quote from a Bedouin, lamenting the demise of traditional ways of life – followed by the photo. (p. 112)

An alternative interpretation of the same picture, however, could be, ‘How beautifully man-made industry merges with nature!’. Why don’t KvL offer this alternative interpretation besides their original one? One reason is suggested in the quotation itself: KvL take into account the text that anchors the picture. But that means they have not simply interpreted the picture; they have interpreted the picture-cum-text. That is, they defuse their own (already quoted) view, contra Barthes, that a picture and its accompanying text are mutually independent. Once you have read the accompanying text, you can no longer look at the picture in a disinterested manner, as KvL themselves prove here. But there is something else. It is clear that KvL are committed to the idea that pictures reveal ideologies – a notion that is generally shared in an age permeated by the postmodernist awareness that no representation of reality can ever lay claim to being neutral – and they are keen to identify and expose any suggestions of false ‘naturalness’ lurking beneath the surface. I sympathize with this aim, although I have reservations about the ease with which they declare this holds for artistic representations as well (p. 13). But while in a number of cases they are admirably persuasive in showing how apparently innocent pictures slyly manipulate their viewers, sometimes this zeal to expose hidden ideologies takes precedence over a cool attempt to analyse what is, presumably, objectively there. Surely the whole point of developing a visual grammar makes sense only if there is general (within-culture) agreement about the presence and effects of at least some aspects in a picture, and these intersubjectively establishable aspects need to be identified and described before any valuations or interpretations are attached to them.

A similar problem surfaces in KvL’s discussions of various children’s drawings. The interpretations are fascinating, but there is no way in which the reader/viewer can verify them empirically. Here is an example:

Figure 4.27 is the front cover of a ‘story’ on sailing boats by a child … The characters [in the boat] do not look at us. … The angle is frontal and eye level, and the two figures in the boat are neither particularly distant, nor particularly close. There is no setting, no texture, no colour, no light and shade. … But for the two figures, simply drawn, and more or less identical, except for their size (a father and son?), this could be a technical drawing. As such it suits the objective, generic, title, ‘Sailing Boats’. … In most of the illustrations inside the [visual] essay, no human figures are seen, as though the child already understands that the ‘learning’ of technical matters should be preceded by a ‘human element’ to attract non-initiates to the subject. (p. 158)

Well, yes, possibly – but this remains rather speculative. First, as in the preceding case, the authors invoke context in the form of (a) verbal anchoring, via the title, and (b) other pictures in the pictorial ‘essay’. Again, the picture is interpreted, not simply as an isolated representation but as a word & image text, and moreover as part of a more comprehensive whole. If this is crucial to KvL’s interpretation, however, a grammar of pictures is after all quite heavily dependent on textual anchoring and (pictorial) context, but KvL do not pay much attention to the interaction between pictures and (con)text. Although they profess to be aware of the importance of context, their concepts and models do not specify how context must be incorporated (for alternative approaches, see Cook, 1992; Forceville, 1996). And if KvL are allowed to speculate about the drawing’s meaning, so is everybody else. Let me try: the boat sails from right to left. Given the importance of left–right orientations in pictures in terms of given–new, it is clear that the child has a longing for the given rather than the new. The boat sails toward the given, the past, turning its stern to the new, the future, and the child may suffer from regressive behaviour and fear of the future. Moreover, the title of the drawing occurs in the top half of the drawing, that is, in the region of the ideal, while the picture is underneath. The child is thus rooted in the pictorial, but aspires to language. Sensible? Flippant? I am sure that KvL have hit upon important distinctions, but they do not specify when bottom/up and left/right orientations do apply and when they do not. In several instances, KvL are carried away by their theoretical and ideological framework, arbitrarily or rigidly applying it to new pictures, and this sometimes yields highly unconvincing results. A full-blown visual grammar should predict, or at least suggest, under what conditions certain ‘rules’ operate. But acceptable or valid interpretations of a picture may reside less fully in picture-immanent factors, and correspondingly more in pragmatic ones, than KvL are prepared to acknowledge. This brings us naturally to the generalizability of KvL’s concepts.

5 Generalizing across pictures

One of KvL’s exciting decisions is to tackle pictures from a wide variety of sources, mixing paintings, diagrams, film frames, school book illustrations, newspaper photographs, technical drawings, emblems, etc. This strategy is refreshing, and potentially innovative: if the authors succeed in demonstrating the general applicability of a certain concept, irrespective of the type of picture, they genuinely help advance the theory of pictorial representation. However, many proposals are problematic, for while grammaticality in language is relatively stable across text-genres, the grammar of images may well be considerably more genre-dependent. For one thing, KvL take great risks in incorporating the occasional film frame. A film frame tolerates even less decontextualization than static pictures. Not only does a frame acquire meaning and significance only in combination with text (dialogues, voice-overs) and non-verbal sounds (music, sound-effects), its meaning crucially depends on the shot and the sequence in which the frame occurs (and ultimately on the entire film). I have already referred to this in connection with the shot from Ben Hur, but the analyses of a frame from Bergman’s Through a Glass Darkly (figure 6.1, p. 182) cause uneasiness as well. Many of the distinctions signalled by KvL are not intrinsic to this particular, isolated, frame, but can be interpreted only by taking into account information beyond the frame itself. More specifically, the significance of a certain spatial lay-out in a frame can be gauged only when compared with those in other frames or shots involving the same characters, and this presupposes studying an entire film’s montage and editing patterns (Hurst, 1996: 126). That is, expectations on the basis of what has preceded are as important as picture-immanent structures.

Another example of KvL’s too sweeping claims is their discussion of sequences of three images, ‘triptychs’. They maintain that the middle one usually ‘mediates’ between the other two, and give some examples (p. 208). But as Goodman demonstrates, many triadic structures, whether verbal, visual or medially mixed ones, display a different structure, namely, that of a mini-argument with the third item providing the conclusion, solution or punch-line. Newspaper lay-outs often pictorially reinforce the ‘argument’ structure of three-item constellations (Goodman, 1995: 150 ff.), and the prototypical three-part cartoon has the joke’s apex in its last panel (Goodman, 1995: 159–60).

In short, KvL often too easily assume (a) that their examples are representative and (b) that their personal interpretations have intersubjective validity. One of the tasks of the project of developing a more refined and sophisticated visual grammar is to be more specific about the differences between shared and non-shared interpretations of (elements of) pictures. It seems to me that fruitful insights are to be gained from the following lines of research. In the first place, the notion of text-external context (to be distinguished from text-internal context, see Forceville, 1996: Ch. 4, et passim), in the widest sense of the word, needs to be studied and theorized. Contexts not only affect the interpretation of images (as they affect the interpretation of any type of text); they often crucially co-determine it. Irony, for instance, can only be detected against the background of extra-textual assumptions. More generally, what is needed is an awareness of authorial intentions that underlie pictures, and ‘genre’ is a great help in this respect. Interpretations of a picture will be considerably constrained by the awareness that it belongs to a certain genre. Other aspects of text-external factors that may influence interpretation pertain to the identity of the viewer. Gender, age and cultural background may all play a role in this respect. That is, a text-immanent analysis of pictures needs to be systematically complemented by pragmatic analyses (cf. Pateman, 1980). This brings me to a second line of necessary research: empirical testing. Hypotheses about the impact of variables of genre and of audiences upon interpretation can and must be tested. Work done by empiricists working on literary texts can help focus ideas on how this is to be done (cf. Ibsch et al., 1991; Steen, 1994; Zwaan, 1993). Moreover, some empirical work on the interpretation of various types of images has already been done (Camargo, 1987; Forceville, 1995; Mick and Politi, 1989; Morley, 1983; Petterson, 1995; see also the extensive bibliographies in Braden, 1996 and Moriarty and Kenney, 1995).

6. A visual grammar as part of cognitive science

The fact that KvL try to adapt Hallidayan grammar to pictures reveals their awareness that certain communicative concepts exceed the boundaries of a specific medium. Their allegiance to ‘social semiotics’, however, leads them to link their insights sometimes rather quickly to ideological criticism. This is unfortunate, not only because they sometimes tailor practice to theory rather than vice versa, but also because they risk appealing only to readers who share their critical stance. This might obscure the fact that KvL’s book also contains much that is of interest to those who are primarily concerned with an issue that, as far as I am concerned, is more fundamental, namely, the relation between what goes on in the mind and manifestations of this activity. Scholars interested in this issue, scattered over many disciplines, tend to use the adjective ‘cognitive’ to characterize their work, and slowly even ‘the humanities’ are beginning to latch on. Thus, Cook (1994) shows how some version of schema theory as advocated in AI studies is a necessary component in any theory that wants to explain how literature is understood. Similarly, two leading film theorists, Bordwell and Carroll, are trying to direct research interests away from ideologically oriented film analysis (specifically psychoanalytical work) to what they call ‘cognitivism’: ‘A cognitivist analysis or explanation seeks to understand human thought, emotion, and action by appeal to processes of mental representation, naturalistic processes, and (some sense of) rational agency’ (Bordwell and Carroll, 1996: xvi; cf. also Bordwell, 1985; Carroll, 1996). My own work on pictorial metaphor, while fully acknowledging that metaphors have a strong ideological dimension, primarily tries to outline what forms pictorial metaphors can take, and suggests how word & image texts guide, but cannot enforce, interpretations (Forceville, 1996). Sperber and Wilson’s (1986) relevance theory provides angles for such an approach. 
It is partly because their theory is not limited to verbal communication but allows for other modes of communication as well (see Forceville, 1996: Ch. 5) that it is so exciting and fruitful. It encourages scholars to compare ‘texts’ from different media because their intentions and effects can be subsumed under the same general heading of manifesting the aim to convey certain assumptions (or ideas, moods, feelings). Moreover, Sperber and Wilson’s theoretical distinction between aspects of a message that are unequivocally communicated (‘strong implicatures’) and aspects that are weakly, more ambiguously, conveyed (‘weak implicatures’) may well be particularly pertinent to pictures, especially static ones. And their insistence that relevance is always relevance to an individual (1986: 142 ff.) should sound comforting and stimulating to humanist scholars who, quite rightly, are deeply suspicious of any theoretical approach that claims to predetermine or even predict all interpretational possibilities.

Sperber and Wilson’s ‘relevance to an individual’ has a clear echo in Johnson’s ‘meaning is always meaning for some person or community’ (1987: 177). The cognitivist programme in the Lakoffian tradition (Gibbs, 1994; Johnson, 1987, 1995; Lakoff, 1987, 1993; Lakoff and Johnson, 1980; Turner, 1991) provides good theoretical starting points for investigations of non-verbal, or multimodal, manifestations of the mind’s workings. Conversely, the tenability of its central theses (such as prototype theory, the idea that our way of conceptualizing is ultimately guided by the human physique, the pervasiveness of the figurative in so-called ‘literal’ language and thought) will have to be tested extensively in the realm of the visual, an area that has hitherto been completely ignored by those working in the Lakoffian framework. KvL’s book supplies various elements that constitute potential bridges to the cognitivist programme.6 Their concept of the ‘interordinate’ level in hierarchies (p. 81) has clear parallels in the ‘basic level’ developed by Rosch and discussed in Lakoff (1987: 31 ff.). And KvL’s discussion of top–down, left–right, centre–margin orientations invites theoretical cross-fertilization with Johnson’s (1987) observations about the fundamentality of a number of image schemata underlying human conceptualizing, such as ‘path’ (including up–down orientations), ‘cycle’, ‘link’, ‘balance’, ‘centre–periphery’. Johnson’s claim that ‘image schemata are pervasive, well-defined, and full of sufficient internal structure to constrain our understanding and reasoning’ (1987: 126) echoes precisely the kind of programme that I take KvL to pursue when they try to formulate the ‘rules’ describing systematic relationships between thinking and the production/reception of pictures.

7. Concluding remarks

In this review article I began by giving an outline of KvL’s book and indicating some of its strengths, and proceeded by voicing a number of serious criticisms of its methodological weaknesses and ideological commitments. I ended by embedding these views in broader concerns not necessarily shared by KvL themselves. It might seem that my respect for, and excitement about, KvL’s book have become somewhat buried under the criticisms. Let me therefore repeat that I think that Reading Images is significant and innovative. KvL present a host of concepts and tools for the analysis of pictures, many of them illuminating and unexpected. The wealth of pictures and discussions in their attractively produced book provides ample food for thought and further theorizing. Because of the way they present their concepts, and the applications to specific pictures, their work has the merit of being amply verifiable and falsifiable. As I explained, I am by no means convinced of the general applicability of a number of their concepts, but by making explicit claims they do open up opportunities for counterclaims, based on other pictorial data and/or experimental research. Given its format, KvL’s study is clearly intended to be used as a textbook, presumably at undergraduate level. In view of its methodological deficiencies and strong ideological commitment, this entails some dangers. Nonetheless, the gains are worth the risks, on condition that KvL’s ideas are subjected to highly critical scrutiny: partly by juxtaposing the book with more theoretically oriented approaches, some of which have been suggested in this article, partly by systematic testing of the concepts against new pictorial material.


I thank Leo Hoek, Elrud Ibsch, Lachlan Mackenzie, and Ed Tan (all Vrije Universiteit Amsterdam) for their comments on earlier drafts of this review article. The responsibility for its contents remains entirely mine.


  • The contrast recalls the difference between the (impertinent, colonizing) ‘gaze’ and the (dialogic) ‘glance’. First proposed by Norman Bryson, the distinction was popularized by Mieke Bal. Surprisingly, KvL do not refer to Bal’s work on ‘gaze/glance’ here, although they include her book in their bibliography – and curiously they cite the Dutch (Bal, 1990) rather than the more widely accessible English version (Bal, 1991).
  • Note that what KvL subsume under the general heading of ‘modality’ here is what in Simpson (1993: Ch. 3) is equivalent to one main type of modality out of four, namely, ‘epistemic modality’.
  • As indicated, KvL also use symbols in their diagram to indicate that certain dimensions can co-occur in a single picture. But here, too, one wants to know under what conditions this is possible.
  • By contrast, Sonesson, who also works in a semiotic framework, is acutely aware of the theoretical threat of prototype theory to traditional semiotic accounts, and discusses prototype theory at considerable length (Sonesson, 1988: 66 f.). 
  • Moreover, verbal transitivity, which for KvL is the paradigm upon which they model pictorial transitivity, is by no means a simple case. It is, as Hopper and Thompson (1980) show, a matter of degree, and hence itself subject to prototype effects.
  • KvL are, however, not likely to sympathize with, or perhaps even accept as possible, cognitivists’ attempts to distinguish between what is more or less ‘neutrally’ there and any ideological valuations that adhere to this more or less ‘neutral’ nucleus.


Charles Forceville, Vrije Universiteit Amsterdam, Faculty of Arts, P.O. Box 7161, 1007 MC Amsterdam

Figure 1 
Analytic image structures. Reprinted with permission from: Gunther Kress and Theo Van Leeuwen, Reading Images: The Grammar of Visual Design. London: Routledge, 1996, p. 107

