Adam Mathes
LIS350SE
April, 2004
This report evaluates the effectiveness of a content-based image retrieval system in a specific domain: video game screenshots, the actual pixel representations of a game during execution. By examining the results of image-based queries, this study attempts to determine whether algorithmically determined visual similarity correlates with subjective classifications of video games, such as genre.
Content-based image retrieval offers some interesting possibilities, chiefly the prospect of finding relevant results without manually tagging content with metadata or manually grouping it, along with the costs and difficulties that entails. Most research and development in content-based image retrieval focuses on photographic content, so whether it applies to a slightly different image domain, screenshots of video games, is an interesting question. The fundamental issue explored here is whether these computer-generated images, when searched and sorted by computer algorithms, correlate with what we commonly think these games are about.
The Multiple Arcade Machine Emulator, or MAME [ www.mame.net ], is a collaboratively developed program that documents arcade hardware for preservation and emulation on more modern platforms. [1] It supports thousands of arcade games. A widely used collection of screenshots of these games is available from the MAME32 Quality Assurance page. [ www.classicgaming.com/mame32qa/down.htm ] [2] A large subset, consisting of 2187 images, was used as the dataset to analyze and perform queries on. Specifically, images of games included in MAME versions up to but not including 0.68 were indexed; images of games added in 0.68 and later versions were not.
The images were in Portable Network Graphics (PNG) format and ranged in size from a single kilobyte to 104 KB. All images were sized to 244x184 or 184x244 pixels. Even for older games like Pac-Man, this is smaller than the original pixel dimensions (288x224). Whether the same results would have been obtained had all images been kept at their original size is an issue for further investigation. In any event, the images had already been normalized with regard to size before I dealt with them.
Video games are generally not static; they involve constantly changing images on the screen. Whether a given screenshot represents a typical or atypical image from its game is another area for consideration or further study. Only a single image was used from each game, and although a reasonable argument can be made that it is a typical image from the game, some games, particularly more recent ones, exhibit a much larger range of visual features at different points in the game. That is, screenshots taken at different points in Marvel vs. Capcom, a recent fighting game, may differ visually far more than different screenshots from Galaga or Frogger. Whether this software is effective at grouping images from the same game was not explored, but is another possible area for future research. Most important to note is that the images used were not gathered scientifically for this report; they were manually chosen for another purpose: previewing a game in an emulator. It is, however, reasonable to assume that this purpose helped yield a dataset that included images of "typical" gameplay elements in each game.
To analyze and perform queries on this dataset the program imgSeek [ imgseek.sourceforge.net ] was used. [3] The developers describe it as:
imgSeek is a photo collection manager and viewer with content-based search and many other features. The query is expressed either as a rough sketch painted by the user or as another image you supply (or an image in your collection). The searching algorithm makes use of multiresolution wavelet decomposition of the query and database images. [3]
imgSeek 0.8.3 was the version used. It was compiled from source on a computer running the Linux operating system.
After imgSeek was compiled and installed, the MAME screenshots were added. By default, imgSeek was set to ignore images under a certain size. This setting was changed so that all images were indexed into the database. It does, however, point to a possible limitation: the software was developed and optimized for larger images than those present in the dataset.
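The "multiresolution wavelet decomposition" mentioned in the imgSeek description can be illustrated with a small sketch. This is not imgSeek's code: it is a minimal, pure-Python Haar decomposition on tiny grayscale grids, and the function names, the `keep` parameter, and the scoring rule (counting shared signed coefficients) are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]  # square grayscale image, side length a power of two


def haar_1d(values: List[float]) -> List[float]:
    """Full 1-D Haar decomposition using pairwise averages and differences."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        avgs = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = avgs + diffs
        n = half
    return out


def haar_2d(img: Grid) -> Grid:
    """Standard 2-D decomposition: transform every row, then every column."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def signature(img: Grid, keep: int) -> Set[Tuple[int, int, int]]:
    """Positions and signs of the `keep` largest-magnitude coefficients.

    The (0, 0) coefficient (overall average brightness) is skipped here,
    an assumption of this sketch, so only structure is compared.
    """
    coeffs = haar_2d(img)
    cells = [(abs(v), y, x, 1 if v > 0 else -1)
             for y, row in enumerate(coeffs)
             for x, v in enumerate(row)
             if (y, x) != (0, 0) and v != 0]
    cells.sort(reverse=True)
    return {(y, x, sign) for _, y, x, sign in cells[:keep]}


def similarity(a: Grid, b: Grid, keep: int = 8) -> int:
    """Number of signed coefficients the two signatures have in common."""
    return len(signature(a, keep) & signature(b, keep))


# Toy 4x4 images (made-up data): bright left half, a brighter copy, bright top half.
left_right = [[9, 9, 1, 1]] * 4
shifted = [[10, 10, 2, 2]] * 4
top_bottom = [[9] * 4, [9] * 4, [1] * 4, [1] * 4]

print(similarity(left_right, shifted))     # same structure, different brightness
print(similarity(left_right, top_bottom))  # different structure
```

In this toy setup, two images with the same coarse layout share their dominant coefficient even when overall brightness differs, while an image with a different layout does not. A real system would also have to handle color channels and weight coefficients by scale.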
The images to use for search queries were not selected randomly. Seventeen games that were subjectively considered "exemplary" were chosen across genres. I specifically tried queries of games from different years, genres, and companies, but endeavored to choose games that in my view had been influential and popular. (See results section for complete list.)
The first 20 results were recorded. For the purposes of analysis, however, only the top five results returned were used. This was primarily to enable breadth - evaluating a number of searches - at the expense of some depth. In all cases, the image used to search was also the first image returned, and not considered in the analysis. Preliminary analysis indicated that in many cases the subjective quality of the top five search results was not substantively different from the overall quality of the top ten or twenty results. Regardless, all search queries and their results are available for review.
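The selection rule described above (drop the query image, then keep the next five results) can be sketched as follows. The function name and the file names are hypothetical; imgSeek's actual output format is not assumed here.

```python
from typing import List


def top_results(ranked: List[str], query: str, k: int = 5) -> List[str]:
    """Return the first k ranked results, skipping the query image itself."""
    return [name for name in ranked if name != query][:k]


# Hypothetical ranked output for one query; these file names are made up.
ranked = ["query.png", "a.png", "b.png", "c.png", "d.png", "e.png", "f.png"]
analyzed = top_results(ranked, "query.png")
print(analyzed)
```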
The search results were analyzed using the following metrics: color similarity, shape similarity, overall subjective visual similarity, and subjective game-genre similarity. Also noted for each result were the game's year of publication, its genre, and the difference between its year of publication and that of the search image.
Color similarity is whether the result image exhibited similar color to the search image. 0 was noted for little to no color similarity, 1 for some similarity, and 2 for significant similarity.
Shape similarity measured whether, in my subjective viewing, the result image shared apparent shapes and features with the search image. The same three-point scale was used as above.
Overall visual similarity was another subjective measurement of whether the two images seemed visually similar as a whole. Again, a three-point scale was used.
While the metrics already mentioned simply measure the effectiveness of the search engine at finding visually similar images, the last subjective metric, game-genre similarity, attempted to measure whether the results were similar in content, specifically in the "genre" under which the game is commonly classified. A three-point scale was used. A zero represented very little to no similarity in genre or "aboutness" of the game. A one represented some similarity, but with some important distinctions. A two represented a significant amount of similarity in aboutness or genre.
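As a rough illustration of how the four ratings for one query could be recorded and averaged, here is a minimal sketch. The `Rating` class, the `mean_scores` helper, and the five example ratings are all hypothetical; they are not the data collected for this report.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class Rating:
    """One result image scored against the search image on the 0-2 scale."""
    color: int
    shape: int
    visual: int
    genre: int

    def __post_init__(self) -> None:
        # Enforce the three-point scale: 0 = little or none, 1 = some, 2 = significant.
        for f in fields(self):
            if getattr(self, f.name) not in (0, 1, 2):
                raise ValueError(f"{f.name} must be 0, 1, or 2")


def mean_scores(ratings: List[Rating]) -> Dict[str, float]:
    """Average each metric over the analyzed results of a single query."""
    return {f.name: sum(getattr(r, f.name) for r in ratings) / len(ratings)
            for f in fields(Rating)}


# Made-up ratings for the top five results of an imaginary query.
example = [Rating(2, 1, 2, 0), Rating(1, 1, 1, 1), Rating(2, 2, 2, 2),
           Rating(1, 0, 1, 1), Rating(0, 1, 0, 1)]
scores = mean_scores(example)
print(scores)
```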
The actual search results are available, as well as all the tabulated data. Overall, the results showed some correlation in most cases between visual similarity and the game's genre. In most cases it was clear to even an untrained eye that the results were visually similar to the search query. Averaged across the five queries discussed below, the similarity scores were 1.78 for color, 1.57 for shape, and 1.65 for overall visual similarity. The average genre similarity was lower, only 1.28, but even with somewhat stringent requirements (as will be discussed) this shows that the results were, overall, more similar than different.
Additional searches not analyzed extensively:
The numerical analysis of the raw data is also available. [ results/tables.html ]
Name | Color | Shape | Visual | Genre |
Pac-Man | 2 | 1.6 | 1.8 | 1 |
Street Fighter 2 | 1.6 | 1.8 | 1.8 | 1.6 |
Outrun | 1.8 | 1.2 | 1 | 0.6 |
10 Yard Fight | 2 | 1.4 | 1.8 | 1.2 |
Asteroids | 1.5 | 1.83 | 1.83 | 2 |
Average | 1.78 | 1.57 | 1.65 | 1.28 |
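The final row of the table holds per-column means rather than sums. A short check, using only the five per-query rows from the table (the variable names are illustrative), reproduces those figures:

```python
# Per-query scores copied from the table: (Color, Shape, Visual, Genre).
rows = {
    "Pac-Man": (2, 1.6, 1.8, 1),
    "Street Fighter 2": (1.6, 1.8, 1.8, 1.6),
    "Outrun": (1.8, 1.2, 1, 0.6),
    "10 Yard Fight": (2, 1.4, 1.8, 1.2),
    "Asteroids": (1.5, 1.83, 1.83, 2),
}
metrics = ("Color", "Shape", "Visual", "Genre")

# Mean of each column, rounded to two decimals as in the table.
averages = {
    metric: round(sum(scores[i] for scores in rows.values()) / len(rows), 2)
    for i, metric in enumerate(metrics)
}
print(averages)  # {'Color': 1.78, 'Shape': 1.57, 'Visual': 1.65, 'Genre': 1.28}
```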
Generally regarded as the most popular arcade game of all time, Pac-Man can be thought of as a "maze" game. Perhaps in part because of its age (it was released in 1980) and its characteristic black background and blue maze, the results returned were not necessarily ideal. Although Pac-Man Plus was returned in the top five results, the more obvious Ms. Pac-Man and Super Pac-Man were not returned until around the 15th result. It seems that the reliance on color in this query returned games that looked similar (blue lines, black backgrounds) but that were not necessarily the best matches in terms of content. Other comparable games from this time period, such as Arkanoid, Frogger and Galaga, also seemed to generate mixed results.
Published in 1991, Street Fighter 2 is considered the seminal two-person fighting game. Although not the first, it can safely be credited with bringing in a new era of similar games. Visually, the game is basically broken up into two characters on opposite sides of the screen (although they move and can be anywhere on the screen after the start of the match) and a colorful, animated background that changes completely during the game. Somewhat unexpectedly, the results for Street Fighter 2 were extremely good. Of the top five, four of the results were similar two-person fighting games. The one anomaly was a mahjong game that was apparently included in the results because of a strip of yellow at the top of the screen, similar to the health gauge generally found at the top of fighting games; on closer analysis, all results in the top ten share this feature. Although it is not something one might initially think of as representing the fighting genre, it seems to have been a powerful way to group these games. Similarly good results were obtained with another popular fighting game, King of Fighters 98.
The results with Outrun, a driving game, were rather disappointing. Despite the visual similarity in the results, only one of the games returned could properly be considered a driving game in the way Outrun is.
Asteroids, from 1979, can be classified as a "shooter." Perhaps more important is that it is a vector game, as opposed to the more common raster games. Here the results were good, as expected, and all results were relevant, perhaps because vector games have a characteristic visual quality (vector lines and a usually static or empty background). It was a little surprising, though, that some color vector games were returned before the black and white vector games.
For 10 Yard Fight, a football game, the top five results were all sports games, but many of them scored a one for genre since they were not football games. Perhaps this was too stringent a requirement, and it shows that the simple three-point ranking system used here may not capture enough detail. It seemed at least possible that, given the visual similarity of many football games (specifically the white lines on a green field with players on it), a distinction between football, soccer, and golf games could be recognized. Perhaps weighting shape more heavily than color could help. The results from a soccer game, Super Sidekicks, were similar in the top five analysis, but overall seemed to have a larger proportion of specifically soccer games. However, this may simply be because there are more soccer games in the dataset. Normalizing the dataset, or doing some statistical analysis on it, is another area for future study that was outside the scope of this paper.
The correlation between the visual features of video games and their actual content can be used with some degree of success to retrieve related games in content-based searches of screenshots. While the general relevance of these results was somewhat impressive, they should be viewed with some caution given the relatively small size of the dataset. The system's ability to immediately discern the difference between vector and raster games was expected. Its seeming inability to distinguish between different kinds of sports games fell somewhat short of my expectations. The most interesting results were those obtained with the fighting games. Not only were the results impressive, but the anomalous mahjong game helped reveal an important visual feature of these games that the system was noticing.