There are more than 1.5 million projects on the Scratch website today, containing more than 45 million images and sounds. Here is the list of the 10 most commonly used images. The number in parentheses is the number of times that image is used.
- button (58,072)
- cat1-a (35,748)
- bananas1 (33,880)
- underwater (26,646)
- beachball1 (25,694)
- spotlight-stage (22,924)
- bat1-a (21,220)
- buttonPressed (20,973)
- gobo1 (20,176)
- shark1-b (19,368)
Methodology
Each Scratch project can have one or more versions uploaded to the website (only the latest one is visible to the public). Each project version has some sprites and a stage. Each sprite and each stage can have one or more images. Whenever a project gets uploaded, it gets analyzed and the attributes of its different components (blocks, images, sounds, etc.) get stored in a structured database. In that database we have a table with the metadata about each image and sound, such as its name and size. I used that table to find the most common images based on their name.
To keep things simple, and since most projects have only one version, I decided to ignore versions 2 and higher. Then I grouped the images by name, assuming that two images with the same name (e.g. shark1-b) were the same. Of course, this assumption is not always correct. For example, if someone imported an image called awesome-cat.png (Scratch would give it the name "awesome-cat" once imported) and then edited it, it would get counted along with an unedited version of the same image.
Another challenge is that whenever people use the Scratch paint editor, images get assigned default sequential names (e.g. costume8, costume9, etc.). So names like costume1 and background1, along with their equivalents in other languages (e.g. disfraz1 in Spanish), are unlikely to represent the same image. For this reason, I removed all those non-unique image names from the list of the top 1000 most common image names. That included names like "normal" or "1", which a manual analysis of a small sample showed to also represent a wide variety of images. In the future, a more accurate analysis would involve parsing all the projects and generating a hash of the binary representation of each image.
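The counting described above can be sketched in a few lines of Python. This is only an illustration, not the actual analysis code: the row layout `(project_id, version, image_name)`, the `DEFAULT_NAME` pattern, and the `GENERIC_NAMES` set are all assumptions made for the example, and the sketch filters generic names before counting rather than pruning them from a top-1000 list afterwards. The `content_key` helper shows one way the hash-based grouping mentioned as future work might look, here using SHA-256.

```python
import hashlib
import re
from collections import Counter

# Hypothetical pattern for default paint-editor names; the real set of
# localized default names is an assumption, not taken from Scratch itself.
DEFAULT_NAME = re.compile(r"^(costume|background|disfraz)\d+$", re.IGNORECASE)

# Other generic names that the manual sampling found to be ambiguous.
GENERIC_NAMES = {"normal", "1"}


def is_generic(name):
    """Return True for names that likely cover many different images."""
    return name in GENERIC_NAMES or DEFAULT_NAME.match(name) is not None


def most_common_images(rows, top=10):
    """Count image names, looking only at the first version of each project.

    `rows` is an iterable of (project_id, version, image_name) tuples,
    a made-up stand-in for the image metadata table.
    """
    counts = Counter(
        name
        for _project_id, version, name in rows
        if version == 1 and not is_generic(name)
    )
    return counts.most_common(top)


def content_key(image_bytes):
    """Hash the raw image bytes so images can be grouped by content."""
    return hashlib.sha256(image_bytes).hexdigest()
```

With a hash as the grouping key, an edited copy of awesome-cat would no longer be counted together with the unedited one, and two identical images with different names would be.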
It is not surprising that all of the images in this list come with Scratch itself. However, together they represent less than one percent (1%) of all the images. In reality, image usage follows a long-tail distribution: a lot of images get used only once or twice.
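To make the long-tail claim concrete, here is a rough sketch of how one might measure it, assuming a mapping from image name to usage count is already available. The function name `tail_summary` and its return shape are invented for this example.

```python
from collections import Counter


def tail_summary(counts, top=10):
    """Summarize how concentrated image usage is.

    `counts` maps an image name to the number of times it is used.
    Returns a pair: the share of all uses held by the `top` most common
    names, and the fraction of names that are used at most twice.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    top_uses = sum(n for _name, n in Counter(counts).most_common(top))
    rare = sum(1 for n in counts.values() if n <= 2)
    return top_uses / total, rare / len(counts)
```

A small first value combined with a large second value is what a long tail looks like: the most popular images account for little of the total, while most names appear only once or twice.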