DAAO 500


This is an exploratory data analysis of data collected from Design & Art Australia Online (DAAO). The data has been filtered down to the 500 person records with the richest data. We focus primarily on trends across gender and roles.

The DAAO data consists of…

event records with attached dates
person records for each event
each person record contains data on gender and role type; however, roles are not tied to specific events or times.

We also extend the analysis using some AusStage data to compare role progression.

The analytical work presented on this page served as the initial exploratory data analysis for a Pursuit piece published in March 2023.

Import packages and pre-process data

We have provided the code used to generate the DAAO 500 data, but for the sake of brevity we do not run the pre-processing code here. Instead, we import pre-processed data from the data/analysis folder of the project's GitHub repository.


# for data mgmt
import json
import pandas as pd
import numpy as np
from collections import Counter
from datetime import datetime
import os, requests, gzip, io
import ast

# for plotting
import matplotlib.pyplot as plt
import seaborn as sns

from itables import show

import warnings
warnings.filterwarnings("ignore")

# folder_name points to a folder containing uncompressed data (i.e., csv and jsonl files).
# Only change this if you have already downloaded the data;
# otherwise the data will be fetched from Google Drive.
folder_name = 'data/local'

def fetch_small_data_from_github(fname):
    url = f"https://raw.githubusercontent.com/acd-engine/jupyterbook/master/data/analysis/{fname}"
    response = requests.get(url)
    rawdata = response.content.decode('utf-8')
    return pd.read_csv(io.StringIO(rawdata))

def fetch_date_suffix():
    url = "https://raw.githubusercontent.com/acd-engine/jupyterbook/master/data/analysis/date_suffix"
    response = requests.get(url)
    # keep the first 12 characters, which form the date suffix used in the data file names
    return response.content.decode('utf-8')[:12]

def check_if_csv_exists_in_folder(filename):
    try: return pd.read_csv(os.path.join(folder_name, filename), low_memory=False)
    except: return None

def fetch_data(filetype='csv', acdedata='organization'):
    filename = f'acde_{acdedata}_{fetch_date_suffix()}.{filetype}'

    # first check if the data exists in current directory
    data_from_path = check_if_csv_exists_in_folder(filename)
    if data_from_path is not None: return data_from_path

    urls = fetch_small_data_from_github('acde_data_gdrive_urls.csv')
    sharelink = urls[urls.data == acdedata][filetype].values[0]
    url = f'https://drive.google.com/u/0/uc?id={sharelink}&export=download&confirm=yes'

    response = requests.get(url)
    decompressed_data = gzip.decompress(response.content)
    decompressed_buffer = io.StringIO(decompressed_data.decode('utf-8'))

    try:
        if filetype == 'csv': df = pd.read_csv(decompressed_buffer, low_memory=False)
        else: df = [json.loads(jl) for jl in pd.read_json(decompressed_buffer, lines=True, orient='records')[0]]
        return pd.DataFrame(df)
    except: return None 

def fetch_top_500(fname = 'DAAO_500_data.csv'):
    # first check if the data already exists in folder_name
    data_from_path = check_if_csv_exists_in_folder(fname)
    if data_from_path is not None: return data_from_path

    acde_persons = fetch_data(acdedata='person') # takes ~16s
    daao_persons = acde_persons[acde_persons['data_source'].str.contains('DAAO')].copy()
    # ori_url is stored as a string literal; parse it safely with ast.literal_eval
    daao_persons['ori_url2'] = daao_persons['ori_url'].apply(ast.literal_eval)

    top500 = fetch_small_data_from_github('DAAO_500_list.csv')

    # find matches with unified data, keeping the order of the top-500 list
    top500_df = pd.concat([daao_persons[daao_persons['ori_url2'] == url] for url in top500['ori_url']])

    # drop the helper column and return df
    return top500_df.drop(columns='ori_url2')

df = fetch_top_500()

Gender distribution

We use a donut chart to explore how gender has been recorded; 55.8% of the data has been recorded as Male and 44.2% as Female.

_images/cba5713b1d4ee45926aa21c23ca70662dcffeed71d7bc1dc18643801c853537a.png

Age distribution

Next we review the current age of people in the DAAO data. As shown in the histogram below, most artists in the data were born between 1945 and 1960. The youngest artists in the data are just over 30.

We also compare the ages of Male and Female records in the following layered histogram.

_images/ef117f36e0e80782680cab5c6b4339e741a49b3265fff3fb06e34d46c451165b.png

_images/5e623abe064e8e0287cf29140193fe5215a9ed440eb505fb136b201eb5e88120.png

Lifespan distribution

As we have both birth and death records for a subset of the data, we can estimate the average lifespan according to DAAO records. On average, artists lived to approximately 80 years old. However, some artists did not live past the age of 40.

We also compare the lifespans of Male and Female records in the following layered histogram. As expected, female artists in the data tend to live longer than male artists.
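The lifespan figures rely on keeping only records that have both a birth year and a death year. As a minimal sketch of that step (not the code used to produce the histograms), assuming each record is a dict with hypothetical gender, birth_year and death_year keys rather than the actual DAAO field names:

```python
from statistics import mean


def average_lifespan_by_gender(records):
    """Mean lifespan (death year minus birth year) for each gender.

    Records missing either year are skipped, because a lifespan can only
    be computed when both a birth and a death year are recorded.
    """
    lifespans = {}
    for record in records:
        birth, death = record.get("birth_year"), record.get("death_year")
        if birth is None or death is None:
            continue
        lifespans.setdefault(record["gender"], []).append(death - birth)
    return {gender: mean(values) for gender, values in lifespans.items()}


# toy records with made-up values (not DAAO data)
people = [
    {"gender": "female", "birth_year": 1900, "death_year": 1988},
    {"gender": "female", "birth_year": 1920, "death_year": 2004},
    {"gender": "male", "birth_year": 1910, "death_year": 1985},
    {"gender": "male", "birth_year": 1930, "death_year": None},  # no death year: skipped
]

print(average_lifespan_by_gender(people))  # {'female': 86, 'male': 75}
```

Records without a death year (living artists or missing data) simply drop out, which is why the lifespan analysis covers only a subset of the 500 records.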

_images/1711bb95a9245777738ee2420c9fe35b72688659bb75c45b3fbb88b59438002b.png

_images/7ad08b84d26de56860e4ee7c665adfa5109091ad21ef6bdc001a46b0b078d619.png

Birthplace

Below we highlight the 10 most frequent birthplace values in the DAAO records. Most birthplace values are missing, and the remaining values require further cleaning.

_images/e2ce3ace5ae15ad8b62d5eea983fec8773747d163db35c624c03e7e04ce3f719.png

Roles, most frequently occurring

Next, we take a deep dive into the roles recorded across the DAAO data. A person can have many roles, and most DAAO artist records do list more than one.

We first review the distribution across role types by treating each role held by a given person as one record. Painter is the most frequent role, making up 35% of role records. It is followed by Cartoonist/Illustrator and Printmaker.

We begin to see some interesting differences in frequency when we split the data into Male and Female records.

Painter remains the most frequently occurring role for both Male and Female records.
Cartoonist/Illustrator appears to be mostly Male records.
Textile Artist/Fashion Designer appears to be mostly Female records.
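Treating each role held by a person as one record amounts to flattening every person's role list before counting. A minimal sketch of that counting, overall and per gender, assuming hypothetical records with gender and roles keys (the real DAAO fields and role labels may differ):

```python
from collections import Counter


def role_shares(people):
    """Share of all role records taken by each role, counting each (person, role) pair once."""
    counts = Counter(role for person in people for role in person["roles"])
    total = sum(counts.values())
    return {role: n / total for role, n in counts.most_common()}


def role_counts_by_gender(people):
    """Role frequencies computed separately for each gender."""
    by_gender = {}
    for person in people:
        by_gender.setdefault(person["gender"], Counter()).update(person["roles"])
    return by_gender


# toy records with made-up roles (not DAAO data)
people = [
    {"gender": "male", "roles": ["Painter", "Cartoonist / Illustrator"]},
    {"gender": "female", "roles": ["Painter", "Textile Artist / Fashion Designer"]},
    {"gender": "male", "roles": ["Painter", "Printmaker"]},
    {"gender": "female", "roles": ["Printmaker"]},
]

print(role_shares(people))
print(role_counts_by_gender(people)["male"].most_common(2))
```

Because shares are taken over role records rather than people, a person with three roles contributes three records, so the shares sum to 1 across roles, not across people.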

_images/4ee6c2ed94d169f610e28cab5a288a6e3cc7d69696bf6b2bda4096e2f575e925.png

_images/70541ea3f40e05619b3da8c1a013d1ef38e6030e4a9e22df92cd1a2560fd7089.png

Number of roles

We explore whether certain roles tend to occur on their own or alongside other roles. The visualisation below shows the median number of roles held by people with a given role type. Our main findings are listed below.

Furniture Designers/Cabinetmakers typically have no additional roles.
A majority of role types (Painter, Printmaker, Cartoonist/Illustrator, etc.) frequently occur with one or more additional roles.
Installation Artists, Performance Artists, Sound Artists, Digital Artists/Designers, Theatre/Film Designers and Video Artists typically have three additional roles.

We also compared the number of roles held by Male and Female records, and found some interesting differences. Our findings are summarised below.

On average, female Furniture Designers/Cabinetmakers hold two more roles than male Furniture Designers/Cabinetmakers.
On average, female Carvers hold 1.5 more roles than male Carvers.
On average, male Painters hold one more role than female Painters.
On average, male Mixed Media Artists hold one fewer role than female Mixed Media Artists.
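The statistic behind these comparisons is, for each role type, the median total number of roles held by the people who have that role (the number of additional roles is that total minus one). A minimal sketch, assuming hypothetical records with gender and roles keys rather than the actual DAAO schema:

```python
from statistics import median


def median_roles_per_role_type(people, gender=None):
    """Median total number of roles held by people with each role type.

    If gender is given, only people of that gender are included.
    Subtract 1 from a result to get the median number of additional roles.
    """
    role_totals = {}
    for person in people:
        if gender is not None and person["gender"] != gender:
            continue
        n_roles = len(person["roles"])
        for role in person["roles"]:
            role_totals.setdefault(role, []).append(n_roles)
    return {role: median(totals) for role, totals in role_totals.items()}


# toy records with made-up roles (not DAAO data)
people = [
    {"gender": "male", "roles": ["Furniture Designer / Cabinetmaker"]},
    {"gender": "female", "roles": ["Furniture Designer / Cabinetmaker", "Carver", "Sculptor"]},
    {"gender": "male", "roles": ["Painter", "Printmaker"]},
    {"gender": "female", "roles": ["Painter"]},
]

overall = median_roles_per_role_type(people)
female = median_roles_per_role_type(people, gender="female")
male = median_roles_per_role_type(people, gender="male")

role = "Furniture Designer / Cabinetmaker"
print(overall[role], female[role] - male[role])  # 2.0 2
```

The gender differences listed above correspond to subtracting the per-gender results for the same role type, as in the last line.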

_images/7baf023dece6a97938639eedc07d34fcbbca702fcfcaf13e6c758c8c01dc7a8b.png

_images/5090c1d1828b72c6ebfbab8991fb3003dcf567fef3bb89d7706b3e071b9ccf68.png

Association rule mining, Roles

Next, we assess how often multiple roles co-occur. We adopt association rule mining because it can help answer questions such as: what is the likelihood of a Painter also being a Printmaker?

We generate association rules separately for Male data and Female data, and compare the confidence of matching rules to highlight notable differences. Below are the rules whose confidence differs by more than 10 percentage points between the two groups.

Males who are listed as Installation Artists and Mixed Media Artists are 34.7 percentage points more likely than Females to also be listed as a Sculptor.
Males who are listed as Graphic Designers are 34.4 percentage points more likely than Females to also be listed as a Cartoonist/Illustrator.
Males who are listed as Installation Artists and Painters are 20.4 percentage points more likely than Females to also be listed as a Sculptor.
Males who are listed as Video Artists are 11.8 percentage points more likely than Females to also be listed as an Installation Artist.
Females who are listed as Draughtsmen and Mixed Media Artists are 15.9 percentage points more likely than Males to also be listed as a Painter.
Females who are listed as Printmakers and Sculptors are 11.4 percentage points more likely than Males to also be listed as a Painter.

We can visualise these association rules, along with others, as graphs in which nodes represent co-occurrences (or rules) and the direction of each arrow (edge) indicates whether a role is an antecedent or a consequent of a rule. Node size corresponds to the support of the rule, and colour transparency represents its confidence. Brief definitions are provided below.

The positioning of roles in the graphs below is optimised around the rules, so proximity can be loosely interpreted as a measure of closeness. For example, in the Males visualisation we can infer that Painter and Printmaker are somewhat related, through co-occurrences with other roles and/or rules containing both roles.

Note

Antecedents: Antecedents are the items or conditions that are present in a given itemset (i.e., roles) in an association rule. They are also known as the “left-hand side” or “premise” of the rule.
Consequents: Consequents are the items or conditions that are predicted to occur in a given itemset based on the presence of the antecedents in the association rule. They are also known as the “right-hand side” or “conclusion” of the rule.
Support: Support is a measure of how frequently an association rule occurs in the dataset. It is calculated as the percentage of itemsets that contain both the antecedents and the consequents of the rule.
Confidence: Confidence is a measure of how strongly the presence of the antecedents in an itemset predicts the presence of the consequents. It is calculated as the percentage of itemsets that contain the antecedents and the consequents, divided by the percentage of itemsets that contain the antecedents.
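To make the support and confidence definitions concrete, here is a small from-scratch sketch (not the code used to generate the rules above) that treats each person's set of roles as one itemset, computes a rule's confidence separately for Male and Female itemsets, and checks whether the difference exceeds the 10% threshold. The toy itemsets are made up, and applying the threshold to the absolute difference in confidence is an assumption.

```python
def support(itemsets, items):
    """Fraction of itemsets that contain every item in items."""
    items = set(items)
    return sum(items <= itemset for itemset in itemsets) / len(itemsets)


def confidence(itemsets, antecedents, consequents):
    """support(antecedents and consequents together) / support(antecedents)."""
    base = support(itemsets, antecedents)
    if base == 0:
        return 0.0
    return support(itemsets, set(antecedents) | set(consequents)) / base


# toy itemsets: one set of roles per person (made up, not DAAO data)
males = [
    {"Installation Artist", "Mixed Media Artist", "Sculptor"},
    {"Installation Artist", "Mixed Media Artist", "Sculptor"},
    {"Installation Artist", "Mixed Media Artist"},
    {"Painter"},
]
females = [
    {"Installation Artist", "Mixed Media Artist", "Sculptor"},
    {"Installation Artist", "Mixed Media Artist"},
    {"Installation Artist", "Mixed Media Artist"},
    {"Painter", "Printmaker"},
]

antecedents = {"Installation Artist", "Mixed Media Artist"}
consequents = {"Sculptor"}

conf_m = confidence(males, antecedents, consequents)    # 0.5 / 0.75
conf_f = confidence(females, antecedents, consequents)  # 0.25 / 0.75
difference = conf_m - conf_f

# report the rule if the confidence gap exceeds 0.10 (assumed absolute threshold)
if abs(difference) > 0.10:
    print(f"Males: {conf_m:.1%}, Females: {conf_f:.1%}, difference: {difference:.1%}")
```

In practice a dedicated association rule mining library would enumerate candidate rules automatically; the sketch only evaluates a single, fixed rule.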

_images/0384d0c747c801f1a2adfef38747a3972e623c8fb421c62cc8f65f839ef30bf7.png

_images/2dd43066231d69342fdc18804833f3300d32177f46712dc7a2d5dccd21202844.png
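The graphs above place each rule in its own node, with arrows from antecedent roles into the rule and from the rule out to its consequent roles. A minimal sketch of turning rules into such a node and edge list, assuming each rule is a hypothetical (antecedents, consequents, support, confidence) tuple; the output could then be passed to a graph-drawing library for layout:

```python
def rules_to_graph(rules):
    """Convert association rules into nodes and directed edges.

    Each rule gets its own node carrying its support (for node size) and
    confidence (for transparency). Edges run from every antecedent role to
    the rule node, and from the rule node to every consequent role.
    """
    nodes, edges = {}, []
    for i, (antecedents, consequents, support, confidence) in enumerate(rules):
        rule_node = f"rule {i}"
        nodes[rule_node] = {"size": support, "alpha": confidence}
        for role in sorted(antecedents):
            nodes.setdefault(role, {})
            edges.append((role, rule_node))
        for role in sorted(consequents):
            nodes.setdefault(role, {})
            edges.append((rule_node, role))
    return nodes, edges


# toy rules with made-up support and confidence values
rules = [
    ({"Painter"}, {"Printmaker"}, 0.20, 0.50),
    ({"Printmaker", "Sculptor"}, {"Painter"}, 0.05, 0.70),
]
nodes, edges = rules_to_graph(rules)
for source, target in edges:
    print(f"{source} -> {target}")
```

Because role nodes are shared across rules, roles that appear together in many rules end up connected through several rule nodes, which is what pulls them close together in a force-directed layout.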

Exhibitions, Artist Analysis

As we have event records for a subset of the DAAO artists/designers, we inspect the people with the most participation records, specifically for exhibitions. Below we highlight the top 10 artists. Gwyn Hanssen Pigott leads with involvement in 241 exhibitions, followed by Mike Parr and Patricia Joy Roggenkamp.

Orange bars signify that the artist has won either an Archibald, Wynne or Sulman prize.

_images/40d958c04bba1d6f9f6f77957fea66630cc4a192b098776e0ceda55845cb0b4b.png

Exhibition participation over time, Males and Females

The two visualisations below show exhibition participation for Males and Females over time (aggregated into decades). The first compares Males and Females by frequency and the second by proportion. Males tend to participate in more exhibitions than Females; however, the proportional gap has narrowed in recent decades.

Note that an older version of the first visualisation was used in the aforementioned Pursuit article. The visualisation below is an updated version based on a more recent data extraction.


# bin exhibition start years into decades (e.g. 1987 -> 1980)
exhibition_data['date_range.date_start.year'] = exhibition_data['date_range.date_start.year'].astype(int)
exhibition_data['start_year_decade'] = (exhibition_data['date_range.date_start.year'] // 10) * 10

# keep exhibitions from the 1940s to the 2000s
exhibition_data = exhibition_data[(exhibition_data['start_year_decade'] > 1939) &
                                  (exhibition_data['start_year_decade'] < 2001)]

decade_ticks = range(1940, 2010, 10)
decade_labels = ['1940s', '1950s', '1960s', '1970s', '1980s', '1990s', '2000s']

# number of exhibition participation records per decade, by gender
events_males_tab = exhibition_data[exhibition_data.gender == 'male']['start_year_decade'].value_counts().sort_index()
events_females_tab = exhibition_data[exhibition_data.gender == 'female']['start_year_decade'].value_counts().sort_index()

# plot gender frequency over decade
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(events_males_tab.index, events_males_tab.values, label="Males", marker='o')
ax.plot(events_females_tab.index, events_females_tab.values, label="Females", marker='o')
ax.set_title('DAAO exhibition participation records,\nMales and Females, Decade')
ax.set_xticks(decade_ticks)
ax.set_xticklabels(decade_labels)
ax.grid(axis='x')
ax.set_ylim([0, 1100])
ax.legend(loc="upper right", ncol=2)
plt.show()

# plot gender proportion over decade (each decade's proportions sum to 1)
proportions = pd.crosstab(exhibition_data['start_year_decade'],
                          exhibition_data['gender'], normalize='index')

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(proportions.index, proportions['male'], label="Males", marker='o')
ax.plot(proportions.index, proportions['female'], label="Females", marker='o')
ax.legend(loc="upper right", ncol=2)
ax.set_xlabel('Decade')
ax.set_ylim([-0.1, 1.13])
ax.grid(axis='x')
ax.set_xticks(decade_ticks)
ax.set_xticklabels(decade_labels)
ax.set_title('Proportion of DAAO exhibition participation records,\nMales and Females, Decade')
plt.show()

_images/13072cff3f4e1f094832bb5b64ef5c52a5405236cbeaa679d9923d3babceb2bd.png

_images/a31e3b3d1d3c3d7c6204829c5e274cad2f92ffac52d7bcc31658f089db121b22.png

Frequency of DAAO records with exhibition data

We review the most frequent roles attached to exhibition participants. Note that this only considers people with exhibition records that include a date.

Role	Frequency
Painter	3012
Sculptor	1772
Printmaker	1548
Installation Artist	1205
Draughtsman	1128
Photographer	1096
Mixed Media Artist	998
Performance Artist	570
Video Artist	460
Ceramist	413
Screen Artist	375
Textile Artist / Fashion Designer	240
Digital Artist/Designer	239
Cartoonist / Illustrator	199
Theatre / Film Designer	167
Industrial / Product Designer	140
Weaver	125
Graphic Designer	115
Carver	110
Architect / Interior Architect / Landscape Arc...	105
Jewellery Designer	88
Glass & metal Artist / Designer	57


def drilldown_by_role(role='Painter', data=None):
    # keep exhibition records whose expanded role list mentions this role
    byrole = data[data['roles_expanded'].str.contains(role, regex=False, na=False)]

    decade_ticks = range(1940, 2010, 10)
    decade_labels = ['1940s', '1950s', '1960s', '1970s', '1980s', '1990s', '2000s']

    # number of exhibition participation records per decade, by gender
    events_males_tab = byrole[byrole.gender == 'male']['start_year_decade'].value_counts().sort_index()
    events_females_tab = byrole[byrole.gender == 'female']['start_year_decade'].value_counts().sort_index()

    # gender frequency over decade
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(events_males_tab.index, events_males_tab.values, label="Males", marker='o')
    ax.plot(events_females_tab.index, events_females_tab.values, label="Females", marker='o')
    ax.set_xticks(decade_ticks)
    ax.set_xticklabels(decade_labels)
    ax.grid(axis='x')
    # leave 20% headroom above the larger of the two series
    ax.set_ylim(0, pd.concat([events_males_tab, events_females_tab]).max() * 1.2)
    ax.set_title(f'{role} participation in DAAO exhibition records,\nMales and Females, Decade')
    ax.legend(loc="upper right", ncol=2)
    plt.show()

    # gender proportion over decade (each decade's proportions sum to 1)
    proportions = pd.crosstab(byrole['start_year_decade'], byrole['gender'], normalize='index')

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(proportions.index, proportions['male'], label="Males", marker='o')
    ax.plot(proportions.index, proportions['female'], label="Females", marker='o')
    ax.legend(loc="upper right", ncol=2)
    ax.set_xlabel('Decade')
    ax.set_ylim([-0.1, 1.13])
    ax.grid(axis='x')
    ax.set_xticks(decade_ticks)
    ax.set_xticklabels(decade_labels)
    ax.set_title(f'{role} participation proportion in DAAO exhibition records,\nMales and Females, Decade')
    plt.show()

Drilldown into roles

We now produce the same visualisations as above, but drill down into specific roles. This allows us to identify role-specific trends across gender and time. We inspect the ten most frequent roles in the DAAO data.

1. Painter

Male painters participate in more exhibitions than female painters.

_images/d6c0905a081282ad3ef359a26c3866df3e94ad3537b25fc24ba3d043367d5c52.png

_images/60da7fe1d2890970daf97c5e398e661ed33c979b91bd2b91c3e479a7cdc1dadd.png

2. Cartoonist / Illustrator

Note that this is the second most frequent role in DAAO; however, there are very few exhibition records for cartoonists and illustrators.

Male cartoonists/illustrators participate in more exhibitions than female cartoonists/illustrators. Across most decades, the split is roughly 80/20.

_images/f3d5599a7ad2c1b9c65337b5160ae39640e7cfcc525ff481ab7e5b3db1a3a533.png

_images/0c044dcff7a9c5d59b1572fa41f0704aeba50a7f19fc324beb88dae66ee266f9.png

3. Printmaker

Historically, male printmakers have participated in more exhibitions than female printmakers; however, the proportional gap has narrowed in recent decades.

_images/eb3b18031f57b0cd7f38e4d6d1f701b79d6d0bb6e17fed06cbacb4a81f92b001.png

_images/ac69060f35e876db989fff99f9ad4bce701943cd52e16bc334dd85cf25a56258.png

4. Sculptor

Male sculptors participate in more exhibitions than female sculptors.

_images/99600e74bb8cd9b55f739d5293127e81d57a12cbf9b19a0523cd956e8134fd92.png

_images/8ca28a21886e1a958b91e4a3c16b9a1881cf483368d64a33d6dc197ff844d342.png

5. Draughtsman

Male draughtsmen participate in more exhibitions than female draughtsmen.

_images/a5fcc7e2707c0d122eb2dff3ea6cdb8bf6d7dc072e8f0ed49c34d60c8be70fd8.png

_images/fc7aafdddac72643597c57b25f0a52f7f84dc17a554852ea96f828a5b0996efa.png

6. Photographer

Historically, male photographers participated in more exhibitions than female photographers, however in recent decades this has shifted.

_images/8ee1cab59187e7938c72c03e21895cbe05380879b176d9d4e54702d531b9481a.png

_images/cbe8bcf61dc4bd4e00d754e04867b2b78704132fac4439aea7821ba80ec4789c.png

7. Mixed Media Artist

Male mixed media artists participate in more exhibitions than female mixed media artists.

_images/51eaf98825108cef7dae24f9e41fa49f0995012ced1593e3999b310c78dce36b.png

_images/2c559af79aa185f18971eef6800670ee4978b05e4d345f6dc043a66b4a44bc4e.png

8. Installation Artist

Male installation artists participate in more exhibitions than female installation artists; however, the proportional gap has narrowed in recent decades. DAAO contains no exhibition records for installation artists before the 1950s.

_images/769b3055b19111bd6f23496d19659f2f31a1000eac991e11e9584145413419a3.png

_images/3591cbb400a860ec360b3bc97361cf2e7c14137d57ad9c2e3553a1774594b0ea.png

9. Ceramist

Female ceramists participated in more exhibitions than male ceramists. A pronounced peak can be seen in the 1990s.

_images/97ff52a799cb3985d70949d8fc928d3ae2075fcd870723db0b0f004cc3bba687.png

_images/5d5629a2a9b34fd16a346ac535ef93387d5d8009b9470da1fd504d8a5575d738.png

10. Textile Artist / Fashion Designer

Historically, female textile artists/fashion designers participated in more exhibitions than male textile artists/fashion designers; however, in recent decades the proportional gap has stabilised.

_images/43daaabb0be6d1ef2ed61d824e285ddba5beec510c0c6c0492562fb371d4e71d.png

_images/837811186cade0a00ca282b83531df30a137bccc82df952c654ab2816acffdfa.png

Other roles not in top 10

We noticed that the 10 most frequent roles in the DAAO data do not match the most frequent roles among exhibition participants, so we also inspect roles that are frequent among exhibition participants.

Male performance artists participated in more exhibitions than female performance artists. Data begins post-1950s.

_images/444401b0b8687d7fae570d23ce036e5d0399a408bfb4e416d8e545201c2e7cd8.png

_images/d0180674d1b40f8026c1951074504a8819dc56b2c5fe4cbdb8dc62622dc4a107.png

Female video artists participated in more exhibitions than male video artists, however the proportional gap has decreased in recent decades. Data begins post-1960s.

_images/e9420918e05ec856317e5b372e4873404c9565014ff0865aea04fb4c497770f6.png

_images/800f4a7172040e61fc77a4503b18cb0b7fea9d32692042898b20ed6506d64505.png

Male screen artists participated in more exhibitions than female screen artists. Data begins post-1960s. There are very few records for female screen artists.

_images/4e10c5d085b453c4588a48fdd510773c51a5a6ee7e7446ad2e77b63b5fe74520.png

_images/4b81dc6dd5010dbafbc766334606afee1f08b66e0b56e414a3b9efe22ae7d51a.png

Male digital artists/designers participated in more exhibitions than female digital artists/designers, however the proportional gap has decreased in recent decades. Data begins post-1960s.

_images/e4eb043ec961ecf1f8e19390c89be57c2f3c10317c81b096fd70bbec9fcda167.png

_images/6a443266bb77d7fcd92a6800e87f88991343e2f1c1846b055c3ce3344fa46b44.png

AusStage Roles

We extend our analysis on roles by looking at roles within AusStage data, specifically actors and directors. Overall, the AusStage data consists of over 177k Male and Female records corresponding to approximately 106k event records.

Actor to Director

Our main focus is to evaluate the time it takes actors to progress into directors. We hypothesise that the duration will differ for male and female trajectories. To assess this, we first look at some high-level summary statistics to further understand the data. These are listed below as question-answers, and are also visualised in a bar chart.

We are particularly interested in event records of actors who have been listed as a director in a subsequent event. This way we can estimate the time (in years) it took for an actor to make their directorial debut.

Filtering the AusStage data, we find 1,684 female actors and 2,945 male actors who subsequently became directors. These actor-turned-directors make up approximately 75% of the people listed as both actors and directors for each gender (74.65% of females and 75.38% of males); the remaining ~25% started their careers as directors.
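The actor-first versus director-first split can be sketched per person from dated credits. This is a simplified sketch under stated assumptions, not the get_stats function used below: each career is a hypothetical list of (year, role) pairs, and, mirroring that function's logic, a first Director credit in the same year as or before the first Actor credit counts as starting as a director.

```python
def progression(events):
    """Classify one career from (year, role) credits.

    Returns ("actor_first", gap) when the first Director credit comes in a
    later year than the first Actor credit, ("director_first", gap) when it
    comes in the same year or earlier, and None if either role is missing.
    gap is the first Director year minus the first Actor year.
    """
    actor_years = [year for year, role in events if role == "Actor"]
    director_years = [year for year, role in events if role == "Director"]
    if not actor_years or not director_years:
        return None
    gap = min(director_years) - min(actor_years)
    return ("actor_first", gap) if gap > 0 else ("director_first", gap)


# toy careers with made-up credits (not AusStage data)
careers = {
    "p1": [(1960, "Actor"), (1962, "Actor"), (1971, "Director")],
    "p2": [(1980, "Director"), (1985, "Actor")],
    "p3": [(1990, "Actor"), (1990, "Director")],
    "p4": [(1975, "Actor")],
}

results = [progression(events) for events in careers.values()]
both_roles = [r for r in results if r is not None]
share_actor_first = sum(kind == "actor_first" for kind, _ in both_roles) / len(both_roles)
print(f"{share_actor_first:.0%} of people with both roles started as actors")
```

For actor-first careers, gap is the time (in years) to the directorial debut, which is the quantity summarised per period in the next visualisation.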

Note that the annotated numbers in the second visualisation of the aforementioned Pursuit article were generated using an earlier version of the data. The numbers have since changed following a more recent extraction, which is why the annotated numbers in the visualisation below differ slightly.


def get_stats(surpassyear=None, gender='Female', printstuff=False, addfilter=None):
    # note: 'females' holds the records for whichever `gender` is passed in, not only females
    females = ausstage_persons[ausstage_persons['gender'].str.contains(gender, na=False)]
    females = females[~females['career'].isnull()].copy()
    # flag people whose career lists a Director title, and find those whose career lists an Actor title
    females['director_exists'] = [1 if '"title": "Director"' in r else 0 for r in females['career']]
    actor_cond = females['career'].str.contains('"title": "Actor"', na=False)
    females_actors = females[actor_cond]
    females_directors = females[females['director_exists'] == 1]
    females_actors_directors = females[actor_cond & (females['director_exists'] == 1)]
    
    if printstuff:
        print(f"How many {gender}s?", females.shape[0])
        print(f"How many {gender}s are listed as actors? {females_actors.shape[0]}",
              f"({round((females_actors.shape[0]/females.shape[0])*100,2)}%)")
        print(f"How many {gender}s are listed as directors? {females_directors.shape[0]}",
              f"({round((females_directors.shape[0]/females.shape[0])*100,2)}%)")
        print(f"How many {gender}s are listed as actors & directors? {females_actors_directors.shape[0]}",
              f"({round((females_actors_directors.shape[0]/females.shape[0])*100,2)}%)")
    
    # keep only Actor and Director credits from the merged person-event data
    relevant_roles = merged[merged['occupation.title'].isin(["Actor", "Director"])]

    relevant_females = dict()
    for female_id in females_actors_directors.ori_id.unique():
        this_female = relevant_roles[(relevant_roles['ori_id'] == ast.literal_eval(female_id))]\
                        .sort_values('coverage_range.date_range.date_start.year')
        
        if surpassyear is not None:
            if this_female.iloc[0]['start_year_decade'] != surpassyear: continue
        
        first_actor_role = this_female[this_female['occupation.title']=='Actor']['coverage_range.date_range.date_start.year']\
                                    .iloc[0]

        try:
            first_director_role = this_female[this_female['occupation.title']=='Director']['coverage_range.date_range.date_start.year']\
                                        .iloc[0]
        except IndexError:
            # no Director credit for this person in `merged`; skip them
            continue

        no_events = this_female[(this_female['coverage_range.date_range.date_start.year'] > first_actor_role) & 
                                (this_female['coverage_range.date_range.date_start.year'] < first_director_role)]

        relevant_females[female_id] = [first_actor_role,first_director_role,
                                       no_events.shape[0]]
    if printstuff is False:
        if len(relevant_females) == 0: return [surpassyear,None,None,None]
    
    relevant_females_df = pd.DataFrame(relevant_females).T
    relevant_females_df.columns = ['Actor_Yr','Director_Yr','NoEvents']
    relevant_females_df['ActorDirectorYrDiff'] = relevant_females_df['Director_Yr'] - relevant_females_df['Actor_Yr']

    directors_first = relevant_females_df[relevant_females_df['ActorDirectorYrDiff'] <= 0]
    actors_first = relevant_females_df[relevant_females_df['ActorDirectorYrDiff'] > 0]

    if printstuff:
        print(f'\nOut of these {females_actors_directors.shape[0]}, how many started their careers as directors? {directors_first.shape[0]}',
              f'({round((directors_first.shape[0]/relevant_females_df.shape[0])*100,2)}%)')

        print(f'Out of these {females_actors_directors.shape[0]}, how many started their careers as actors and progressed into directors? {actors_first.shape[0]}',
              f'({round((actors_first.shape[0]/relevant_females_df.shape[0])*100,2)}%)\n')

    # drop implausibly long gaps (more than 80 years) and add 1 to the in-between event count
    actors_first_clean = actors_first[actors_first['ActorDirectorYrDiff'] <= 80].copy()
    actors_first_clean['NoEvents'] += 1

    # optionally keep only people whose (adjusted) event count exceeds `addfilter`
    if addfilter is not None:
        actors_first_clean = actors_first_clean[actors_first_clean.NoEvents > addfilter]

    # [period, number of people, median event count, median years from first acting role to directorial debut]
    return [surpassyear,
            actors_first_clean.shape[0],
            actors_first_clean['NoEvents'].median(),
            actors_first_clean['ActorDirectorYrDiff'].median()]

_ = get_stats(printstuff=True)
_ = get_stats(printstuff=True, gender='Male')

# Data to plot
x = ['Female\n(Both)',
     'Male\n(Both)',
     'Female\nDirectors',
     'Male\nDirectors',
     'Female\nActors',
     'Male\nActors']

heights = [2256, 3907, 4538, 7333, 42450, 45299]

fig, ax = plt.subplots(figsize=(10, 6))

# Create the bar chart
plt.bar(x, heights, color='tab:orange')
plt.ylim(0,60000)

# recolour the male bars blue
plt.bar(x[1], heights[1], color='tab:blue')
plt.bar(x[3], heights[3], color='tab:blue')
plt.bar(x[5], heights[5], color='tab:blue')

plt.text(0, 9500, '2,256\n', 
         ha='center', va='center',size=16)
plt.text(0, 8000, '\n\n (2.81%\nof all female \nrecords)', 
         ha='center', va='center',size=10, alpha=0.75)

plt.text(1, 10500, '3,907\n', 
         ha='center', va='center',size=16)
plt.text(1, 9000, '\n\n (4.22%\nof all male \nrecords)', 
         ha='center', va='center',size=10, alpha=0.75)

plt.text(2, 12000, '4,538\n', 
         ha='center', va='center',size=16)
plt.text(2, 10500, '\n\n (5.66%\nof all female \nrecords)', 
         ha='center', va='center',size=10, alpha=0.75)
         
plt.text(3, 14000, '7,333\n', 
         ha='center', va='center',size=16)
plt.text(3, 12500, '\n\n (7.92%\nof all male \nrecords)', 
         ha='center', va='center',size=10, alpha=0.75)

plt.text(4, 50000, '42,450\n', 
         ha='center', va='center',size=16)
plt.text(4, 48500, '\n\n (52.96%\nof all female \nrecords)', 
         ha='center', va='center',size=10, alpha=0.75)
         
plt.text(5, 52500, '45,299\n', 
         ha='center', va='center',size=16)
plt.text(5, 51000, '\n\n (48.92%\nof all male \nrecords)', 
         ha='center', va='center',size=10, alpha=0.75)


plt.title('Actor and director frequency in AusStage records')

# Show the plot
plt.show()

How many Females? 80155
How many Females are listed as actors? 42450 (52.96%)
How many Females are listed as directors? 4538 (5.66%)
How many Females are listed as actors & directors? 2256 (2.81%)

Out of these 2256, how many started their careers as directors? 572 (25.35%)
Out of these 2256, how many started their careers as actors and progressed into directors? 1684 (74.65%)

How many Males? 92589
How many Males are listed as actors? 45299 (48.92%)
How many Males are listed as directors? 7333 (7.92%)
How many Males are listed as actors & directors? 3907 (4.22%)

Out of these 3907, how many started their careers as directors? 962 (24.62%)
Out of these 3907, how many started their careers as actors and progressed into directors? 2945 (75.38%)

_images/ed2d7d961ca2e12aeaa42280dd2deb1f5ad56c43de15e84426bda5f5242994db.png

We visualised these actor-turned-director records over time (aggregated into 25-year bins). Each point in the graph marks the median number of years it took an actor to make their directorial debut (orange for female medians and blue for male medians). For each 25-year period, we also annotate the difference in years between the female and male medians, along with the ratio of male to female actor-turned-directors (in brackets). The latter annotation highlights the disparity and sparsity of female records in earlier periods.

In 1950-1975, it took female actors four and a half years longer (a median of 15.5 years) to progress into directors than male actors (a median of 11 years).
The male-female median difference gradually decreases over time. In the most recent period, it takes male and female actors five and a half years to progress from actor to director.
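The per-period comparison reduces to binning each actor-turned-director into a 25-year period, taking the median years-to-debut for each gender, and comparing the two. A minimal sketch, assuming hypothetical (gender, first_acting_year, years_to_debut) tuples and binning by the first acting year (for actor-first careers this is also the earliest Actor or Director credit, which is what get_stats above bins on):

```python
from statistics import median


def period_start(year, width=25):
    """First year of the fixed-width period containing year (e.g. 1963 -> 1950)."""
    return (year // width) * width


def summarise_by_period(progressions, width=25):
    """Compare female and male medians of years-to-debut per period.

    progressions: iterable of (gender, first_acting_year, years_to_debut).
    Returns {period: {"diff_years": female median - male median,
                      "male_to_female": male count / female count}}
    for periods that contain both genders.
    """
    groups = {}
    for gender, first_year, gap in progressions:
        by_gender = groups.setdefault(period_start(first_year, width), {})
        by_gender.setdefault(gender, []).append(gap)

    summary = {}
    for period, by_gender in sorted(groups.items()):
        female, male = by_gender.get("Female", []), by_gender.get("Male", [])
        if female and male:
            summary[period] = {
                "diff_years": median(female) - median(male),  # positive: women took longer
                "male_to_female": len(male) / len(female),
            }
    return summary


# toy progressions with made-up values (not AusStage data)
data = [
    ("Female", 1955, 12), ("Female", 1960, 16),
    ("Male", 1952, 9), ("Male", 1970, 10), ("Male", 1965, 11),
    ("Female", 1980, 7), ("Male", 1985, 6),
]
for period, stats in summarise_by_period(data).items():
    print(f"{period}-{period + 25}: {stats['diff_years']:+} yrs ({stats['male_to_female']:.1f})")
```

A ratio well above 1 in a period signals that its female median rests on comparatively few records, which is why the bracketed ratios are worth reading alongside the year differences.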


decades = list(merged['start_year_decade'].unique())
decades.sort()

female_medians = []
male_medians = []

for x in decades[-5:]:
    female_medians.append(get_stats(surpassyear = x, addfilter=1))
    male_medians.append(get_stats(surpassyear = x, gender='Male', addfilter=1))

avg_progression_time = pd.merge((pd.DataFrame(male_medians)).iloc[:,[0,3]], 
                                (pd.DataFrame(female_medians)).iloc[:,[0,3]],
                                on=0)
avg_progression_time.columns = ['Year','Males','Females']

# avg_progression_time = avg_progression_time.tail(-1)

# plot actor-to-director progression time by gender and 25-year period
fig, ax = plt.subplots(figsize=(10, 6))

plt.plot(avg_progression_time['Year'], 
         avg_progression_time['Males'], 
         label="Male Avg", marker='o')

plt.plot(avg_progression_time['Year'], 
         avg_progression_time['Females'], 
         label="Female Avg", marker='o')

plt.fill_between(avg_progression_time['Year'], 
                 avg_progression_time['Males'],
                 avg_progression_time['Females'],
                 color='tab:orange',alpha=.05)

# plt.text(1872.5, 13.4, '+5 yrs', ha='left', va='center',size=14, alpha = 0.8)
plt.text(1897.5, 14, '+2 yrs', ha='left', va='center',size=14, alpha = 0.8)
plt.text(1897.5, 12, '(4.2)', ha='left', va='center',size=11, alpha = 0.8)

plt.text(1922.5, 13, '+2 yrs', ha='left', va='center',size=14, alpha = 0.8)
plt.text(1922.5, 11, '(2.8)', ha='left', va='center',size=11, alpha = 0.8)

plt.text(1947.5, 13, '+4.5 yrs', ha='left', va='center',size=14, alpha = 0.8)
plt.text(1947.5, 10, '(2.5)', ha='left', va='center',size=11, alpha = 0.8)

plt.text(1972.5, 12, '+1 yrs', ha='left', va='center',size=14, alpha = 0.8)
plt.text(1972.5, 9, '(1.5)', ha='left', va='center',size=11, alpha = 0.8)

plt.text(1995.5, 7.5, '+1 yrs', ha='left', va='center',size=14, alpha = 0.8)
plt.text(1997.5, 4, '(1.2)', ha='left', va='center',size=11, alpha = 0.8)

# add an explanatory note in the top left
plt.text(1897.5, 18, 'Values in brackets denote\nthe male-to-female count ratio\nfor each respective period.', 
         ha='left', va='center',size=11, alpha = 0.8) 

plt.grid(axis='x')
# label each tick with the 25-year period it represents
# (removed a redundant xticks call and a stray ylim(1890, 2010) that was immediately overridden)
plt.xticks(range(1900, 2010, 25), 
           ['1900-1925', '1925-1950',
            '1950-1975', '1975-2000', '2000-Present'])
plt.ylim(0,20)
ax.legend()

plt.title('Average number of years for an actor to reach their directorial debut, \nMales and Females, 25-year median', 
          size=12)

plt.show()

[Figure: Average number of years for an actor to reach their directorial debut, males and females, 25-year median]

Female actors have historically waited longer than male actors to progress into directing. Today, an actor typically takes five to six years to reach their directorial debut.

Exhibitions, Spatial Analysis#

Most DAAO exhibition records include geocodes, which let us map where exhibitions took place. We illustrate four snapshots of DAAO exhibitions over time: 1930, 1960, 1990 and 2020.
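A snapshot map needs two filters: records with usable coordinates, and records that fall in the snapshot year. A minimal sketch, assuming a hypothetical `exhibitions` table with `start_year`, `end_year`, `lat` and `lon` columns (toy values, not the DAAO schema), and reading a snapshot as "exhibitions running during that year". A cumulative "held up to that year" view would only change the date condition.

```python
import pandas as pd

# Hypothetical exhibition records; column names are assumptions, not the DAAO schema.
exhibitions = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "start_year": [1928, 1929, 1958, 1990],
    "end_year": [1930, 1929, 1961, 1990],
    "lat": [-33.87, None, -37.81, -27.47],
    "lon": [151.21, None, 144.96, 153.03],
})


def snapshot(exhibitions: pd.DataFrame, year: int) -> pd.DataFrame:
    """Geocoded exhibitions running at some point during `year`."""
    # Drop records without usable coordinates
    geocoded = exhibitions.dropna(subset=["lat", "lon"])
    # Assumption: an exhibition belongs to a snapshot if its date range covers the year
    running = (geocoded["start_year"] <= year) & (geocoded["end_year"] >= year)
    return geocoded.loc[running, ["title", "lat", "lon"]]


for year in (1930, 1960, 1990, 2020):
    print(year, len(snapshot(exhibitions, year)))
```

Each snapshot can then be drawn with, for example, `plt.scatter(snap["lon"], snap["lat"])`, where `snap` is the frame returned for that year.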

[Figures: four maps of DAAO exhibition locations, for 1930, 1960, 1990 and 2020]
