Blog Post #6: Carla Ebel

27.07.2022

RELEVEN Gender Model – Can our data model be more than binary? After presenting our approach to modelling gender as an event-based assignment at DH2022 Tokyo, Carla describes the process and model in this blog post.

I’m Carla, a data analyst from the ACDH-CH and member of the RELEVEN team since summer 2021. In the RELEVEN project we engage intensively and in a constant exchange of ideas with the core question of data modeling:

How do we connect our current state of knowledge with data generated in the past, making it as transparent as possible while introducing as little of our own bias as possible? We have to reflect on our complex historically built knowledge systems and our own logic and put them into a binary order to process them with computers and then interpret and offer our data and knowledge to others for interpretation.

A good example of this conflict is the modelling of data about gender assertions. The first dataset that we integrate in our database is from the Prosopography of the Byzantine World. It contains three gender terms/categories: eunuch, male and female. In data modelling we have “controlled vocabularies” for this type of information.

For our data model we use the latest RDFS version of the CIDOC Conceptual Reference Model. CIDOC CRM is one of the main data modelling tools in the humanities and has been developed since September 2000.

Although modelling gender is not addressed in the CIDOC CRM 7.1.1 documentation that we used, the application of "P2 has type" and "E55 Type" is the default solution, which treats gender as an untemporalized type that is attached to a person. This model can be found in other Euro-centric and CIDOC CRM-related resources like the Swiss Art Research documentation. We find the reason for this in the proposal documenting the decision of the CIDOC CRM SIG working group 1 to delete the class and property for gender in July 2001:

The entity Gender is not needed, as it can completely be covered by 
E21 Person. (P2) has type: E55 Type 
and there is nothing more important about gender than about any other properties giving rise to a set of people. Delete E76 Gender, P61 has gender. 
In scope. Decision to be connected to general handling of types, issue 50
(https://www.cidoc-crm.org/Issue/ID-38-delete-gender).

In line with the wider scholarship on gender in the Byzantine era, it has long been acknowledged that, not only within the Byzantine society shaped by a Christian discourse but also in the regions with Islamic influence and elsewhere in Asia, there were persons, in our case classified as eunuchs, who were indeed considered a third gender (Ringrose, 2003) and they are represented as such in existing databases such as the Prosopography of the Byzantine World.

Obviously, we can't model our gender data with the current CIDOC CRM 7.1.1 solution, because gender is not a static attribute in our dataset. We depend on our own perspectives and the concepts and assertions we can derive and interpret from sources provided by others, written in the contexts of their own lives. This circumstance is described by standpoint theory, which was developed by researchers like the sociology professor Patricia Hill Collins. Inherent in standpoint theory is that we can never understand another person's reality independently, but only in communication and empathy with others, when we try to truly comprehend the reality of our counterpart and acknowledge at the same time that we can’t. Thus, in the development process of our model we discussed other possible use cases, but would leave more general proposals to the actors as specialists (for currently living people) and research experts who work with standpoint theory (for historic data). I see a responsibility in classifying people and we are open to and thankful for any critique of our work/model because we are aware of our limited knowledge.

CIDOC CRM is event-based. That means we can break down everything to the model of an event, which is defined through time, space, place and participating actors and objects.

Additionally, we have the class “E28 Conceptual Object”. That is paradoxical, because our concepts of time and space and objects and actors and humans are also conceptual objects in our field of knowledge production. The difference between time and space and other conceptual objects is that we assume that we can quantify them in a way that makes them comparable, at least on planet Earth. The atomic clock was invented in 1946 and aerial photography came up only in the 19th century and was first practiced by French, British and American persons (according to Wikipedia) in a colonial situation. So these concepts are clearly developed and shaped by power structures.

Clocks and maps (which are generated with the help of photography from space) are decisive for our understanding of data in digital humanities and in general when we think of coordinates and timestamps. They define what we can describe as a moment or an event in CIDOC CRM. In digital humanities we reproduce documentations of the reality of the past for our own construction of reality and the future. If our concept of reality, e.g. concerning gender, changes, we also have to change our data models. That means that ontologies as a shared conceptualization of reality are also an instrument of power and need to be reflected on.

With these thoughts we adapted the event-based structure of CIDOC CRM and expanded the shortcut: Person – has_gender – Gender (E1 – P2 – E55). This is an uncontroversial modelling decision in the logic of CIDOC CRM because “P2_has_type” is a shortcut for the path crm:E1_Entity -P41_was_classified_by-> E17_Type_Assignment -P42_assigned-> E55_Type.
That gives us the opportunity to add data about the person that was classified and the actor that classified the person and the circumstances regarding time, space and documentation of this event. We can also add assumptions about events like a motivation. We use the E31_Document class to document our sources for assertions. That makes this data model a useful tool to sharpen our own perspective on our dataset.
Our application to gender classification shows the opportunities of event-based modelling when we don’t assume the existence of objectivity in our data.
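
The expansion described above can be sketched in a few lines of Python. This is only a minimal illustration with plain tuples, not the RELEVEN implementation: all "ex:" identifiers are hypothetical, the helper names are invented for this example, and the choice of P14_carried_out_by for the classifying actor and P70_documents for the source is an assumption about how the pattern could be filled in. The sketch uses the forward property P41_classified (from the assignment to the person), which is the inverse direction of the "was classified by" reading used in the text.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, for illustration only.
PERSON = "ex:person/1"
ASSIGNMENT = "ex:gender-assignment/1"
EUNUCH = "ex:gender/eunuch"
AUTHORITY = "ex:authority/1"
SOURCE = "ex:source/1"


def type_assignment(
    assignment: str,
    entity: str,
    assigned_type: str,
    actor: Optional[str] = None,
    document: Optional[str] = None,
) -> list[Triple]:
    """Build the event-based form: an E17 Type Assignment that classifies
    an entity and assigns it a type, optionally with actor and source."""
    triples = [
        Triple(assignment, "rdf:type", "crm:E17_Type_Assignment"),
        Triple(assignment, "crm:P41_classified", entity),
        Triple(assignment, "crm:P42_assigned", assigned_type),
    ]
    if actor is not None:
        # Assumption: the classifying actor is attached with P14.
        triples.append(Triple(assignment, "crm:P14_carried_out_by", actor))
    if document is not None:
        # Assumption: the source is an E31 Document that documents the event.
        triples.append(Triple(document, "crm:P70_documents", assignment))
    return triples


def collapse_to_shortcut(triples: list[Triple]) -> list[Triple]:
    """Derive the static P2_has_type statements from the event-based form.
    Assumes each assignment classifies one entity and assigns one type."""
    classified = {t.subject: t.obj for t in triples
                  if t.predicate == "crm:P41_classified"}
    assigned = {t.subject: t.obj for t in triples
                if t.predicate == "crm:P42_assigned"}
    return [Triple(classified[a], "crm:P2_has_type", assigned[a])
            for a in classified if a in assigned]


graph = type_assignment(ASSIGNMENT, PERSON, EUNUCH,
                        actor=AUTHORITY, document=SOURCE)
shortcut = collapse_to_shortcut(graph)
```

Collapsing the event back into the shortcut loses exactly the information that the expanded form adds: who made the classification and which source documents it.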

The epistemological problem described here already shows up in the early years of CIDOC CRM when the specific property and class for gender were deleted.

20 years after this decision most academics discuss race, class and gender in the context of power structures in the history of intersectional feminism, post-colonial studies, poststructuralism, critical race theory, cultural studies and gender studies. These fields of research have an entangled history. Gender is, together with race and class, an important factor e.g. in Post-Colonial Studies, Cultural Studies, Gender Studies, Poststructuralism and Deconstructivism. And gender is, like the categories race and class, a construct that is rooted deeply in historically evolved realities like justice or education systems and that changes constantly, so these categories are not comparable. To treat it as an un-temporalized attribute like “blue eyes” reproduces historical power structures, which shaped the data that we use. We also reproduce power structures in other classifications like occupations.

At the first workshop we held (Linked Pasts, Ghent) we wanted to evaluate the current data structure we created with different persons to hear their opinions regarding our work. A participant (I forgot to write down your name, if you read this, please contact us, so we can acknowledge your contribution to our model) pointed out to us that we adopted from PBW the classification “slave” for a person as an occupation. We aggregated the Byzantine category “slave” and our own Euro-centric privileged concept of being human (when humans have to work, it is an occupation) without investing the time and work to check or reflect on our own model and data critically for this test case, and thereby made our data problem even worse. This moment in the workshop made it clear once more how essential it is to have different perspectives on datasets. The data problem and the data model problem are connected. I think, if we don’t see the digital humanities as an act of translation, but as a reproduction of knowledge and power structures, there are more opportunities in data modelling for the humanities.

When we consider the historical and contemporary reality of gender identities, we frequently find cases where gender assignments change over a person’s lifetime, like gender reveal rituals, or happen before life starts or after the person’s death, e.g. in documentation of archaeological excavations. According to important post-colonial researchers like Maria Lugones, Gayatri Chakravorty Spivak and Edward Saïd and to the Euro-centric and North American discourse on gender in the last 70 years (e.g. Beauvoir, Butler, Foucault), it is recognised that gender is not an innate attribute of a person, but a categorization attributed according to regional and historical circumstances in a long process of different events, which is often explained with physical features of biological objects like the human body. Attribution processes and the available gender vocabulary can vary over time and place, although an authoritarian gender categorization which is documented is performed in most cases in temporal proximity to a person's birth or puberty. Attaching metadata to a classification process doesn’t erase the power structures from the data we process, but it makes them more transparent.

In the beginning of July some members of the RELEVEN team travelled to IMC Leeds and presented our model to aggregate assertions about perspectives on mainly Byzantine sources from the 11th century (check out our older posts and our data model in progress for that).

The questions of which classifications and assertions we aggregate in our datasets, and how we are able to and want to contextualize them, are decisive for our project. From the beginning of the project we discussed the role of authority and source and never stopped that discussion. Our assertions and classifications are based on texts we cite and validated by the authority of their authors. By reproducing their knowledge, we validate their historical categories and assertions ourselves.

When we consider our own standpoint, we have to consider the relativity of our own knowledge, classifications and the words we use for that. The idea of an objective database or algorithm is not compatible with this principle from the humanities. Biased information is at the core of humanities and as researchers, I think we have to deal with that as well as we can. It is not a coincidence that ontologies in computer science and philosophy have the same appellation, I guess.

Many interesting questions arise from our working process in the last year: How do we define the difference between a classification and assertion? How do we contextualise our own classifications and assertions about reality in our historical data and make them transparent? How can we make assumptions about authority with our dataset? How can we incorporate the perspectives of others when we don’t share their standpoint and don’t work with them because we are (too) privileged in diverse categories and their intersections? What is the role of words, translation, interpretation, time and space in our data model? How will our data model shape information (data, language, assertions about reality... you name it) for the future?

Don’t hesitate to share your perspective/standpoint on this problem with us.

Stay tuned and meanwhile educate yourself on biased data.

Data model in CIDOC CRM, modelled with arrows.app.

Influences for the model.