English has constructions that sort of look like Greek middle voice.
The door closed (Greek: ἐκλείσθη ἡ θύρα)
Rachel got dressed. (Greek: ἐνεδύσατο Ῥαχήλ)
English also has a well-developed set of reflexive pronouns as well. So then why is it that we do not talk about English as having a middle voice?
The answer to that question can be expressed in three words: systematicity, pervasiveness, and grammaticalization. When we talk about linguistic categories and whether or not a given language has a particular category, we are not so much interested in what meanings can be conveyed, but how they are conveyed. Is it through an ad hoc collection of constructions of various origins? Or is it through a structured system, be it an inflectional paradigm or a set of free morphemes that form a contrastive set. How obligatory are these expressions? Are they required for communication or added freely and removed freely?
For example, in English we can convey inferential evidentiality with the “must have” construction, as in:
My wife must have drank the last of the coffee.
We can also convey hearsay evidentiality with the adverb “reportedly.”
The president is, reportedly, still on twitter.
But we do not have evidentiality as a grammatical category in English because there is no system of evidentiality markers. Instead, this is a collection of ad hoc constructions, not a full paradigm of morphemes.
Compare that with a language that has an evidentiality system, such as the Tucano language in northwest Amazonia. Tucano has a four way contrast in evidentiality (Aikenvald 2005, 52).
Visual Evidentiality: -ámi suffix
diây wa’îre yaha-ámi
‘The dog stole the ﬁsh’ (I saw it)
Non-visual Sensory Evidentiality: -ásĩ suffix
diây wa’îre yaha-ásĩ
‘The dog stole the ﬁsh’ (I heard the noise)
Inference Evidentiality: -ápĩ suffix
diây wa’îre ya ha-ápĩ
‘The dog stole the ﬁsh’ (I inferred it)
Reported Evidentality: -ápɨ’ suffix
diây wa’îre ya ha-ápɨ’
‘The dog stole the ﬁsh’ (I have learnt it from someone else)
Note that in every sentence, the suffixes appears in the exact same spot and thus together form a paradigm. Moreover, these four suffixes also grammaticalize together with Tucano’s RECENT PAST tense in the language as well, so evidentiality necessarily co-occurs with other grammatical information. This stands in stark contrast to English, where (1) providing the source of knowledge is not an obligatory element in the language and (2) such constructions take a larger collection of forms, co-opting modal auxiliaries such as must and adverbs like reportedly.
The reality is that language in general has a limited set of communicative needs. When it comes to various forms of voice and transitivity, one of those needs is conveying that (1) an event happened and (2) the energy that produced the event was internal or spontaneous and not from an external agent. Some languages formalize that communicative need into a voice system by creating a paradigm. Thus for Greek, motion verbs, body actions, grooming, speech production, various spontaneous events all receive systemic middle voice marking.
Other languages co-opt a sort of ad hoc set of constructions to convey it. This is what English does with its (1) nonagentive intransitives like ‘the door closed’ (one construction), (2) uses of the reflexive (another unrelated construction), and (3) use of ‘get’, as in John got dressed (a third construction).
Additionally, none of the English constructions cover the entire space of middle expression (such as speech acts or grooming verbs): Mike trimmed his beard is morphosyntactically the same as Mike trimmed the bushes. So for English, there isn’t a grammaticalized middle voice for this particular communicative need, while in Greek there is. Now of course that difference might seem like splitting hairs, but it’s an important one for understanding how various languages are similar or different in their structure. And that matters significantly for teaching the grammar of one language to a speaker of another language. It also matters for being able to produce accurate translations in languages, whether they are like Greek or like English or like Tucano.
The existence of a grammatical category in a language is decided not so much on the basis of whether you can say something in the language, but rather how it is said. Does the language have a built-in system for saying it or just a jumble of varying constructions?
Aikhenvald, Alexandra Y. 2005. Evidentiality. Oxford: Oxford University Press.
Hopper, Paul J. 2003. Grammaticalization. 2nd Edition. Cambridge: Cambridge University Press.
Lehmann, Christian. 2015. Thoughts on grammaticalization. 3rd Edition. Berlin: Language Science Press.