Text-to-speech synthesis as an alternative communication means after total laryngectomy

Aims. Total laryngectomy still plays an essential part in the treatment of laryngeal cancer, and loss of voice is the most feared consequence of the surgery. Commonly used rehabilitation methods include esophageal voice, the electrolarynx and implantation of a voice prosthesis. In this paper we focus on a new perspective on vocal rehabilitation using alternative and augmentative communication (AAC) methods. Methods and Patients. Sixty-one consecutive patients treated by total laryngectomy, with or without voice prosthesis implantation, were included in the study. All were offered voice banking and personalized speech synthesis (PSS). To take part, they had to express their willingness voluntarily and demonstrate the ability to use modern electronic communication devices. Results. Of the 30 patients fulfilling the study criteria, only 18 completed a voice recording sufficient for voice reconstruction and synthesis. Eventually, only 7 patients started to use this AAC technology during the early postoperative period. The frequency and total usage time of the device gradually decreased. Currently, only 6 patients are active users of the technology. Conclusion. The influence of communication with the surrounding world on the quality of life of patients after total laryngectomy is unquestionable. The possibility of using the spoken word with the patient's personalized voice is an indisputable advantage. This form of voice rehabilitation should be offered to all patients who are deemed eligible.


INTRODUCTION
With an estimated incidence of over 135,000 cases worldwide, squamous cell carcinoma of the larynx accounts for about 2-4.5% of all malignancies [1][2][3]. The main goal of treatment is to eliminate the tumor completely and cure the patient. However, because of the specific location and function of the larynx, functional results also play a crucial role. Non-surgical treatment modalities, endoscopic surgery and partial laryngectomy via an external approach all aim to preserve the most important functions of the larynx, i.e. swallowing control, airway protection and the best possible voice quality.
Voice is one of the most important communication tools and an inextricable part of the human personality. Janice Light described the human voice as one of "the essences of human life" 4. Yet up to 4 million people in the US and 97 million worldwide are at risk of losing, or have already lost, their voice. Laryngeal carcinoma is not the only pathology that threatens people with loss of voice: a recent review by Creer et al. provided a comprehensive and surprisingly long list of conditions associated with voice deterioration or loss 5. However, unlike in motor neuron disease, for example, a physiological (albeit altered) voice can be preserved in laryngeal cancer, and for part of the population the eventual loss of voice is even preventable (smoking cessation, alcohol avoidance).
Although other larynx-preserving options exist in the treatment of laryngeal cancer, in some cases total laryngectomy is unavoidable to save the patient's life. Total laryngectomy involves complete removal of the laryngeal cartilaginous framework, including all the soft tissues of the laryngeal inlet and the vocal cord apparatus. Resection of the larynx leads to an inevitable, permanent separation of the digestive tract and the airway. The resulting pharyngeal defect is closed with a T- or I-shaped suture in several layers. This anatomical rearrangement no longer allows exhaled air to be used to produce sound. Apart from the many other problems related to this procedure, one of its major impacts is therefore that patients lose the ability to speak. For various reasons, loss of verbal communication is the most disabling consequence of surgical treatment, and it significantly decreases the patient's quality of life 6,7. Surgical procedures of this type are scheduled shortly after the diagnosis of malignancy, usually within two to six weeks. The loss of voice caused by the surgery is abrupt and lasts at least one week before alternative ways of voicing can be started. This leaves a very short period in which to prepare any alternative and augmentative communication (AAC) method for use in the immediate postoperative phase. Patients must be counselled so that they fully understand the impact of the procedure, and it is highly advisable to instruct both them and their close family members in detail. Proper voice rehabilitation cannot start in the immediate postoperative period because of the risk of wound dehiscence and inflammatory swelling. Alternative communication modalities therefore support invaluable interaction between the patient, the medical team, the nursing staff and the speech and language therapist. They also significantly help the patient's family to overcome the initial period of voiceless communication.
This article attempts to introduce one of the new AAC modalities for patients after total laryngectomy.

Brief history of voice rehabilitation
One of the first speaking aids was described in 1859 by Jan Nepomuk Czermak, who published the case of an 18-year-old woman with complete laryngeal stenosis in whom an artificial speaking device was implanted. His device consisted of tubing designed to reroute the airstream from the trachea to the lower parts of the larynx and thus amplify the whisper produced by the patient. It was later described as the first artificial larynx 8.
The first successful laryngectomy was performed by Christian Albert Theodor Billroth on 31st December 1873 on a 36-year-old male teacher with laryngeal carcinoma. The operation, initially planned as a partial laryngectomy via a median thyrotomy, eventually became a total laryngectomy with immediate placement of an artificial larynx. Voice rehabilitation of the first-ever laryngectomee began under the supervision of Carl Gussenbauer, using several devices 9 that were remarkably visionary for their time. Phonation was achieved by occluding the cannula, so that airflow from the trachea was redirected into the pharynx and made the tongue and pharyngeal walls vibrate and produce a voice 10. An obvious disadvantage of these first attempts at voice rehabilitation was the imperfect separation of the swallowing tract and the airway, which led to high morbidity and mortality from aspiration pneumonia.

Current standards of voice rehabilitation
Oesophageal speech
The first report of what is known today as oesophageal voice was presented by Raprand at the Academy of Science in Paris in 1828 (ref. 11). This method of voice rehabilitation remained unnamed until 1919, when the Czech laryngologist Seeman introduced the term oesophageal speech 12.
The basic technique of oesophageal speech involves building up enough positive pressure in the oral cavity through the activity of the lips, tongue and cheeks. In the first phase, the tongue forces air from the mouth into the pharynx. In the second phase, the back of the tongue overcomes the sphincter pressure of the pharyngoesophageal (PE) segment, thereby insufflating the oesophagus. Only 70-100 mL of air can be insufflated into the oesophagus, which limits both the volume and the phonation time of the speaker. Learning oesophageal speech requires a lot of practice, and reported rates of successfully rehabilitated patients traditionally do not exceed 30-35%. Moreover, rehabilitation can only begin after the surgical wound has healed.
External mechanical and electromechanical speech aids
Patients who are unable or unwilling to learn oesophageal voice can instead use mechanical and electromechanical vibrating devices, commonly known as electrolarynges. An electrically operated vibrator is applied to the skin over the submandibular or submental triangle. Vibrations generated by the device are transmitted to the muscles of the pharyngeal wall and oral cavity and produce sound; speech intelligibility is ensured by articulation within the oral cavity. Three types of modern electrolarynges are currently used: external transoral, external transcervical and intraoral speech aids. Current electrolarynges allow users to change not only the volume of the voice but also its frequency 13.

Tracheoesophageal (TE) fistula
A voice prosthesis inserted into the TE fistula is the most effective form of voice rehabilitation and is considered the gold standard after total laryngectomy. After a period of efforts to modify surgical techniques for the rehabilitation of laryngectomized patients, the principle of TE voice rehabilitation was established: a fistula is created between the membranous wall of the trachea and the esophagus to house a suitable type of voice prosthesis. Rapid development and testing of new materials with higher resistance to biofilm and better biocompatibility have enabled the production of new types of prostheses with a longer lifetime and very favorable properties (low airflow resistance).
Voice prostheses are nowadays widely used 14. A prosthesis is simply a one-way valve with a flap on its esophageal side that mechanically prevents the entry of liquids and solids. It allows patients to swallow safely while preserving the ability to talk. Sound is produced when the patient exhales with the stoma covered, so that air passes through the prosthesis; speech is then formed by the tongue, lips and resonant spaces. The voice prosthesis must be replaced regularly because it wears out and deforms as a result of fungal infection 15,16.

Alternative and augmentative communication means: voice banking and personalised speech synthesis
In addition to the traditional methods of voice rehabilitation after total laryngectomy described above, it is also possible to use the developing methods of so-called AAC, a group of techniques and technologies that enable and facilitate the patient's communication with the surroundings. Technologically, AAC can be no-tech, low-tech or high-tech. Some technologies allow an artificially created voice to be used for communication. Speech synthesis and text-to-speech (TTS) devices are commonly referred to as voice output communication aids (VOCA) or speech generating devices (SGD) (ref. 17). Nowadays, practically everyone uses technological devices that enable voice communication, most commonly mobile phones and tablets. Patients are familiar with them and can handle them well even when their health deteriorates or immediately after major surgery such as total laryngectomy 18. With text-to-speech software installed, these devices can therefore be used in everyday contact with the surroundings. The earliest form of recording the patient's own voice for use when it deteriorates or is completely lost is so-called message banking. In principle, it is a recording of the patient's own spoken words and sentences with appropriate intonation and emotional tone, which are stored and sorted so that the user can play them back at a suitable moment in communication. Developed in the 1990s, message banking is widely used in patients with neurodegenerative motor neuron diseases such as ALS (ref. 19). The next step was the development of voice synthesis and of devices capable of converting written text to speech. The patient's own voice can then be used more flexibly, without being restricted to phrases prepared for particular situations. An indisputable drawback of this approach is the inability to modulate the emotional subtext of the message being communicated.
The inability to express emotion might be a limitation, yet being able to communicate wishes, thoughts and needs, or to express emotions in words and sentences in their own voice, can help patients in their struggle against the loss of their identity. For the laryngectomee, this may be the best and most important means of communication in the period immediately after laryngectomy, as well as whenever alaryngeal speech is not functional. It should also be remembered that although substitute voice mechanisms may retain some similarity to the original tone of voice, the synthesized form of the patient's voice is much closer to the original. There are several methods of voice synthesis, which are discussed further.
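Unit selection, one of the two approaches used for PSS in this study (the other being HMM-based synthesis), can be illustrated with a toy example. The sketch below is not the study's synthesizer: the `Unit` fields, the pitch-only target cost and the fixed seam penalty in the join cost are simplifying assumptions chosen for readability. It shows only the core idea, a Viterbi search that picks one recorded unit per target phone so that the sum of target costs (how well a unit matches the desired prosody) and join costs (how audible the concatenation seam would be) is minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded phone-sized unit from the patient's corpus (toy representation)."""
    phone: str
    pitch: float   # mean F0 of the unit in Hz
    sentence: int  # index of the recorded sentence it was cut from
    position: int  # index of the unit within that sentence


def target_cost(unit: Unit, target_pitch: float) -> float:
    # How far the unit's pitch is from the pitch requested for this position.
    return abs(unit.pitch - target_pitch)


def join_cost(prev: Unit, cur: Unit) -> float:
    # Units that were neighbours in the original recording join without a seam.
    if prev.sentence == cur.sentence and prev.position + 1 == cur.position:
        return 0.0
    # Otherwise: an assumed fixed seam penalty plus the pitch jump at the boundary.
    return 10.0 + abs(prev.pitch - cur.pitch)


def select_units(targets: list[tuple[str, float]], inventory: list[Unit]) -> list[Unit]:
    """Viterbi search for the unit sequence with minimal total target + join cost."""
    if not targets:
        return []
    candidates = [[u for u in inventory if u.phone == phone] for phone, _ in targets]
    if any(not c for c in candidates):
        raise ValueError("a target phone is missing from the recorded corpus")
    # best[i][j] = (lowest cost of a path ending in candidates[i][j], back-pointer)
    best = [[(target_cost(u, targets[0][1]), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(u, targets[i][1])
            row.append(min(
                (best[i - 1][k][0] + join_cost(p, u) + tc, k)
                for k, p in enumerate(candidates[i - 1])
            ))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


inventory = [Unit("a", 120.0, 0, 0), Unit("h", 130.0, 0, 1), Unit("h", 121.0, 1, 5)]
print(select_units([("a", 120.0), ("h", 120.0)], inventory))
```

In this example the search keeps the 130 Hz "h" that directly followed the chosen "a" in the same recorded sentence (total cost 10) rather than the better-matching 121 Hz "h" from another sentence (cost 1 + 11 = 12), because avoiding a seam outweighs the pitch mismatch. It also shows why phoneme coverage of the recorded corpus matters: a phone absent from the recordings cannot be synthesized at all. Production systems use many more features (duration, spectral distance at the join) and prune the candidate lists.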

Theory behind the building of the speech corpus
As noted above, time is a very important factor, given the diagnosis of malignancy and the planned operation. Treatment delays can have enormous consequences for oncological outcomes. These stringent time limits also restrict the possibility of obtaining sufficient data for processing and subsequent synthesis of the patient's personalized voice, and the same time factor can influence the choice of the voice synthesis algorithm. Likewise, the patient's overall condition, age and fatigue must be considered. Voice quality can be, and often is, affected by the gradual progression of the disease, which can in turn degrade the quality of the resulting voice synthesis.
To build a speech corpus of adequate quality, a fairly large number of sentences must be recorded. Recording ideally takes place in a soundproof studio; however, to simplify the organization of recording, some patients were recorded in a quiet room at the hospital, preferably in the evening when distracting background noise was minimal. Sentences are recorded at the highest possible quality. It is generally better to record them one sentence at a time, to avoid any influence from the context of the preceding sentence. Recording should ideally be stopped as soon as voice quality deteriorates because of vocal fatigue. To build a sufficiently robust synthetic voice, 300-1000 carefully selected sentences containing all possible phoneme variants should be recorded. For motivated patients with good preoperative voice quality, more sentences can be recorded, which further improves the quality of the resulting voice synthesis: more sentences mean better coverage of the phonemes and thus a closer resemblance between the synthetic voice and the patient's own voice. Recording the sentences is in principle the same as recording professional speakers, and the recording itself needs to be supervised. The need to travel for voice recording may have prevented some patients from actively participating in the study. Automating the selection of recorded sentences after checking recording quality would greatly simplify the process, and a portable device capable of recording high-quality voice would enable the entire procedure to be carried out at home. However, the effort to maximize the quality of voice synthesis did not allow this for the patients entering the study, so all recordings were made, and their quality checked, by trained staff. Further technical details are provided in other publications by the research team 20.
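The text does not state how the 300-1000 sentences covering all phoneme variants are chosen. A common heuristic for this kind of coverage problem is greedy set cover, sketched below under explicit assumptions: each candidate sentence is already transcribed into a list of phones (for Czech this would need a grapheme-to-phoneme step, not shown), diphones are used as the coverage unit, and the names `diphones` and `greedy_select` are hypothetical. It is an illustration of the idea, not the procedure used by the research team.

```python
def diphones(phones: list[str]) -> set[tuple[str, str]]:
    """Adjacent phone pairs in one sentence; '#' marks the pause at either end."""
    seq = ["#"] + phones + ["#"]
    return set(zip(seq, seq[1:]))


def greedy_select(candidates: dict[str, list[str]], max_sentences: int) -> list[str]:
    """Greedy set cover: repeatedly take the sentence that adds the most new diphones."""
    units = {sid: diphones(phones) for sid, phones in candidates.items()}
    covered: set[tuple[str, str]] = set()
    chosen: list[str] = []
    remaining = sorted(units)  # sorted so that ties are broken deterministically
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda sid: len(units[sid] - covered))
        gain = units[best] - covered
        if not gain:  # every remaining sentence is redundant
            break
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen


corpus = {
    "s1": ["a", "h", "o", "j"],
    "s2": ["a", "h", "o"],
    "s3": ["m", "a", "m", "a"],
}
print(greedy_select(corpus, max_sentences=10))  # ['s1', 's3', 's2']
```

Sentence "s2" is picked last because, after "s1", it contributes only one new diphone (the final "o" before a pause). The `max_sentences` budget mirrors the practical limit set by time and vocal fatigue: with a budget of 2, only "s1" and "s3" would be recorded.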

MATERIAL AND METHODS
From January 1st, 2015 to December 31st, 2016, 61 patients scheduled for surgical treatment of laryngeal or hypopharyngeal cancer were invited to participate in the voice banking and TTS study. All patients actively expressed their willingness to participate in the study and signed informed consent. The study was carried out by the University of West Bohemia in cooperation with Charles University and the Faculty Hospital in Motol, whose Ethics Committee approved the research (registration number EK-60A1/14). All 18 patients suitable for the study recorded their own voice for the purpose of personalized speech synthesis (PSS); another 14 patients chose to communicate through mobile devices with text-to-speech software, using an anonymous professional synthetic voice selected from the software provider's voice library. All patients included in the study completed the WHOQOL-BREF and VHI-30 (Czech validated version 21) questionnaires before surgery and 6-8 months after completion of treatment. The frequency of TTS use with PSS or with a professional synthetic voice was analysed. Further endpoints were the results of the VHI and quality of life (QoL) questionnaires.
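Scoring the VHI-30 is simple arithmetic: in the original instrument, 30 items are each rated from 0 (never) to 4 (always) and grouped into functional, physical and emotional subscales of 10 items each, giving subscale scores of 0-40 and a total of 0-120, with higher scores meaning greater perceived handicap. The sketch below assumes the Czech validated version keeps this structure and that responses are stored under item codes F1-F10, P1-P10 and E1-E10; this storage layout and the function names are hypothetical. WHOQOL-BREF scoring, which involves reverse-scored items and domain transformations, is not shown.

```python
SUBSCALES = ("functional", "physical", "emotional")
# Hypothetical item codes: F1-F10, P1-P10, E1-E10
ITEM_CODES = [f"{prefix}{i}" for prefix in "FPE" for i in range(1, 11)]


def score_vhi30(responses: dict[str, int]) -> dict[str, int]:
    """Sum VHI-30 item ratings (0-4) into three subscale scores and a total."""
    if set(responses) != set(ITEM_CODES):
        raise ValueError("expected exactly the 30 items F1-F10, P1-P10, E1-E10")
    if any(not 0 <= rating <= 4 for rating in responses.values()):
        raise ValueError("each item must be rated 0 (never) to 4 (always)")
    scores = {
        name: sum(responses[f"{name[0].upper()}{i}"] for i in range(1, 11))
        for name in SUBSCALES
    }
    scores["total"] = sum(scores[name] for name in SUBSCALES)
    return scores


def score_change(pre: dict[str, int], post: dict[str, int]) -> dict[str, int]:
    """Post-treatment minus pre-treatment score; negative means less perceived handicap."""
    return {key: post[key] - pre[key] for key in pre}


pre = score_vhi30({code: 2 for code in ITEM_CODES})
post = score_vhi30({code: 1 if code.startswith("E") else 2 for code in ITEM_CODES})
print(score_change(pre, post))
```

With these made-up answers, only the emotional subscale improves, so the change is 0 for the functional and physical subscales and -10 for both the emotional subscale and the total.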

RESULTS
Sixty-one patients were invited to participate in the study. All were scheduled for total laryngectomy for T3-T4a laryngeal or hypopharyngeal cancer, with uni- or bilateral neck dissection according to regional lymph node involvement. Preoperative voice quality was rated on a three-point scale: normal (1), dysphonic (2) or aphonic (3). No patient's voice was rated as normal. Thirty-one patients were assessed as unsuitable for voice recording because of poor preoperative voice quality or unsatisfactory cooperation and compliance. Of the 30 patients with sufficient voice quality, only 18 were willing and able to produce a recording robust enough to permit personalized voice synthesis of satisfactory quality. In 11 of these 18 patients, a voice prosthesis was implanted primarily during total laryngectomy. Patients recorded between 210 and 1400 sentences, with an arithmetic mean of 596 sentences. Two patients recorded fewer than 300 sentences (210 and 230 sentences) and one exactly 300. For most patients, PSS was performed using unit selection (US) or hidden Markov model (HMM) systems. The resulting product was provided as downloadable software for the patient's device, capable of converting written text to speech. Altogether, only 7 patients eventually started to use the TTS technology during the early postoperative period (Fig. 1). The frequency and total usage time of the device per 24 h (compliance) were far higher in the first postoperative week than later in the hospital stay, when the effort to use the device gradually decreased. The frequency and time of use also decreased further once the implanted voice prosthesis became fully functional (11 of 18 patients with PSS). Currently, only 6 patients are active users of the technology. One of the patients who also had a voice prosthesis implanted is an active software user as well. He uses TTS when lecturing at the university and prepares the text of the […]
Table 1.
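The attrition described above can be summarized as a simple funnel. The counts below are taken directly from the text (61 invited, 30 with sufficient voice quality, 18 with a recording sufficient for PSS, 7 early users, 6 current users); the percentages are derived arithmetic, not additional results, and the stage labels are paraphrases.

```python
# Counts as reported in the Results; labels are paraphrased.
STAGES = [
    ("invited", 61),
    ("voice quality sufficient for recording", 30),
    ("recording sufficient for PSS", 18),
    ("started using TTS after surgery", 7),
    ("active users at follow-up", 6),
]


def funnel(stages):
    """Return (stage, n, % of invited, % of previous stage), rounded to 0.1 %."""
    invited = stages[0][1]
    rows = []
    previous = invited
    for name, n in stages:
        rows.append((name, n, round(100 * n / invited, 1), round(100 * n / previous, 1)))
        previous = n
    return rows


for name, n, of_invited, of_previous in funnel(STAGES):
    print(f"{name:40s} {n:3d} {of_invited:6.1f}% {of_previous:6.1f}%")
```

In other words, about 11.5% of invited patients (7 of 61) and 38.9% of those with a usable personalized voice (7 of 18) started using it, while 6 of those 7 early users (85.7%) remained active users.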

DISCUSSION
Treatment of laryngeal and hypopharyngeal carcinoma continues to evolve. Although organ-preserving treatment strategies and organ-preserving surgery have developed substantially, overall oncologic results have not changed. Following the Veterans Affairs study, there has been a clear shift from total laryngectomy to organ preservation protocols 2,3. The latest data suggest that the previously assumed equivalence of survival in T4 cancers of both the larynx and hypopharynx may require reconsideration, because surgical arms show better results. In advanced and very advanced laryngeal and hypopharyngeal cancer, organ-preserving surgery is rarely possible. Unfortunately, total laryngectomy significantly reduces the patient's quality of life. It has been repeatedly demonstrated that patients are willing to sacrifice part of their overall chance of survival for the possibility of keeping a functioning larynx capable of physiological voice production 6,22. Voice-related quality of life clearly plays the most important role in patients' decision-making. A prospective study by Laccourreye et al. showed this clearly: nearly two-thirds of patients expressed a desire to avoid total laryngectomy despite the known impact on overall survival 22. This decision reflects a combination of fears: loss of identity, loss of the ability to communicate with people, to communicate wishes and to express basic needs. Fear of the economic impact of losing verbal communication may also play a role. A very recent study by Kotake et al. showed that approximately 28% of patients identify loss of voice as the main cause of unemployment after laryngectomy 23.
The current gold standard of rehabilitation after total laryngectomy is tracheoesophageal speech with placement of a voice prosthesis. Although this provides an easy-to-learn method of voice production that can be used just 14 days after surgery, its drawbacks include the need to replace the prosthesis regularly according to its lifetime and possible complications at the site of the tracheoesophageal fistula. In the case of voice prosthesis dysfunction, voice communication is disabled. Replacing a voice prosthesis is not a complicated task, but access to replacement may be limited by logistical factors (distance from the patient's residence, etc.). For example, in a cohort of 15 patients presented by Galli et al., 20% would not have chosen a voice prosthesis, given the choice, solely because of the long distance from the hospital. Other methods of voice rehabilitation also have their pitfalls. Oesophageal speech is a remarkable way of restoring the patient's voice, but it is very challenging to learn properly, leaving 30-70% of patients unable to speak despite their efforts 24. Effective voice rehabilitation plays an essential role in rebuilding a satisfactory life after total laryngectomy, restoring self-confidence and the feeling of safety that comes with being able to defend oneself verbally. Whichever modality of voice rehabilitation is used, however, there may always be periods of silence without the ability to express oneself verbally (voice prosthesis dysfunction, electrolarynx breakdown, etc.). With text-to-speech synthesis, we are now able to help patients overcome these periods and allow them to speak with their own or another preferred voice from the first day after surgery. The main benefit of this method is its ease of use: it requires minimal training and preparation 25. The cost of this type of rehabilitation comprises the expenses for voice recording and synthesis plus the cost of the mobile app (Fig. 2). No additional hospital visits are needed.
Quality of life is significantly affected by total laryngectomy. However, the overall trend in quality of life is positive: patients rate their overall quality of life after surgery better than before surgery 25. When evaluating quality of life after total laryngectomy, the WHOQOL-BREF and VHI-30 questionnaires alone failed to capture the full range of factors that influence this specific group of patients (such as voice rehabilitation modality, patient education, socioeconomic conditions, adjuvant cancer therapy and the need for reconstructive surgery). Separating the individual factors that contribute to patients' subjective evaluation therefore remains a challenge. Despite these difficulties, when comparing the questionnaire results of patients rehabilitated with a voice prosthesis with those of patients using only voice synthesis, both groups show a visible positive trend in self-perceived quality of life. In our opinion, this confirms the positive impact of rehabilitation with an SGD on these patients' quality of life. Demanding therapy and health changes leading to an increased risk of depression are undoubtedly also reflected in patients' low cooperation in completing questionnaires after surgery. In contrast to patients with ALS, for example, who show a growing interest in AAC, only a subgroup of our patients uses TTS. This may also be explained by the higher average age of our patients compared with patients with neurodegenerative diseases.

CONCLUSION
There is no doubt that the voice prosthesis represents a highly effective form of rehabilitation and provides excellent results in the majority of patients. Nevertheless, our results show that, for all laryngectomees, voice banking and text-to-speech synthesis delivered through an SGD can be an opportunity to improve quality of life and to reduce the number of patients reluctant to accept the preferred surgical treatment for locally advanced and very advanced tumors.