About Me

I am a programmer and software architect who designs and builds software professionally. I currently focus mainly on my own application business, creating tools that help people and organizations get their work done more efficiently.

My experience includes building Windows applications and developing ML/AI/speech systems on Linux. I pay attention to aspects of technology that are often overlooked, such as data quality, documentation, and reliable processing at large scale.

I have experience in the following areas:

Productivity tools: Building software that makes everyday work easier.
Machine learning: Building intelligent systems that are reliable in production.
Audio and speech: Hands-on work with speech recognition and audio signal processing.

My approach is fairly simple: software should be performant enough to handle heavy demands, yet simple enough to be comfortable to use and maintain. I get the most satisfaction from ambitious challenges, such as working in a team to build a far-field speech recognition system from scratch, or creating big-data fuzzy search systems for international fintech clients.

Piotr Chlebek
SharkTime Software

Publications

Interspeech · 2022-08-18

Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language

Mental health risk prediction is a growing field in the speech community, but many studies are based on small corpora. This study illustrates how variations in test and train set sizes impact performance in a controlled study. Using a corpus of over 65K labeled data points, results from a fully crossed design of different train/test size combinations are provided. Two model types are included: one based on language and the other on speech acoustics. Both use methods current in this domain. An age-mismatched test set was also included. Results show that (1) test sizes below 1K samples gave noisy results, even for larger training set sizes; (2) training set sizes of at least 2K were needed for stable results; (3) NLP and acoustic models behaved similarly with train/test size variations, and (4) the mismatched test set showed the same patterns as the matched test set. Additional factors are discussed, including label priors, model strength and pre-training, unique speakers, and data lengths. While no single study can specify exact size requirements, results demonstrate the need for appropriately sized train and test sets for future studies of mental health risk prediction from speech and language.

Read more

Biomedical Sensing and Analysis · Signal Processing in Medicine and Biology · 2022-07-20

Generalization of Deep Acoustic and NLP Models for Large-Scale Depression Screening

Co-author of the book chapter.

Read more

IEEE, arXiv · 2021-05-13

Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Speech-based algorithms have gained interest for the management of behavioral health conditions such as depression. We explore a speech-based transfer learning approach that uses a lightweight encoder and that transfers only the encoder weights, enabling a simplified run-time model. Our study uses a large data set containing roughly two orders of magnitude more speakers and sessions than used in prior work. The large data set enables reliable estimation of improvement from transfer learning. Results for the prediction of PHQ-8 labels show up to 27% relative performance gains for binary classification; these gains are statistically significant with a p-value close to zero. Improvements were also found for regression. Additionally, the gain from transfer learning does not appear to require strong source task performance. Results suggest that this approach is flexible and offers promise for efficient implementation.

Read more

IEEE, arXiv · 2021-03-25

Cross-Demographic Portability of Deep NLP-Based Depression Models

Deep learning models are rapidly gaining interest for real-world applications in behavioral health. An important gap in current literature is how well such models generalize over different populations. We study Natural Language Processing (NLP) based models to explore portability over two different corpora highly mismatched in age. The first and larger corpus contains younger speakers. It is used to train an NLP model to predict depression. When testing on unseen speakers from the same age distribution, this model performs at AUC=0.82. We then test this model on the second corpus, which comprises seniors from a retirement community. Despite the large demographic differences in the two corpora, we saw only modest degradation in performance for the senior-corpus data, achieving AUC=0.76. Interestingly, in the senior population, we find AUC=0.81 for the subset of patients whose health state is consistent over time. Implications for demographic portability of speech-based applications are discussed.

Read more

IEEE, arXiv · 2021-02-17

Robust Speech and Natural Language Processing Models for Depression Screening

Depression is a global health concern with a critical need for increased patient screening. Speech technology offers advantages for remote screening but must perform robustly across patients. We have described two deep learning models developed for this purpose. One model is based on acoustics; the other is based on natural language processing. Both models employ transfer learning. Data from a depression-labeled corpus in which 11,000 unique users interacted with a human-machine application using conversational speech is used. Results on binary depression classification have shown that both models perform at or above AUC=0.80 on unseen data with no speaker overlap. Performance is further analyzed as a function of test subset characteristics, finding that the models are generally robust over speaker and session variables. We conclude that models based on these approaches offer promise for generalized automated depression screening.

Read more

IEEE, arXiv · 2021-02-16

Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning

Digital screening and monitoring applications can aid providers in the management of behavioral health conditions. We explore deep language models for detecting depression, anxiety, and their co-occurrence from conversational speech collected during 16k user interactions with an application. Labels come from PHQ-8 and GAD-7 results also collected by the application. We find that results for binary classification range from 0.86 to 0.79 AUC, depending on condition and co-occurrence. Best performance is achieved when a user has either both or neither condition, and we show that this result is not attributable to data skew. Finally, we find evidence suggesting that underlying word sequence cues may be more salient for depression than for anxiety.

Read more

5th International Workshop on Mental Health And Well-Being: Sensing And Intervention · 2020-09-12

Comparing Speech Recognition Services for HCI Applications in Behavioral Health

Presented at the 5th International Workshop on Mental Health And Well-Being: Sensing And Intervention. Website: UbiComp 2020 workshop.

Behavioral health conditions such as depression and anxiety are a global concern, and there is growing interest in employing speech technology to screen and monitor patients remotely. Language modeling approaches require automatic speech recognition (ASR) and multiple privacy-compliant ASR services are commercially available. We use a corpus of over 60 hours of speech from a behavioral health task, and compare ASR performance for four commercial vendors. We expected similar performance, but found large differences between the top and next-best performer, for both mobile (48% relative WER increase) and laptop (67% relative WER increase) data. Results suggest the importance of benchmarking ASR systems in this domain. Additionally we find that WER is not systematically related to depression itself. Performance is however affected by diverse audio quality from users’ personal devices, and possibly from the overall style of speech in this domain.

Read more

Patents

Filed 2024-06-13

Systems and methods for predicting mental health conditions based on processing of conversational speech/text and language

…systems and methods for identifying the severity of a mental health condition or symptoms of same by listening to a human-to-human conversation by receiving conversation data, processing the conversation data to generate a language model output and/or an acoustic model output using one or more language models and/or acoustic models…

Read more

Filed 2020-10-23

Acoustic and natural language processing models for speech-based screening and monitoring of behavioral health conditions

The present disclosure provides acoustic and natural language processing (NLP) models for predicting whether a subject has a behavioral or mental health state of interest based at least in part on input speech from said subject.

Read more

Filed 2015-12-22

Automatic tuning of speech recognition parameters

System and techniques for automatic tuning of ASR parameters are described herein. A clean audio segment and a dirty audio segment may be obtained. In an iterative fashion, optimized preprocessing parameters may be obtained by, at an iteration, selecting a set of parameters, preprocessing the clean audio segment with the set of parameters to produce a first result, preprocessing the dirty audio segment with the set of parameters to produce a second result,…

Read more

Filed 2015-06-26

Phase response mismatch correction for multiple microphones

For a multiple microphone system, a phase response mismatch may be corrected. One embodiment includes receiving audio from a first microphone and from a second microphone, the microphones being coupled to a single device for combining the received audio, recording the received audio from the first microphone and the second microphone…

Read more

Filed 2014-07-04

Replay attack detection in automatic speaker verification systems

Techniques related to detecting replay attacks on automatic speaker verification systems are discussed. Such techniques may include receiving an utterance from a user or a device playing back the utterance, determining features associated with the utterance, and classifying the utterance…

Read more