About me
I run a small, independent software company focused on research and development. My work centers on building practical, reliable solutions that help people and organizations work more efficiently.
I specialize in building applications for Windows and developing ML/AI systems for Unix/Linux, with particular attention to data quality and scalable processing of large data sets. My areas of specialization include:
- Productivity applications
- Machine learning
- Audio processing and speech recognition
I enjoy taking part in the full software development cycle, from concept to deployment. My goal is to design software that not only scales efficiently but also remains intuitive and user-friendly, so that it delivers real value to the people who use it.
Piotr Chlebek
SharkTime Software
Publications
InterSpeech · 2022-08-18
Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language
Mental health risk prediction is a growing field in the speech community, but many studies are based on small corpora. This study illustrates how variations in test and train set sizes impact performance in a
controlled study. Using a corpus of over 65K labeled data points, results from a fully crossed design of different train/test size combinations are provided. Two model types are included: one based on language
and the other on speech acoustics. Both use methods current in this domain. An age-mismatched test set was also included. Results show that (1) test sizes below 1K samples gave noisy results, even for larger
training set sizes; (2) training set sizes of at least 2K were needed for stable results; (3) NLP and acoustic models behaved similarly with train/test size variations, and (4) the mismatched test set showed the
same patterns as the matched test set. Additional factors are discussed, including label priors, model strength and pre-training, unique speakers, and data lengths. While no single study can specify exact size
requirements, results demonstrate the need for appropriately sized train and test sets for future studies of mental health risk prediction from speech and language.
Biomedical Sensing and Analysis · Signal Processing in Medicine and Biology · 2022-07-20
Generalization of Deep Acoustic and NLP Models for Large-Scale Depression Screening
Co-author of the book chapter.

IEEE, arXiv · 2021-05-13
Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
Speech-based algorithms have gained interest for the management of behavioral health conditions such as depression. We explore a speech-based transfer learning approach that uses a lightweight encoder and that
transfers only the encoder weights, enabling a simplified run-time model. Our study uses a large data set containing roughly two orders of magnitude more speakers and sessions than used in prior work. The large
data set enables reliable estimation of improvement from transfer learning. Results for the prediction of PHQ-8 labels show up to 27% relative performance gains for binary classification; these gains are
statistically significant with a p-value close to zero. Improvements were also found for regression. Additionally, the gain from transfer learning does not appear to require strong source task performance.
Results suggest that this approach is flexible and offers promise for efficient implementation.
IEEE, arXiv · 2021-03-25
Cross-Demographic Portability of Deep NLP-Based Depression Models
Deep learning models are rapidly gaining interest for real-world applications in behavioral health. An important gap in current literature is how well such models generalize over different populations. We
study Natural Language Processing (NLP) based models to explore portability over two different corpora highly mismatched in age. The first and larger corpus contains younger speakers. It is used to train an NLP
model to predict depression. When testing on unseen speakers from the same age distribution, this model performs at AUC=0.82. We then test this model on the second corpus, which comprises seniors from a
retirement community. Despite the large demographic differences in the two corpora, we saw only modest degradation in performance for the senior-corpus data, achieving AUC=0.76. Interestingly, in the senior
population, we find AUC=0.81 for the subset of patients whose health state is consistent over time. Implications for demographic portability of speech-based applications are discussed.
IEEE, arXiv · 2021-02-17
Robust Speech and Natural Language Processing Models for Depression Screening
Depression is a global health concern with a critical need for increased patient screening. Speech technology offers advantages for remote screening but must perform robustly across patients. We have described
two deep learning models developed for this purpose. One model is based on acoustics; the other is based on natural language processing. Both models employ transfer learning. Data from a depression-labeled corpus
in which 11,000 unique users interacted with a human-machine application using conversational speech is used. Results on binary depression classification have shown that both models perform at or above AUC=0.80
on unseen data with no speaker overlap. Performance is further analyzed as a function of test subset characteristics, finding that the models are generally robust over speaker and session variables. We conclude
that models based on these approaches offer promise for generalized automated depression screening.
IEEE, arXiv · 2021-02-16
Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning
Digital screening and monitoring applications can aid providers in the management of behavioral health conditions. We explore deep language models for detecting depression, anxiety, and their co-occurrence
from conversational speech collected during 16k user interactions with an application. Labels come from PHQ-8 and GAD-7 results also collected by the application. We find that results for binary classification
range from 0.86 to 0.79 AUC, depending on condition and co-occurrence. Best performance is achieved when a user has either both or neither condition, and we show that this result is not attributable to data skew.
Finally, we find evidence suggesting that underlying word sequence cues may be more salient for depression than for anxiety.
5th International Workshop on Mental Health And Well-Being: Sensing And Intervention · 2020-09-12
Comparing Speech Recognition Services for HCI Applications in Behavioral Health
Presented at the 5th International Workshop on Mental Health And Well-Being: Sensing And Intervention.
Website: UbiComp 2020 workshop.
Behavioral health conditions such as depression and anxiety are a global concern, and there is growing interest in employing speech technology to screen and monitor patients remotely. Language modeling approaches require automatic speech recognition (ASR) and multiple privacy-compliant ASR services are commercially available. We use a corpus of over 60 hours of speech from a behavioral health task, and compare ASR performance for four commercial vendors.
We expected similar performance, but found large differences between the top and next-best performer, for both mobile (48% relative WER increase) and laptop (67% relative WER increase) data. Results suggest the importance of benchmarking ASR systems in this domain. Additionally we find that WER is not systematically related to depression itself. Performance is however affected by diverse audio quality from users’ personal devices, and possibly from the overall style of speech in this domain.
Patents
Filed 2024-06-13
Systems and methods for predicting mental health conditions based on processing of conversational speech/text and language
…systems and methods for identifying the severity of a mental health condition or symptoms of same by listening to a human-to-human conversation by receiving conversation data, processing the conversation data to generate a language model output and/or an acoustic model output using one or more language models and/or acoustic models…
Filed 2020-10-23
Acoustic and natural language processing models for speech-based screening and monitoring of behavioral health conditions
The present disclosure provides acoustic and natural language processing (NLP) models for predicting whether a subject has a behavioral or mental health state of interest based at least in part on input speech from said subject.
Filed 2015-12-22
Automatic tuning of speech recognition parameters
System and techniques for automatic tuning of ASR parameters are described herein. A clean audio segment and a dirty audio segment may be obtained. In an iterative fashion, optimized preprocessing parameters may be obtained by, at an iteration, selecting a set of parameters, preprocessing the clean audio segment with the set of parameters to produce a first result, preprocessing the dirty audio segment with the set of parameters to produce a second result,…
Filed 2015-06-26
Phase response mismatch correction for multiple microphones
For a multiple microphone system, a phase response mismatch may be corrected. One embodiment includes receiving audio from a first microphone and from a second microphone, the microphones being coupled to a single device for combining the received audio, recording the received audio from the first microphone and the second microphone…
Filed 2014-07-04
Replay attack detection in automatic speaker verification systems
Techniques related to detecting replay attacks on automatic speaker verification systems are discussed. Such techniques may include receiving an utterance from a user or a device playing back the utterance, determining features associated with the utterance, and classifying the utterance…