Overview
The Pnoi Corpus is a collection of breath sounds recorded simultaneously from the mouth and chest of both healthy subjects and individuals diagnosed with asthma. The dataset also includes biodata, subject's response to a standard performa and pulmonary function test (PFT) reports. The recordings were captured using a Littmann CORE digital stethoscope and a Zoom H6 microphone. Sounds of Five breath were recored from four locations on the posterior chest. The data was collected at St. John Hospital, Pulmonary Health Science department.
For asthmatic subjects, a post recording was made ofter administering a bronchodilator. The post recording was made 15 minutes after the administration of the bronchodilator. The post recording was made to assess the effect of the bronchodilator on the subject's breathing sounds.
The dataset consists of 150 asthma patients and 150 control subjects, totaling 300 recordings. Each recording is accompanied by corresponding metadata, including demographic information, medical history, and the results of the PFT report.
pnoi data accquisition
Data Statistics
-
Total number of recordings: 300
- Asthma patients: 150
- Control subjects: 150
-
Distribution of patients by gender:
- Male: 60%
- Female: 39%
- Other: 1%
-
Age distribution:
- Minimum age: 18
- Maximum age: 75
- Average age: 34
- Median age: 40
-
Additional statistics:
- Average duration of breath sound recordings: 15 to 20 seconds
- Minimum duration: 6 seconds
- Maximum duration: 45 seconds
Data Organization
The dataset is organized into the following directories and files:
-
PNOI_CORPUS/
-
README.md
: A brief description of the dataset and its contents. -
DATA_PNOISTOR/
: Raw data dump from the PnoiStor web-app -
DATA/
-
[subject-ID]/
: A folder for each subject consist of all relevant data-
[subject-ID]_[location].wav
: Two channels audio recording of the subject's breathing sounds -
[subject-ID]_[location].txt
: Text file containing annotations for the audio recording. -
[subject-ID].json
: JSON file containing metadata for the subject -
[subject-ID].tsv
: TSV file containing the results of the PFT report for the subject
-
-
Data format
Audio file
Nomenclature: seperated by "-"
- [
data_version
]: Month and Year of data collection - [
subject-ID
]: Unique ID for each subject - [
location
]: Location of the recording on the posterior chest BA
: "Breath Audio file tag",before/after
: "before" or "after" administration of bronchodilatorlocation
: 1.LU
: "Left Upper Lobe", 2.RU
: "Right Upper Lobe", 3.LL
: "Left Lower Lobe", 4.RL
: "Right Lower Lobe",- [
file-ID
]: Unique ID for each recording file - [
comment
]: Additional information about the recording
The audio file contains TWO channels for breath sounds recorded simultaneously from the mouth and chest of the subject.
- Channel 0: Mouth
- Channel 1: Chest
EXAMPLE
: pnoistor_oct07-amartyaveer_81b8f33c-BA_before_LL-c03e-lateupl.wav
Annotation file
Nomenclature: Same as Audio file, but with .txt
extension
- values are tab separated.
The annotation file contains the following information:
begin
: Start time of the annotation in secondsend
: End time of the annotation in secondslabel
: Label of the annotation
EXAMPLE
: pnoistor_oct07-amartyaveer_81b8f33c-BA_before_LL-c03e-lateupl.txt
Labels
The subjects are also requested to recite sustained phonation of the vowels /a/, /i/, /u/, and /o/ for 2-5 seconds. The annotation labels are as follows:
bb[n]
: marks a single breathing session consisting of Five breaths recorded at a single location. n encodes the location of the recording.
File ID and Comments
- Are optional infornation to uniquely identify the file and add any comments about the recording.
Fig. Spectrogram showing two channels breath audio annotated with respective labels.
Metadata file
Nomenclature: Same as Audio file, but with .json
extension and with file tag META
JSON file contains metadata for the subject. The metadata includes:
EXAMPLE
{
"1": { "Q": "What is your smoking status?", "A": "Non-smoker" },
"2": { "Q": "Do you have repeated episodes of cough?", "A": "Yes" },
"3": {
"Q": "How many times do you have cough and chest tightness?",
"A": "1"
},
"4": { "Q": "How long does each episode last?", "A": "-" },
"5": { "Q": "Have you been diagnosed with Asthma?", "A": "Yes" },
"6": { "Q": "Wheeze and chest tightness present?", "A": "Yes" },
"7": { "Q": "For how long is the symptom present? (days)", "A": "0" },
"8": {
"Q": "Do you experience episodic or continuous wheeze?",
"A": "Episodic"
},
"9": { "Q": "Does your wheeze vary with seasons?", "A": "Yes" },
"10": { "Q": "Does your wheeze vary across the day?", "A": "Yes" },
"11": { "Q": "Do you have cough?", "A": "Yes" },
"12": { "Q": "Do you have dry or wet cough?", "A": "Dry" },
"13": { "Q": "Sputum color", "A": "-" },
"14": { "Q": "Are you under any medication for asthma?", "A": "Yes" },
"15": { "Q": "Do you use inhalers or nebulizers?", "A": "Inhalers" },
"16": { "Q": "Do you have family history of asthma?", "A": "Yes" },
"17": { "Q": "Do you have allergies?", "A": "Yes" },
"18": { "Q": "Triggers for allergies?", "A": "Strong smells" },
"19": { "Q": "Have you suffered from lung TB in the past?", "A": "No" },
"20": { "Q": "Do you have any other respiratory illness?", "A": "No" },
"21": { "Q": "Write in brief About it", "A": "-" },
"22": { "Q": "Are you a known case of high blood pressure?", "A": "No" },
"23": { "Q": "Your high blood pressure is", "A": "-" },
"24": { "Q": "Are you a known case of diabetes?", "A": "No" },
"25": { "Q": "Your diabetes is", "A": "-" },
"26": { "Q": "Are you a known case of heart disease?", "A": "No" },
"27": { "Q": "Any other health problems?", "A": "No" },
"28": { "Q": "What other health problems?", "A": "-" },
"29": {
"Q": "Are you currently COVID-19 positive or have common cold?",
"A": "No"
},
"30": {
"Q": "Have you been COVID-19 positive or had cold in last 15 days?",
"A": "No"
},
"firebaseId": { "Q": "firebaseId?", "A": "vijayaomkar_b866b679" },
"subjectAge": { "Q": "subjectAge?", "A": "64" },
"subjectGender": { "Q": "subjectGender?", "A": "Female" },
"subjectHeight": { "Q": "subjectHeight?", "A": "153" },
"subjectName": { "Q": "subjectName?", "A": "Vijaya Omkar" },
"subjectRemunerationDetails": {
"Q": "subjectRemunerationDetails?",
"A": "X"
},
"subjectRemunerationType": {
"Q": "subjectRemunerationType?",
"A": "Account No."
},
"subjectSectionDone": { "Q": "subjectSectionDone?", "A": true },
"subjectType": { "Q": "subjectType?", "A": "Patient" },
"subjectWeight": { "Q": "subjectWeight?", "A": "59" }
}
PFT Report
Nomenclature: Same as Audio file, but with .tsv
extension and with file tag PFT
TSV file contains PFT report for the subject. The PFT report includes:
PFT | FEV1 | FVC | ratio |
---|---|---|---|
ref | 1.58 | 1.91 | 77 |
val | 1.48 | 1.63 | 91 |
- ref: Reference value (predicted value) for the subject
- val: Measured value for the subject
- FEV1: Forced Expiratory Volume in 1 second
- FVC: Forced Vital Capacity
- ratio: FEV1/FVC
For further details, refer
- Pnoi-phone: Main post for Pnoi-phone project.
- User Research: User research conducted for Pnoi-phone to understand the user needs and requirements.
- Pnoi Stor: Data collection protocol, organisation scheme and storage for Pnoi-phone project.
- Product Design: Design and development of Pnoi-phone biomedical device integrated with AI models to diagnose and monitor airway diseases.
- Embedded System Design: Design and development of embedded system for Pnoi-phone biomedical device.
- Pnoi-phone App: Design and development of Pnoi-phone android app.
Disclaimer
The Pnoi Corpus is intended for research and educational purposes only. It is important to note that the dataset does not constitute medical advice or diagnosis. Users of this dataset are responsible for ensuring compliance with applicable ethical guidelines and regulations when using the data.