Pandas DataFrame, SciPy, NumPy, data typing, etc.

Author
Affiliations

Md Rasheduzzaman

Last updated

September 26, 2025

Content summary
Pydantic

Dynamic data type

Data typing in Python is dynamic: variables and function arguments are not bound to a fixed type. We can add type hints to guide users, but Python does not enforce them at runtime, so wrong input still gets through. See below:

def insert_patient_data(name: str, age: int):
  print(name)
  print(age)
  print("inserted into the DB")

insert_patient_data("Rashed", "thirty")
Rashed
thirty
inserted into the DB

As you can see, nothing stops the user from passing age as a string. A better approach is to check the data types explicitly with a conditional. If a type doesn’t match, we raise an error.

def insert_patient_data(name: str, age: int):
  if type(name)==str and type(age)==int:
    print(name)
    print(age)
    print("inserted into the DB")
  else:
    raise TypeError("Incorrect data type")

insert_patient_data("Rashed", 30)
Rashed
30
inserted into the DB
insert_patient_data("Rashed", "thirty")
TypeError: Incorrect data type

We see a data type error here, so our system works to catch it.
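A side note on the check itself: comparing with `type(x) == int` works here, but `isinstance` is the more common idiom. The sketch below is only an illustration, and `check_patient_types` is a hypothetical helper name that is not part of the original code. One subtlety is that `bool` is a subclass of `int`, so `isinstance(True, int)` is true and has to be excluded explicitly if booleans should not count as ages.

```python
def check_patient_types(name, age):
    """Raise TypeError unless name is a str and age is a real int (not bool)."""
    if not isinstance(name, str):
        raise TypeError(f"name must be str, got {type(name).__name__}")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(age, bool) or not isinstance(age, int):
        raise TypeError(f"age must be int, got {type(age).__name__}")


check_patient_types("Rashed", 30)  # passes silently

try:
    check_patient_types("Rashed", True)
except TypeError as err:
    print(err)  # age must be int, got bool
```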

But this approach does not scale. Consider what happens when a second function needs the same checks.

def insert_patient_data(name: str, age: int):
  if type(name)==str and type(age)==int:
    print(name)
    print(age)
    print("inserted into the DB")
  else:
    raise TypeError("Incorrect data type")

def update_patient_data(name: str, age: int):
  if type(name)==str and type(age)==int:
    print(name)
    print(age)
    print("Updated")
  else:
    raise TypeError("Incorrect data type")

insert_patient_data("Rashed", 30)
Rashed
30
inserted into the DB
update_patient_data("Rashed", 29)
Rashed
29
Updated

Do you see the scalability issue? Every new function that takes these arguments needs the same checks copied into it. Beyond types, validating the values themselves also matters. In the example above, we could pass -10 as age: it passes the data type check, but it is not meaningful. So we should require that age cannot be less than 0. How do we do that?

def insert_patient_data(name: str, age: int):
  if type(name)==str and type(age)==int:
    if age < 0:
      raise ValueError("Age cannot be less than 0")
    else:
      print(name)
      print(age)
      print("Inserted into the DB")
  else:
    raise TypeError("Incorrect data type")

Now, let’s check.

insert_patient_data("Rashed", 10)
Rashed
10
Inserted into the DB
insert_patient_data("Rashed", -10)
ValueError: Age cannot be less than 0
insert_patient_data("Rashed", "10")
TypeError: Incorrect data type
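One way to avoid copying the type checks into every function, before reaching for a library, is a small decorator that reads the type hints. The sketch below is an illustration only: `enforce_types` is a hypothetical helper (not part of the original code), it handles only plain classes such as `str` and `int`, and the value check for age still has to be repeated in each function.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def enforce_types(func):
    """Check each argument against the function's simple type hints."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for arg_name, value in bound.arguments.items():
            expected = hints.get(arg_name)
            # Only plain classes such as str or int are handled in this sketch
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{arg_name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def insert_patient_data(name: str, age: int):
    if age < 0:
        raise ValueError("Age cannot be less than 0")
    return f"Inserted {name} ({age})"


@enforce_types
def update_patient_data(name: str, age: int):
    if age < 0:
        raise ValueError("Age cannot be less than 0")
    return f"Updated {name} ({age})"


print(insert_patient_data("Rashed", 30))  # Inserted Rashed (30)
```

The type checks now live in one place, but the range check is still duplicated, and nested structures are out of reach for this simple approach.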

Here comes Pydantic to help us with

  • Data type checking, and
  • Data validation

And it does so in 3 steps:

    1. Define a Pydantic model (class) representing the ideal schema. This includes the expected fields, their data types, and any validation constraints (e.g. ge=0 to reject negative numbers).
    2. Instantiate the model with raw input data (usually a dictionary or JSON-like structure) to make a Pydantic object.
       • Pydantic will automatically validate the data and coerce it into the correct Python types (if possible).
       • If the data doesn’t meet the model’s criteria, Pydantic raises a ValidationError.
    3. Pass the validated model object to functions or use it throughout your codebase.
       • This ensures that every part of your program works with clean, type-safe, and logically valid data.
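The three steps can be sketched in a few lines. This is a minimal illustration assuming Pydantic v2 is installed; the two-field model and the sample inputs are made up for the sketch and are simpler than the examples that follow.

```python
from pydantic import BaseModel, Field, ValidationError


# Step 1: describe the ideal schema, including a constraint on age
class Patient(BaseModel):
    name: str
    age: int = Field(ge=0)  # ge=0 rejects negative ages


# Step 3: downstream code accepts only an already-validated model
def insert_patient_data(patient: Patient) -> str:
    return f"Inserted {patient.name} ({patient.age})"


# Step 2: instantiate the model from raw input; validation runs here
raw_inputs = [
    {"name": "Rashed", "age": "30"},  # numeric string, coerced to int 30
    {"name": "Rashed", "age": -10},   # violates ge=0
]

results = []
for raw in raw_inputs:
    try:
        results.append(insert_patient_data(Patient(**raw)))
    except ValidationError as err:
        results.append(f"Rejected: {len(err.errors())} validation error(s)")

print(results)
# ['Inserted Rashed (30)', 'Rejected: 1 validation error(s)']
```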

Let’s use it now, with a more realistic example. We will first make a DataFrame with the required fields using pandas. Then we will insert a patient’s info into that DataFrame if the patient is new. If not, we will update the information for that patient.

from pydantic import BaseModel, ValidationError
import pandas as pd

# ---------------------
# 1. Define the model
# ---------------------
class Patient(BaseModel):
    name: str
    age: int
    weight: float

# ---------------------
# 2. In-memory database
# ---------------------
# Create a DataFrame to store patient records
db = pd.DataFrame({
    'name': pd.Series(dtype='str'),
    'age': pd.Series(dtype='int'),
    'weight': pd.Series(dtype='float')
})

# ---------------------
# 3. Insert function
# ---------------------
def insert_patient_data(patient: Patient):
    global db
    # Check if patient already exists by name
    if db['name'].eq(patient.name).any():
        print(f"Patient '{patient.name}' already exists. Use update instead.")
        return
    
    # Append new patient
    db = pd.concat([db, pd.DataFrame([patient.model_dump()])], ignore_index=True)
    print(f"Inserted patient: {patient.name}")

# ---------------------
# 4. Update function
# ---------------------
def update_patient_data(patient: Patient):
    global db
    # Find index of the patient by name
    idx = db.index[db['name'] == patient.name].tolist()
    if not idx:
        print(f"Patient '{patient.name}' not found. Use insert instead.")
        return
    
    # Update the record
    db.loc[idx[0], ['age', 'weight']] = patient.age, patient.weight
    print(f"Updated patient: {patient.name}")

# ---------------------
# 5. Test the system
# ---------------------
# Initial insert
patient_info = {'name': 'Rashed', 'age': 29, 'weight': '55'}
try:
    patient1 = Patient(**patient_info) #unpacking using 2 star signs
    insert_patient_data(patient1)
except ValidationError as e:
    print("Validation Error:", e)
Inserted patient: Rashed
# Try to insert again (should warn)
insert_patient_data(patient1)
Patient 'Rashed' already exists. Use update instead.
# Update patient
updated_info = {'name': 'Rashed', 'age': 30, 'weight': 57.5}
try:
    patient1_updated = Patient(**updated_info)
    update_patient_data(patient1_updated)
except ValidationError as e:
    print("Validation Error:", e)
Updated patient: Rashed
# Show database
print("\nCurrent Database:")

Current Database:
print(db)
     name  age  weight
0  Rashed   30    57.5

Did you notice something? We passed 'weight': '55' as a string, and Pydantic coerced it to a float automatically.
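Coercion is convenient, but it is not always wanted. As a hedged sketch (assuming Pydantic v2, where `model_validate` accepts a `strict` argument), the same input can be validated in the default lax mode and in strict mode to see the difference:

```python
from pydantic import BaseModel, ValidationError


class Patient(BaseModel):
    name: str
    age: int
    weight: float


data = {"name": "Rashed", "age": 29, "weight": "55"}

# Default (lax) mode: the numeric string is converted to a float
lax_patient = Patient.model_validate(data)
print(lax_patient.weight)  # 55.0

# Strict mode: the same string is rejected instead of converted
try:
    Patient.model_validate(data, strict=True)
except ValidationError as err:
    print(err.errors()[0]["loc"])  # ('weight',)

# A string that is not a number fails even in lax mode
try:
    Patient.model_validate({**data, "weight": "fifty-five"})
except ValidationError as err:
    print(err.errors()[0]["loc"])  # ('weight',)
```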

But another practical issue remains. Names are not reliable identifiers, since multiple patients could share the same name. So we need a unique patient ID to tell them apart.

from pydantic import BaseModel, ValidationError
import pandas as pd

# ---------------------
# 1. Patient model with manual ID
# ---------------------
class Patient(BaseModel):
    patient_id: str
    name: str
    age: int
    weight: float

# ---------------------
# 2. In-memory DB
# ---------------------
db = pd.DataFrame({
    'patient_id': pd.Series(dtype='str'),
    'name': pd.Series(dtype='str'),
    'age': pd.Series(dtype='int'),
    'weight': pd.Series(dtype='float')
})

# ---------------------
# 3. Insert function
# ---------------------
def insert_patient_data(patient: Patient):
    global db
    if db['patient_id'].eq(patient.patient_id).any():
        print(f"Patient ID '{patient.patient_id}' already exists. Use update instead.")
        return
    new_row = pd.DataFrame([patient.model_dump()])
    db = pd.concat([db, new_row], ignore_index=True)
    print(f"Inserted patient: {patient.name} with ID: {patient.patient_id}")

# ---------------------
# 4. Update function
# ---------------------
def update_patient_data(patient: Patient):
    global db
    idx = db.index[db['patient_id'] == patient.patient_id].tolist()
    if not idx:
        print(f"Patient ID '{patient.patient_id}' not found. Use insert instead.")
        return
    db.loc[idx[0], ['name', 'age', 'weight']] = patient.name, patient.age, patient.weight
    print(f"Updated patient: {patient.name} with ID: {patient.patient_id}")

# ---------------------
# 5. Test it
# ---------------------
try:
    # Add 2 patients manually
    patient1 = Patient(patient_id='P001', name='Rashed', age=29, weight=55)
    patient2 = Patient(patient_id='P002', name='Rashed', age=40, weight=70)

    insert_patient_data(patient1)
    insert_patient_data(patient2)

    # Attempt duplicate insert
    insert_patient_data(patient1)

    # Update patient1
    patient1_updated = Patient(patient_id='P001', name='Rashed', age=30, weight=56.5)
    update_patient_data(patient1_updated)

except ValidationError as e:
    print("Validation Error:", e)
Inserted patient: Rashed with ID: P001
Inserted patient: Rashed with ID: P002
Patient ID 'P001' already exists. Use update instead.
Updated patient: Rashed with ID: P001
# ---------------------
# 6. Show DB
# ---------------------
print("\nCurrent Database:")

Current Database:
print(db)
  patient_id    name  age  weight
0       P001  Rashed   30    56.5
1       P002  Rashed   40    70.0

Let’s make the model a bit more complex. We are going to add fields that hold more than one entry (a list of allergies, a dictionary of contact details). A pandas DataFrame is not a good fit for such nested data, so we will use the JSON data format instead.

from pydantic import BaseModel, ValidationError
from typing import List, Dict
import json

# ---------------------
# 1. Patient model
# ---------------------
class Patient(BaseModel):
    patient_id: str
    name: str
    age: int
    weight: float
    married: bool
    allergies: List[str]
    contact_info: Dict[str, str]

# ---------------------
# 2. In-memory "DB"
# ---------------------
db: List[Patient] = []

# ---------------------
# 3. Insert function
# ---------------------
def insert_patient_data(patient: Patient):
    global db
    if any(p.patient_id == patient.patient_id for p in db):
        print(f"Patient ID '{patient.patient_id}' already exists. Use update instead.")
        return
    db.append(patient)
    print(f"Inserted patient: {patient.name} with ID: {patient.patient_id}")

# ---------------------
# 4. Update function
# ---------------------
def update_patient_data(patient: Patient):
    global db
    for idx, p in enumerate(db):
        if p.patient_id == patient.patient_id:
            db[idx] = patient
            print(f"Updated patient: {patient.name} with ID: {patient.patient_id}")
            return
    print(f"Patient ID '{patient.patient_id}' not found. Use insert instead.")

# ---------------------
# 5. Save/Load to/from JSON
# ---------------------
def save_db_to_json(filepath="patients.json"):
    with open(filepath, 'w') as f:
        json.dump([p.model_dump() for p in db], f, indent=2)
    print("Database saved to JSON.")

def load_db_from_json(filepath="patients.json"):
    global db
    try:
        with open(filepath, 'r') as f:
            data = json.load(f)
            db = [Patient(**p) for p in data]
        print("Database loaded from JSON.")
    except FileNotFoundError:
        print("No existing database found.")
    except ValidationError as e:
        print("Validation error while loading:", e)

# ---------------------
# 6. Test it
# ---------------------
try:
    load_db_from_json()

    patient1 = Patient(
        patient_id='P001',
        name='Rashed',
        age=29,
        weight=55,
        married=True,
        allergies=['Dust', 'Pollen'],
        contact_info={'phone': '+492648973', 'email': 'abcrashed@gmail.com'}
    )

    patient2 = Patient(
        patient_id='P002',
        name='Rashed',
        age=40,
        weight=70,
        married=True,
        allergies=['Pollen'],
        contact_info={'phone': '+49663882', 'email': 'rashed@gmail.com'}
    )

    insert_patient_data(patient1)
    insert_patient_data(patient2)
    insert_patient_data(patient1)  # Duplicate test

    # Update
    patient1_updated = Patient(
        patient_id='P001',
        name='Rashed',
        age=30,
        weight=56.7,
        married=True,
        allergies=['Dust', 'Pollen'],
        contact_info={'phone': '+492648973', 'email': 'abcrashed@gmail.com'}
    )
    update_patient_data(patient1_updated)

    save_db_to_json()

except ValidationError as e:
    print("Validation Error:", e)
Database loaded from JSON.
Patient ID 'P001' already exists. Use update instead.
Patient ID 'P002' already exists. Use update instead.
Patient ID 'P001' already exists. Use update instead.
Updated patient: Rashed with ID: P001
Database saved to JSON.
# ---------------------
# 7. Show database
# ---------------------
print("\nCurrent Database (in-memory):")

Current Database (in-memory):
for patient in db:
    print(patient.model_dump())
{'patient_id': 'P001', 'name': 'Rashed', 'age': 30, 'weight': 56.7, 'married': True, 'allergies': ['Dust', 'Pollen'], 'contact_info': {'phone': '+492648973', 'email': 'abcrashed@gmail.com'}}
{'patient_id': 'P002', 'name': 'Rashed', 'age': 40, 'weight': 70.0, 'married': True, 'allergies': ['Pollen'], 'contact_info': {'phone': '+49663882', 'email': 'rashed@gmail.com'}}

Why didn’t we use plain list and dict, though? Because then we could only make sure that the fields are a list and a dict, but we could not check the data types inside them. That’s why we used two-level validation with List[str] and Dict[str, str].
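The difference is easy to see side by side. In this sketch (assuming Pydantic v2; `LooseRecord` and `TypedRecord` are hypothetical models made up for the comparison), a bare `list` accepts any items, while `List[str]` reports every item that is not a string:

```python
from typing import List

from pydantic import BaseModel, ValidationError


class LooseRecord(BaseModel):
    allergies: list  # only checks that the value is a list


class TypedRecord(BaseModel):
    allergies: List[str]  # also checks every item


mixed = {"allergies": ["Dust", 42, None]}

loose = LooseRecord(**mixed)
print(loose.allergies)  # ['Dust', 42, None]

try:
    TypedRecord(**mixed)
except ValidationError as err:
    # One error per offending item; loc holds the field name and the index
    bad_items = [e["loc"] for e in err.errors()]
    print(bad_items)  # [('allergies', 1), ('allergies', 2)]
```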

We could make our model more flexible. For example, not every patient will have allergies, but that field is required now! Let’s work around that.

Making Fields Optional and Adding Validation

In real-world applications, not all fields are required. Let’s make our model more realistic by adding optional fields and custom validation:

from pydantic import BaseModel, ValidationError, Field, field_validator
from typing import List, Dict, Optional
import json
from datetime import datetime

# ---------------------
# 1. Enhanced Patient model with optional fields and validation
# ---------------------
class Patient(BaseModel):
    patient_id: str = Field(..., min_length=4, max_length=10, description="Unique patient identifier")
    name: str = Field(..., min_length=2, max_length=50, description="Patient full name")
    age: int = Field(..., ge=0, le=150, description="Patient age in years")
    weight: float = Field(..., gt=0, le=500, description="Patient weight in kg")
    height: Optional[float] = Field(None, gt=0, le=300, description="Patient height in cm")
    married: bool = False  # Default value
    allergies: Optional[List[str]] = Field(default_factory=list, description="List of known allergies")
    contact_info: Dict[str, str] = Field(default_factory=dict, description="Contact information")
    emergency_contact: Optional[Dict[str, str]] = None
    # Pydantic v2 uses `pattern` (v1 called this argument `regex`)
    blood_type: Optional[str] = Field(None, pattern=r'^(A|B|AB|O)[+-]$', description="Blood type (e.g., A+, O-, AB+)")
    
    # Custom validator for name formatting (Pydantic v2 `field_validator`)
    @field_validator('name')
    @classmethod
    def name_must_not_be_empty_or_just_spaces(cls, v):
        if not v.strip():
            raise ValueError('Name cannot be empty or just spaces')
        return v.strip().title()  # Capitalize properly
    
    # Custom validator for phone number in contact_info
    @field_validator('contact_info')
    @classmethod
    def validate_contact_info(cls, v):
        if 'phone' in v:
            phone = v['phone']
            # Simple phone validation (starts with + and has digits)
            if not phone.startswith('+') or not phone[1:].replace('-', '').replace(' ', '').isdigit():
                raise ValueError('Phone number must start with + and contain valid digits')
        return v
    
    # Calculate BMI if height is provided
    def calculate_bmi(self) -> Optional[float]:
        if self.height:
            height_m = self.height / 100  # Convert cm to meters
            return round(self.weight / (height_m ** 2), 2)
        return None
    
    # Check if patient is adult
    def is_adult(self) -> bool:
        return self.age >= 18
    
    # Get formatted patient info
    def get_summary(self) -> str:
        bmi = self.calculate_bmi()
        bmi_str = f", BMI: {bmi}" if bmi else ""
        allergies_str = f", Allergies: {', '.join(self.allergies)}" if self.allergies else ", No known allergies"
        return f"{self.name} (ID: {self.patient_id}), Age: {self.age}, Weight: {self.weight}kg{bmi_str}{allergies_str}"

# ---------------------
# 2. Enhanced database operations
# ---------------------
db: List[Patient] = []

def insert_patient_data(patient: Patient):
    global db
    if any(p.patient_id == patient.patient_id for p in db):
        print(f"❌ Patient ID '{patient.patient_id}' already exists. Use update instead.")
        return False
    db.append(patient)
    print(f"✅ Inserted patient: {patient.get_summary()}")
    return True

def update_patient_data(patient: Patient):
    global db
    for idx, p in enumerate(db):
        if p.patient_id == patient.patient_id:
            db[idx] = patient
            print(f"✅ Updated patient: {patient.get_summary()}")
            return True
    print(f"❌ Patient ID '{patient.patient_id}' not found. Use insert instead.")
    return False

def find_patient_by_id(patient_id: str) -> Optional[Patient]:
    for patient in db:
        if patient.patient_id == patient_id:
            return patient
    return None

def list_all_patients():
    if not db:
        print("📭 No patients in database.")
        return
    
    print(f"\n👥 All Patients ({len(db)} total):")
    print("-" * 80)
    for patient in db:
        print(f"🏥 {patient.get_summary()}")
        if patient.blood_type:
            print(f"   🩸 Blood Type: {patient.blood_type}")
        if patient.contact_info:
            contact_str = ", ".join([f"{k}: {v}" for k, v in patient.contact_info.items()])
            print(f"   📞 Contact: {contact_str}")
        print()

def get_patients_by_age_range(min_age: int, max_age: int) -> List[Patient]:
    return [p for p in db if min_age <= p.age <= max_age]

def get_patients_with_allergies() -> List[Patient]:
    return [p for p in db if p.allergies]

# ---------------------
# 3. Test the enhanced system
# ---------------------
print("🏥 Testing Enhanced Patient Management System")
print("=" * 50)

try:
    # Test 1: Valid patient with all fields
    print("\n🧪 Test 1: Complete patient record")
    patient1 = Patient(
        patient_id='P001',
        name='   rashed uzzaman   ',  # Will be cleaned and capitalized
        age=29,
        weight=65.5,
        height=175,
        married=True,
        allergies=['Dust', 'Pollen', 'Cats'],
        contact_info={'phone': '+49-123-456789', 'email': 'rashed@email.com'},
        emergency_contact={'name': 'Jane Doe', 'phone': '+49-987-654321'},
        blood_type='O+'
    )
    insert_patient_data(patient1)
    print(f"   BMI: {patient1.calculate_bmi()}")
    print(f"   Adult: {patient1.is_adult()}")
    
    # Test 2: Minimal patient record (using defaults)
    print("\n🧪 Test 2: Minimal patient record")
    patient2 = Patient(
        patient_id='P002',
        name='Alice Johnson',
        age=35,
        weight=58.2
    )
    insert_patient_data(patient2)
    
    # Test 3: Child patient
    print("\n🧪 Test 3: Child patient")
    patient3 = Patient(
        patient_id='P003',
        name='Bobby Smith',
        age=12,
        weight=40.0,
        height=150,
        allergies=['Peanuts'],
        contact_info={'phone': '+49-555-123456'},
        blood_type='A-'
    )
    insert_patient_data(patient3)
    print(f"   Adult: {patient3.is_adult()}")
    
    # Test 4: Try to insert duplicate
    print("\n🧪 Test 4: Duplicate insertion attempt")
    insert_patient_data(patient1)
    
    # Test 5: Update patient
    print("\n🧪 Test 5: Update patient weight")
    patient1_updated = Patient(
        patient_id='P001',
        name='Rashed Uzzaman',
        age=30,  # Birthday!
        weight=67.0,  # Gained weight
        height=175,
        married=True,
        allergies=['Dust', 'Pollen'],  # No longer allergic to cats!
        contact_info={'phone': '+49-123-456789', 'email': 'rashed.new@email.com'},
        blood_type='O+'
    )
    update_patient_data(patient1_updated)
    
except ValidationError as e:
    print(f"❌ Validation Error: {e}")

# Display all patients
list_all_patients()

# Query examples
print("\n🔍 Query Examples:")
print("-" * 30)
adults = [p for p in db if p.is_adult()]
print(f"👨‍👩‍👧‍👦 Adult patients: {len(adults)}")

patients_with_allergies = get_patients_with_allergies()
print(f"🤧 Patients with allergies: {len(patients_with_allergies)}")
for p in patients_with_allergies:
    print(f"   - {p.name}: {', '.join(p.allergies)}")

young_adults = get_patients_by_age_range(18, 30)
print(f"🧑 Young adults (18-30): {len(young_adults)}")
for p in young_adults:
    print(f"   - {p.name} ({p.age} years old)")

Advanced Validation with Custom Validators

Now let’s see what happens when we try to insert invalid data. Pydantic will catch these errors and give us helpful messages:

print("\n🚨 Testing Validation Errors")
print("=" * 40)

# Test invalid data scenarios
test_cases = [
    {
        'name': 'Invalid Age Test',
        'data': {'patient_id': 'P999', 'name': 'Test Patient', 'age': -5, 'weight': 70},
        'expected_error': 'Age cannot be negative'
    },
    {
        'name': 'Invalid Weight Test', 
        'data': {'patient_id': 'P998', 'name': 'Test Patient', 'age': 25, 'weight': 0},
        'expected_error': 'Weight must be greater than 0'
    },
    {
        'name': 'Invalid Blood Type Test',
        'data': {'patient_id': 'P997', 'name': 'Test Patient', 'age': 25, 'weight': 70, 'blood_type': 'XYZ'},
        'expected_error': 'Invalid blood type format'
    },
    {
        'name': 'Invalid Phone Number Test',
        'data': {'patient_id': 'P996', 'name': 'Test Patient', 'age': 25, 'weight': 70, 'contact_info': {'phone': 'invalid-phone'}},
        'expected_error': 'Invalid phone number format'
    },
    {
        'name': 'Empty Name Test',
        'data': {'patient_id': 'P995', 'name': '   ', 'age': 25, 'weight': 70},
        'expected_error': 'Name cannot be empty'
    }
]

for test in test_cases:
    print(f"\n🧪 {test['name']}:")
    try:
        invalid_patient = Patient(**test['data'])
        print(f"   ⚠️ Unexpectedly succeeded: {invalid_patient.name}")
    except ValidationError as e:
        # Take the first line outside the f-string: before Python 3.12,
        # an f-string expression cannot contain a backslash
        first_line = str(e).splitlines()[0]
        print(f"   ✅ Correctly caught error: {first_line}")
    except Exception as e:
        print(f"   ❌ Unexpected error type: {type(e).__name__}: {e}")
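Instead of parsing the error text, the structured details can be read from `ValidationError.errors()`, which returns one dictionary per failed field with keys such as `loc` and `msg`. The sketch below assumes Pydantic v2 and uses a reduced, hypothetical version of the Patient model so that it runs on its own:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Patient(BaseModel):
    patient_id: str
    age: int = Field(ge=0, le=150)
    weight: float = Field(gt=0, le=500)
    blood_type: Optional[str] = Field(None, pattern=r"^(A|B|AB|O)[+-]$")


failed_fields = []
try:
    Patient(patient_id="P999", age=-5, weight=0, blood_type="XYZ")
except ValidationError as err:
    for item in err.errors():
        # loc is a tuple describing where the problem is, msg explains it
        field = ".".join(str(part) for part in item["loc"])
        print(f"{field}: {item['msg']}")
        failed_fields.append(field)

print(failed_fields)  # ['age', 'weight', 'blood_type']
```

This makes it straightforward to report all problems of a record at once instead of only the first line of the message.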

Real-World Data Processing with Pydantic

Let’s simulate reading patient data from a CSV file and using Pydantic to validate and clean it:

import csv
from io import StringIO

# Simulate CSV data (in real world, you'd read from a file)
csv_data = """patient_id,name,age,weight,height,married,allergies,phone,email,blood_type
P101,john doe,25,70.5,180,true,"Dust,Pollen",+49-111-222333,john@email.com,A+
P102,JANE SMITH,35,65.0,,false,Peanuts,+49-444-555666,jane@email.com,O-
P103,bob wilson,17,55.2,165,false,,+49-777-888999,bob@email.com,
P104,invalid patient,-5,0,200,maybe,Bad Data,invalid-phone,not-an-email,XYZ
P105,mary johnson,45,72.3,168,true,"Shellfish,Latex",+49-123-987654,mary@email.com,B+
"""

def process_csv_data(csv_content: str):
    """Process CSV data and create Patient objects with validation"""
    successful_patients = []
    failed_records = []
    
    csv_reader = csv.DictReader(StringIO(csv_content))
    
    for row_num, row in enumerate(csv_reader, 1):
        try:
            # Clean and prepare data
            processed_row = {
                'patient_id': row['patient_id'].strip(),
                'name': row['name'].strip(),
                'age': int(row['age']),
                'weight': float(row['weight']),
                'married': row['married'].lower() in ['true', '1', 'yes'],
            }
            
            # Handle optional fields
            if row['height'].strip():
                processed_row['height'] = float(row['height'])
            
            # Process allergies (split by comma if present)
            if row['allergies'].strip():
                processed_row['allergies'] = [a.strip() for a in row['allergies'].split(',')]
            
            # Build contact info
            contact_info = {}
            if row['phone'].strip():
                contact_info['phone'] = row['phone'].strip()
            if row['email'].strip():
                contact_info['email'] = row['email'].strip()
            if contact_info:
                processed_row['contact_info'] = contact_info
            
            # Blood type
            if row['blood_type'].strip():
                processed_row['blood_type'] = row['blood_type'].strip()
            
            # Create Patient object (this will validate everything)
            patient = Patient(**processed_row)
            successful_patients.append(patient)
            print(f"✅ Row {row_num}: Successfully processed {patient.name}")
            
        except ValidationError as e:
            error_msg = str(e).split('\n')[0]  # Get first error line
            failed_records.append({'row': row_num, 'data': row, 'error': error_msg})
            print(f"❌ Row {row_num}: Validation failed - {error_msg}")
        except Exception as e:
            failed_records.append({'row': row_num, 'data': row, 'error': str(e)})
            print(f"❌ Row {row_num}: Processing failed - {e}")
    
    return successful_patients, failed_records

print("\n📊 Processing CSV Data with Pydantic Validation")

📊 Processing CSV Data with Pydantic Validation
print("=" * 55)
=======================================================
successful, failed = process_csv_data(csv_data)
✅ Row 1: Successfully processed John Doe
✅ Row 2: Successfully processed Jane Smith
✅ Row 3: Successfully processed Bob Wilson
❌ Row 4: Validation failed - 4 validation errors for Patient
✅ Row 5: Successfully processed Mary Johnson
print(f"\n📈 Summary:")

📈 Summary:
print(f"✅ Successfully processed: {len(successful)} patients")
✅ Successfully processed: 4 patients
print(f"❌ Failed to process: {len(failed)} records")
❌ Failed to process: 1 records
if successful:
    print(f"\n👥 Successfully Imported Patients:")
    for patient in successful:
        print(f"   🏥 {patient.get_summary()}")
if failed:
    print(f"\n⚠️ Failed Records (need manual review):")
    for failure in failed:
        print(f"   Row {failure['row']}: {failure['data']['name']} - {failure['error']}")

⚠️ Failed Records (need manual review):
   Row 4: invalid patient - 4 validation errors for Patient
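As an alternative to the manual `int()` and `float()` conversions, the raw CSV strings can be handed to Pydantic directly, since its default mode converts numeric and boolean strings. The sketch below assumes Pydantic v2; `CsvPatient` and the shortened CSV text are hypothetical and only illustrate the idea. Unlike the `in ['true', '1', 'yes']` check, an unclear value such as `maybe` is rejected here rather than silently turned into `False`.

```python
import csv
from io import StringIO
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class CsvPatient(BaseModel):
    patient_id: str
    name: str
    age: int = Field(ge=0)
    weight: float = Field(gt=0)
    height: Optional[float] = None
    married: bool = False


csv_text = """patient_id,name,age,weight,height,married
P101,john doe,25,70.5,180,true
P102,jane smith,35,65.0,,false
P104,invalid patient,-5,0,200,maybe
"""

valid, rejected = [], []
for row in csv.DictReader(StringIO(csv_text)):
    # Drop empty cells so optional fields fall back to their defaults
    cleaned = {key: value for key, value in row.items() if value != ""}
    try:
        valid.append(CsvPatient(**cleaned))
    except ValidationError as err:
        rejected.append((row["patient_id"], len(err.errors())))

print([p.patient_id for p in valid])  # ['P101', 'P102']
print(rejected)  # [('P104', 3)]
```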

Saving and Loading with JSON Schema

Pydantic can also generate JSON schemas and work seamlessly with JSON data:

import json
from datetime import datetime

# Generate JSON schema for our Patient model
patient_schema = Patient.model_json_schema()

print("📋 Patient Model JSON Schema:")
📋 Patient Model JSON Schema:
print("=" * 35)
===================================
print(json.dumps(patient_schema, indent=2)[:500] + "...\n(truncated)")
{
  "properties": {
    "patient_id": {
      "title": "Patient Id",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "age": {
      "title": "Age",
      "type": "integer"
    },
    "weight": {
      "title": "Weight",
      "type": "number"
    },
    "married": {
      "title": "Married",
      "type": "boolean"
    },
    "allergies": {
      "items": {
        "type": "string"
      },
      "title": "Allergies",
      "type": "array"
   ...
(truncated)
# Save all our patients to JSON with timestamp
def save_patients_with_metadata(filename: str = "patients_database.json"):
    data = {
        'timestamp': datetime.now().isoformat(),
        'total_patients': len(db),
        'schema_version': '1.0',
        'patients': [patient.model_dump() for patient in db]
    }
    
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)
    
    print(f"💾 Saved {len(db)} patients to {filename}")
    return filename

# Load patients from JSON with validation
def load_patients_with_validation(filename: str = "patients_database.json"):
    global db
    try:
        with open(filename, 'r') as f:
            data = json.load(f)
        
        print(f"📖 Loading database from {filename}")
        print(f"   📅 Saved on: {data['timestamp']}")
        print(f"   👥 Expected patients: {data['total_patients']}")
        
        # Validate and load each patient
        loaded_patients = []
        for patient_data in data['patients']:
            try:
                patient = Patient(**patient_data)
                loaded_patients.append(patient)
            except ValidationError as e:
                print(f"   ❌ Failed to load patient {patient_data.get('name', 'Unknown')}: {e}")
        
        db = loaded_patients
        print(f"   ✅ Successfully loaded {len(db)} patients")
        
    except FileNotFoundError:
        print(f"❌ File {filename} not found")
    except json.JSONDecodeError as e:
        print(f"❌ Invalid JSON in {filename}: {e}")
    except Exception as e:
        print(f"❌ Error loading database: {e}")

# Save current database
filename = save_patients_with_metadata()
💾 Saved 2 patients to patients_database.json
# Clear database and reload to test
original_db = db.copy()
db = []
print(f"\n🗑️ Cleared database (now has {len(db)} patients)")

🗑️ Cleared database (now has 0 patients)
# Reload
load_patients_with_validation(filename)
📖 Loading database from patients_database.json
   📅 Saved on: 2025-09-26T21:19:32.325143
   👥 Expected patients: 2
   ✅ Successfully loaded 2 patients
print(f"🔄 Reloaded database (now has {len(db)} patients)")
🔄 Reloaded database (now has 2 patients)
# Verify data integrity
print(f"\n🔍 Data Integrity Check:")

🔍 Data Integrity Check:
if len(original_db) == len(db):
    print("✅ Patient count matches")
    for orig, loaded in zip(original_db, db):
        if orig.model_dump() == loaded.model_dump():
            print(f"   ✅ {orig.name} data matches perfectly")
        else:
            print(f"   ❌ {orig.name} data mismatch detected")
else:
    print(f"❌ Patient count mismatch: original {len(original_db)}, loaded {len(db)}")
✅ Patient count matches
   ✅ Rashed data matches perfectly
   ✅ Rashed data matches perfectly
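For a single model, the round trip can also be done without the `json` module. Assuming Pydantic v2, `model_dump_json()` serializes a model to a JSON string and `model_validate_json()` parses and validates it in one step. The reduced Patient model below is a hypothetical stand-in so the sketch runs on its own:

```python
from typing import Dict, List

from pydantic import BaseModel, Field


class Patient(BaseModel):
    patient_id: str
    name: str
    age: int
    allergies: List[str] = Field(default_factory=list)
    contact_info: Dict[str, str] = Field(default_factory=dict)


original = Patient(
    patient_id="P001",
    name="Rashed",
    age=30,
    allergies=["Dust", "Pollen"],
    contact_info={"phone": "+492648973"},
)

json_text = original.model_dump_json()             # model -> JSON string
restored = Patient.model_validate_json(json_text)  # JSON string -> validated model

print(restored == original)  # True
```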

Summary: The Power of Pydantic

Throughout this journey, we’ve seen how Pydantic transforms our approach to data handling:

🎯 Key Benefits We’ve Demonstrated:

  1. 🛡️ Automatic Validation: No more manual type checking - Pydantic does it automatically
  2. 🔄 Type Coercion: Smart conversion of compatible types (string “55” → float 55.0)
  3. 📝 Clear Error Messages: Helpful validation errors that pinpoint exactly what’s wrong
  4. 🎨 Clean Code: Models serve as documentation and enforce data contracts
  5. 🔧 Flexibility: Optional fields, default values, and custom validators
  6. 🌐 JSON Integration: Seamless serialization/deserialization with validation
  7. 📊 Real-world Ready: Handles complex data scenarios like CSV imports

🚀 From Simple to Sophisticated:

  • Started with basic type hints (limited enforcement)
  • Added manual validation (not scalable)
  • Introduced Pydantic models (automatic validation)
  • Enhanced with optional fields and custom validators
  • Integrated with real data processing (CSV, JSON)
  • Built a complete data management system

💡 When to Use Pydantic:

  • API Development: Validate request/response data
  • Data Processing: Clean and validate CSV/JSON imports
  • Configuration Management: Validate application settings
  • Database Models: Ensure data integrity before persistence
  • Microservices: Validate inter-service communication

Pydantic transforms unreliable, error-prone data handling into robust, self-documenting, and maintainable code. It’s not just about validation - it’s about building confidence in your data throughout your entire application! 🎉

Citation

BibTeX citation:
@online{rasheduzzaman2025,
  author = {Md Rasheduzzaman},
  title = {Panda {Dataframe,} Scypy, Numpy, Data Typing, Etc.},
  date = {2025-09-26},
  langid = {en},
  abstract = {Pydantic}
}
For attribution, please cite this work as:
Md Rasheduzzaman. 2025. “Panda Dataframe, Scypy, Numpy, Data Typing, Etc.” September 26, 2025.
