AWS S3 with Python — Complete boto3 Tutorial (Presigned URLs, Multipart, Events)

Learn AWS S3 with Python boto3. Covers bucket creation, file upload/download, presigned URLs, multipart uploads, S3 events, and cost optimization strategies.

Introduction

Amazon S3 (Simple Storage Service) is the most widely used object storage service in the cloud. Combined with Python's boto3 library, you can build powerful applications for file storage, backup, data lakes, and content delivery. This guide covers everything from basic uploads to advanced patterns like presigned URLs, multipart uploads, and event-driven architectures.

Why S3 Matters

S3 stores over 100 trillion objects worldwide. Whether you're building a web app that needs file uploads, a data pipeline that processes logs, or a backup system, S3 is the industry standard. Understanding S3 deeply helps you build scalable, cost-effective solutions.

Core Concepts

  • Buckets — Top-level containers for objects. Bucket names must be globally unique. Choose regions close to your users for lower latency.
  • Objects — Files stored in S3, consisting of data + metadata. Objects can range from 0 bytes to 5 TB in size.
  • Storage Classes — S3 Standard (frequent access), S3 Intelligent-Tiering (auto-optimizes), S3 Glacier (archival, cheaper).
  • Access Control — IAM policies, bucket policies, ACLs, and presigned URLs control who can access your data.
  • Event Notifications — Trigger Lambda functions, SNS topics, or SQS queues when objects are created or deleted.
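One point worth internalizing from the bullets above: S3 has no real folders. An object key like reports/2026/report.pdf is a single flat string, and "directories" are just key prefixes used for listing and filtering. A minimal sketch (the helper names are illustrative, not part of boto3):

```python
def build_key(*parts: str) -> str:
    """Join path-like segments into a flat S3 object key."""
    return "/".join(p.strip("/") for p in parts if p)

def key_prefix(key: str) -> str:
    """Return the 'folder-like' prefix of a key, or '' for top-level keys."""
    return key.rsplit("/", 1)[0] + "/" if "/" in key else ""

key = build_key("reports", "2026", "report.pdf")
print(key)              # reports/2026/report.pdf
print(key_prefix(key))  # reports/2026/
```

This is why list_objects_v2 takes a Prefix parameter rather than a directory path: listing "a folder" is just filtering keys by their leading characters.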

Setup and Configuration

Install boto3 and configure credentials securely:

# Install boto3
pip install boto3

# Configure credentials (NEVER hardcode in production)
# Option 1: AWS CLI (recommended for development)
aws configure

# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_DEFAULT_REGION="us-east-1"

# Option 3: IAM role (EC2, Lambda, ECS - recommended for production)
# No credentials needed - automatically provided

Basic Operations

Creating Buckets

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

def create_bucket(bucket_name: str, region: str = 'us-east-1') -> bool:
    """Create an S3 bucket with the specified name and region."""
    try:
        if region == 'us-east-1':
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={'LocationConstraint': region}
            )
        return True
    except ClientError as e:
        print(f"Error creating bucket: {e}")
        return False

Uploading Files

import os

def upload_file(file_path: str, bucket: str, object_name: str | None = None) -> bool:
    """Upload a file to an S3 bucket, defaulting the key to the file's base name."""
    if object_name is None:
        object_name = os.path.basename(file_path)

    try:
        s3_client.upload_file(file_path, bucket, object_name)
        return True
    except ClientError as e:
        print(f"Upload error: {e}")
        return False

# Upload with metadata
s3_client.upload_file(
    'report.pdf',
    'my-bucket',
    'reports/2026/report.pdf',
    ExtraArgs={
        'ContentType': 'application/pdf',
        'Metadata': {'uploaded-by': 'my-app'}
    }
)
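For large files you often want progress reporting. upload_file accepts a Callback argument that boto3 invokes with the number of bytes transferred in each chunk. A sketch, assuming a local report.pdf (the ProgressTracker class is illustrative, not part of boto3):

```python
import os
import threading

class ProgressTracker:
    """Callback for boto3 transfers: accumulates bytes and prints a percentage."""
    def __init__(self, total_bytes: int):
        self._total = total_bytes
        self._seen = 0
        self._lock = threading.Lock()  # boto3 may invoke the callback from worker threads

    def __call__(self, bytes_transferred: int) -> None:
        with self._lock:
            self._seen += bytes_transferred
            pct = (self._seen / self._total) * 100 if self._total else 100.0
            print(f"\rUploaded {self._seen}/{self._total} bytes ({pct:.1f}%)", end="")

# Usage (requires AWS credentials and an existing bucket):
# tracker = ProgressTracker(os.path.getsize('report.pdf'))
# s3_client.upload_file('report.pdf', 'my-bucket', 'reports/2026/report.pdf',
#                       Callback=tracker)
```

The lock matters because boto3's transfer manager uploads chunks from multiple threads by default.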

Downloading Files

def download_file(bucket: str, object_name: str, file_path: str) -> bool:
    """Download a file from S3."""
    s3_client = boto3.client('s3')
    try:
        s3_client.download_file(bucket, object_name, file_path)
        return True
    except ClientError as e:
        print(f"Download error: {e}")
        return False

# Download as bytes (for processing in memory)
def download_as_bytes(bucket: str, object_name: str) -> bytes:
    response = s3_client.get_object(Bucket=bucket, Key=object_name)
    return response['Body'].read()

Presigned URLs

Presigned URLs grant temporary access to private objects without making them public. Perfect for user uploads, private downloads, or time-limited sharing.

def generate_presigned_upload_url(
    bucket: str,
    object_name: str,
    expiration_minutes: int = 15
) -> str:
    """Generate a presigned URL for uploading."""
    return s3_client.generate_presigned_url(
        'put_object',
        Params={'Bucket': bucket, 'Key': object_name},
        ExpiresIn=expiration_minutes * 60
    )

def generate_presigned_download_url(
    bucket: str,
    object_name: str,
    expiration_minutes: int = 15
) -> str:
    """Generate a presigned URL for downloading."""
    return s3_client.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': object_name},
        ExpiresIn=expiration_minutes * 60
    )

# Usage: Return this URL to the client for direct upload
upload_url = generate_presigned_upload_url('my-bucket', 'user-uploads/file.pdf')
print(f"Upload URL (valid for 15 min): {upload_url}")
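For debugging, the expiry is visible in the URL itself: SigV4 presigned URLs carry an X-Amz-Expires query parameter (the validity window in seconds) alongside X-Amz-Date. A small helper to read it back out (illustrative, not part of boto3; the sample URL below is made up):

```python
from urllib.parse import urlparse, parse_qs

def presigned_expires_in(url: str) -> int:
    """Return the validity window (in seconds) encoded in a SigV4 presigned URL."""
    query = parse_qs(urlparse(url).query)
    return int(query["X-Amz-Expires"][0])

sample = (
    "https://my-bucket.s3.amazonaws.com/user-uploads/file.pdf"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=900&X-Amz-Date=20260101T000000Z"
)
print(presigned_expires_in(sample))  # 900
```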

Multipart Uploads

For files larger than 100 MB, multipart uploads give better reliability and resumability. Note that boto3's high-level upload_file already switches to multipart transfers automatically above a size threshold; the low-level API below is for when you need explicit control over individual parts.

def multipart_upload(file_path: str, bucket: str, object_name: str, part_size: int = 10*1024*1024):
    """Upload large files using multipart upload (10MB parts)."""
    s3_client = boto3.client('s3')

    # Initiate
    response = s3_client.create_multipart_upload(Bucket=bucket, Key=object_name)
    upload_id = response['UploadId']

    parts = []
    part_number = 1

    try:
        with open(file_path, 'rb') as f:
            while chunk := f.read(part_size):
                # Upload each part
                part_response = s3_client.upload_part(
                    Bucket=bucket,
                    Key=object_name,
                    PartNumber=part_number,
                    UploadId=upload_id,
                    Body=chunk
                )
                parts.append({
                    'PartNumber': part_number,
                    'ETag': part_response['ETag']
                })
                part_number += 1

        # Complete the upload
        s3_client.complete_multipart_upload(
            Bucket=bucket,
            Key=object_name,
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
        print(f"Multipart upload completed: {object_name}")

    except Exception:
        # Abort on failure so abandoned parts don't keep accruing storage charges
        s3_client.abort_multipart_upload(
            Bucket=bucket,
            Key=object_name,
            UploadId=upload_id
        )
        raise
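S3 caps multipart uploads at 10,000 parts, with each part between 5 MiB (except the last) and 5 GiB. A fixed 10 MB part size therefore tops out around 100 GB; for larger files the part size must grow. A sketch of that calculation (the helper name is illustrative):

```python
MIN_PART = 5 * 1024 * 1024   # 5 MiB minimum part size (all parts but the last)
MAX_PARTS = 10_000           # S3 limit on parts per upload

def choose_part_size(file_size: int, preferred: int = 10 * 1024 * 1024) -> int:
    """Pick a part size that keeps the upload within S3's 10,000-part limit."""
    part_size = max(preferred, MIN_PART)
    # Double the part size until the whole file fits in MAX_PARTS parts
    while file_size > part_size * MAX_PARTS:
        part_size *= 2
    return part_size

print(choose_part_size(1 * 1024**3))    # a 1 GiB file is fine with 10 MiB parts
print(choose_part_size(500 * 1024**3))  # a 500 GiB file needs larger parts
```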

S3 Event Notifications

Trigger Lambda functions when files are uploaded. Common use cases: image resizing, virus scanning, data processing. Two caveats: the Lambda function must first grant S3 permission to invoke it (via lambda add-permission or a resource-based policy), and put_bucket_notification_configuration replaces the bucket's entire existing notification configuration rather than appending to it.

def configure_s3_event_notification(bucket: str, lambda_arn: str, prefix: str = ''):
    """Configure S3 to trigger Lambda on new object creation."""
    s3_client = boto3.client('s3')

    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [
                {
                    'LambdaFunctionArn': lambda_arn,
                    'Events': ['s3:ObjectCreated:*'],
                    'Filter': {
                        'Key': {
                            'FilterRules': [
                                {'Name': 'prefix', 'Value': prefix}
                            ]
                        }
                    }
                }
            ]
        }
    )
    print(f"Configured event notification for bucket: {bucket}")
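On the receiving side, the Lambda function gets an event whose Records list identifies the bucket and key. Note that keys arrive URL-encoded (spaces become '+'), so they must be decoded before use. A minimal handler sketch with a trimmed example of the event shape S3 sends:

```python
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """Process each newly created object referenced in the S3 event."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded; decode '+' and percent-escapes
        key = unquote_plus(record["s3"]["object"]["key"])
        print(f"New object: s3://{bucket}/{key}")
        processed.append((bucket, key))
    return processed

sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "uploads/my+file.pdf"}}}
    ]
}
print(lambda_handler(sample_event, None))
```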

Cost Optimization

  • Storage Classes — Use S3 Intelligent-Tiering for unpredictable access patterns (auto-moves to cheaper tiers). Use S3 Glacier for archives accessed less than once per quarter.
  • Lifecycle Policies — Automatically transition old files to cheaper storage or delete temporary files.
  • Compression — Compress files before upload. Use formats like Parquet for data files (up to 80% smaller).
  • Transfer Acceleration — Enable for global uploads (uses CloudFront edge locations), but costs extra.
  • Request Optimization — Batch operations when possible. Use S3 Select to retrieve only needed data from large files.
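The lifecycle policies mentioned above are configured with put_bucket_lifecycle_configuration. A sketch of a rule that moves objects under a logs/ prefix to Glacier after 90 days, deletes them after a year, and aborts incomplete multipart uploads after 7 days (the bucket name and prefix are illustrative):

```python
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
            # Stop paying for abandoned multipart uploads
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

# Apply it (requires AWS credentials and an existing bucket):
# s3_client.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
```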

Security Best Practices

  • Block Public Access — Keep buckets private by default. Use presigned URLs for temporary sharing.
  • Encryption — Enable SSE-S3 (AWS-managed keys) or SSE-KMS (customer-managed keys) for server-side encryption.
  • IAM Least Privilege — Grant only required permissions (e.g., s3:GetObject for read-only access).
  • Versioning — Enable bucket versioning to recover from accidental deletions or overwrites.
  • Access Logging — Enable S3 server access logging to track all requests for audit purposes.
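The "Block Public Access" recommendation above is a single API call with four flags. A sketch (the bucket name is illustrative):

```python
public_access_block = {
    "BlockPublicAcls": True,        # reject requests that set public ACLs
    "IgnorePublicAcls": True,       # ignore any existing public ACLs
    "BlockPublicPolicy": True,      # reject bucket policies granting public access
    "RestrictPublicBuckets": True,  # limit cross-account access to AWS principals
}

# Apply it (requires AWS credentials and an existing bucket):
# s3_client.put_public_access_block(
#     Bucket="my-bucket",
#     PublicAccessBlockConfiguration=public_access_block)
```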

Common Errors and Solutions

  • AccessDenied — Cause: missing IAM permissions or a restrictive bucket policy. Solution: check the IAM role/policy and verify bucket permissions.
  • NoSuchKey — Cause: the object doesn't exist or the key path is wrong. Solution: verify the object key (keys are case-sensitive; check prefixes).
  • EntityTooLarge — Cause: the file exceeds the 5 GB single-upload limit. Solution: use multipart upload for files over 100 MB.
  • SlowDown — Cause: too many requests per second. Solution: implement exponential backoff and spread keys across more prefixes for parallelism.
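The SlowDown remedy, exponential backoff, can be sketched as a small retry wrapper. Note that boto3 also ships built-in retry modes via botocore's Config(retries=...); this hand-rolled version just shows the idea:

```python
import random
import time

def with_backoff(fn, max_attempts: int = 5, base_delay: float = 0.5):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Delays of 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Usage sketch:
# with_backoff(lambda: s3_client.get_object(Bucket='my-bucket', Key='big.csv'))
```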

Frequently Asked Questions

What is the difference between S3 Standard and S3 Intelligent-Tiering?

S3 Standard is for frequently accessed data with predictable access patterns. S3 Intelligent-Tiering automatically moves objects between frequent and infrequent access tiers based on usage, saving costs without performance impact. Use Intelligent-Tiering when access patterns are unpredictable.

How do I secure sensitive data in S3?

Enable server-side encryption (SSE-S3 or SSE-KMS), block all public access at the bucket level, use IAM policies with least privilege, enable versioning for recovery, and use presigned URLs for temporary access. For compliance, enable CloudTrail logging and S3 access logging.

When should I use presigned URLs vs making objects public?

Always use presigned URLs for user-specific content, temporary sharing, or when you need time-limited access. Never make buckets public unless serving static website assets. Presigned URLs expire automatically, reducing security risks.

What is the maximum file size I can upload to S3?

Single PUT uploads support up to 5GB. For files larger than 100MB (recommended threshold), use multipart uploads which support up to 5TB. Multipart uploads also enable resumability and parallel uploads for better performance.

How can I reduce my S3 costs?

Use S3 Intelligent-Tiering for unpredictable access, S3 Glacier for archives (>90 days), enable compression, set up lifecycle policies to auto-transition or delete old files, use S3 Select for querying large files, and analyze costs with S3 Storage Lens.

How do I handle concurrent uploads to the same object?

Standard S3 writes are last-writer-wins; there is no built-in locking. For safe concurrent updates, use unique object keys (e.g., timestamps or UUIDs), implement application-level locking, or use S3 Object Lock for compliance scenarios.

What happens if a multipart upload fails midway?

Incomplete multipart uploads continue charging storage until completed or aborted. Use lifecycle rules to abort incomplete uploads after 7 days. Always implement error handling to call abort_multipart_upload() on failure.

Can I trigger code automatically when files are uploaded?

Yes, configure S3 event notifications to trigger AWS Lambda functions, SNS topics, or SQS queues. Common patterns: resize images on upload, scan for viruses, extract metadata, or start data processing pipelines.