downloaders
lacuna.io.downloaders
¶
Downloaders subpackage for source-specific download logic.
This module provides downloader implementations for various data sources:

- Harvard Dataverse (for GSP1000)
- Figshare (for dTOR985)
- GitHub Releases (for HCP1065)
Each downloader handles authentication, rate limiting, and error handling specific to its data source.
ConnectomeSource
dataclass
¶
Configuration for a fetchable connectome source.
Source code in src/lacuna/io/downloaders/base.py
article_id = None
class-attribute
instance-attribute
¶
Figshare article ID for API-based downloads.
citation = ''
class-attribute
instance-attribute
¶
Citation text for this connectome dataset.
dataverse_server = 'https://dataverse.harvard.edu'
class-attribute
instance-attribute
¶
Dataverse server URL.
default_batches = 10
class-attribute
instance-attribute
¶
Default number of HDF5 batches (functional only).
description
instance-attribute
¶
User-facing description of the connectome.
display_name
instance-attribute
¶
Human-readable name (e.g., 'GSP1000 Functional Connectome').
download_url = None
class-attribute
instance-attribute
¶
Direct download URL for Figshare files (deprecated, use article_id).
estimated_size_gb = 0.0
class-attribute
instance-attribute
¶
Estimated download size in GB for user information.
mask_url = None
class-attribute
instance-attribute
¶
URL to download brain mask if required.
n_subjects = 0
class-attribute
instance-attribute
¶
Number of subjects in the connectome.
name
instance-attribute
¶
Unique identifier (e.g., 'gsp1000', 'dtor985').
persistent_id = None
class-attribute
instance-attribute
¶
DOI for Dataverse datasets (e.g., 'doi:10.7910/DVN/ILXIKS').
requires_mask = False
class-attribute
instance-attribute
¶
Whether brain mask is needed for processing.
source_type
instance-attribute
¶
Download source requiring specific authentication/handling.
space = 'MNI152NLin6Asym'
class-attribute
instance-attribute
¶
Coordinate space.
type
instance-attribute
¶
Connectome type determining processing pipeline.
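The fields above can be pictured as an ordinary dataclass. The following is a minimal stand-in, not the library's definition: the class name `ConnectomeSourceSketch` is hypothetical, the annotations for `type`, `source_type`, and `article_id` are assumptions (the reference does not state their types), and the example values `"functional"` and `"dataverse"` are illustrative only. Defaults are copied from the attribute list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectomeSourceSketch:
    """Hypothetical stand-in mirroring the documented ConnectomeSource fields."""

    # Fields documented without a default must be supplied by the caller.
    name: str
    display_name: str
    description: str
    type: str          # assumption: modelled as a plain string here
    source_type: str   # assumption: modelled as a plain string here

    # Fields documented with defaults.
    article_id: Optional[int] = None  # assumption: integer ID
    citation: str = ""
    dataverse_server: str = "https://dataverse.harvard.edu"
    default_batches: int = 10
    download_url: Optional[str] = None
    estimated_size_gb: float = 0.0
    mask_url: Optional[str] = None
    n_subjects: int = 0
    persistent_id: Optional[str] = None
    requires_mask: bool = False
    space: str = "MNI152NLin6Asym"


gsp = ConnectomeSourceSketch(
    name="gsp1000",
    display_name="GSP1000 Functional Connectome",
    description="Example entry for illustration only.",
    type="functional",        # hypothetical value
    source_type="dataverse",  # hypothetical value
    persistent_id="doi:10.7910/DVN/ILXIKS",
)
print(gsp.space, gsp.default_batches)
```

Only the five fields without defaults need to be passed; everything else falls back to the documented default.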
DataverseDownloader
¶
Bases: BaseDownloader
Downloader for Harvard Dataverse datasets.
Handles authentication via API key and supports resumable downloads with checksum verification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | ConnectomeSource | Configuration for the connectome source. | required |
| api_key | str | Dataverse API key. If not provided, will attempt to get from environment variable or config file. | None |

Raises:
| Type | Description |
|---|---|
| AuthenticationError | If no API key is available. |
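The reference does not say how resumable downloads are implemented. One common approach, shown here purely as an assumption, is to send an HTTP `Range` header that starts at the size of the partial file already on disk. The helper name `resume_headers` is hypothetical.

```python
import tempfile
from pathlib import Path


def resume_headers(partial_file: Path) -> dict:
    """Build headers asking the server to continue after the bytes already on disk."""
    if partial_file.exists():
        offset = partial_file.stat().st_size
        if offset > 0:
            # Open-ended byte range: everything from `offset` to the end.
            return {"Range": f"bytes={offset}-"}
    # Nothing on disk yet: request the whole file.
    return {}


with tempfile.TemporaryDirectory() as tmp:
    part = Path(tmp) / "batch.tar.part"
    part.write_bytes(b"12345")
    headers = resume_headers(part)
    fresh = resume_headers(Path(tmp) / "missing.part")
```

A server that honours the range replies with status 206 and only the remaining bytes, which the client appends to the partial file.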
Source code in src/lacuna/io/downloaders/dataverse.py
download(output_path, progress_callback=None, test_mode=False, skip_checksum=False)
¶
Download dataset files from Dataverse.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Path | Directory to download files to. | required |
| progress_callback | callable | Function called with FetchProgress updates. | None |
| test_mode | bool | If True, downloads only 1 tarball for testing the full pipeline. Metadata files (JSON, TXT, masks) are always downloaded. | False |
| skip_checksum | bool | If True, skip checksum verification. Use when server metadata is outdated and causes false checksum mismatches. | False |

Returns:
| Type | Description |
|---|---|
| list[Path] | List of downloaded file paths. |

Raises:
| Type | Description |
|---|---|
| DownloadError | If download fails. |
| AuthenticationError | If authentication fails. |
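To illustrate what `skip_checksum` toggles, here is a minimal sketch of checksum verification. It assumes the server publishes an MD5 digest per file; the reference does not name the algorithm. The functions `file_checksum` and `verify_download` are hypothetical names, not part of the documented API.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str, skip_checksum: bool = False,
                    algorithm: str = "md5") -> bool:
    """Return True when the file matches `expected`, or when checking is skipped."""
    if skip_checksum:
        return True
    return file_checksum(path, algorithm) == expected.lower()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "batch.tar"
    target.write_bytes(b"example payload")
    good = hashlib.md5(b"example payload").hexdigest()
    ok = verify_download(target, good)
    mismatch = verify_download(target, "0" * 32)
    skipped = verify_download(target, "0" * 32, skip_checksum=True)
```

Skipping verification trades integrity checking for tolerance of stale server metadata, so it is best reserved for the case the parameter description names.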
Source code in src/lacuna/io/downloaders/dataverse.py
FetchConfig
dataclass
¶
Configuration for a connectome fetch operation.
Source code in src/lacuna/io/downloaders/base.py
api_key = None
class-attribute
instance-attribute
¶
Dataverse API key (for GSP1000). Can also use DATAVERSE_API_KEY env var.
batches = 10
class-attribute
instance-attribute
¶
Number of HDF5 batch files for functional connectomes.
connectome
instance-attribute
¶
Connectome name to fetch (e.g., 'gsp1000', 'dtor985').
force = False
class-attribute
instance-attribute
¶
Overwrite existing files and registrations.
keep_original = True
class-attribute
instance-attribute
¶
Keep original downloaded files after processing.
output_dir
instance-attribute
¶
Directory for processed output files.
register = True
class-attribute
instance-attribute
¶
Automatically register connectome after processing.
register_name = None
class-attribute
instance-attribute
¶
Custom name for registration. Defaults to source name (e.g., 'GSP1000').
resume = True
class-attribute
instance-attribute
¶
Resume interrupted downloads.
from_cli_args(args)
classmethod
¶
Create config from CLI arguments.
Source code in src/lacuna/io/downloaders/base.py
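As a rough picture of what a `from_cli_args` constructor might do, the sketch below copies matching attributes from an `argparse.Namespace` onto a dataclass with the documented defaults. The class name `FetchConfigSketch` is hypothetical, and it is an assumption that the CLI attribute names match the field names; the actual mapping is not shown in this reference.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class FetchConfigSketch:
    """Hypothetical stand-in mirroring the documented FetchConfig fields."""

    connectome: str
    output_dir: Path
    api_key: Optional[str] = None
    batches: int = 10
    force: bool = False
    keep_original: bool = True
    register: bool = True
    register_name: Optional[str] = None
    resume: bool = True

    @classmethod
    def from_cli_args(cls, args: argparse.Namespace) -> "FetchConfigSketch":
        config = cls(connectome=args.connectome, output_dir=Path(args.output_dir))
        optional = ("api_key", "batches", "force", "keep_original",
                    "register", "register_name", "resume")
        for name in optional:
            value = getattr(args, name, None)
            # Keep the dataclass default when the CLI did not supply a value.
            if value is not None:
                setattr(config, name, value)
        return config


args = argparse.Namespace(connectome="gsp1000", output_dir="/tmp/connectomes",
                          batches=5, force=True)
config = FetchConfigSketch.from_cli_args(args)
```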
get_api_key()
¶
Get API key from config, env var, or config file.
Source code in src/lacuna/io/downloaders/base.py
FetchProgress
dataclass
¶
Progress information for fetch operations.
Source code in src/lacuna/io/downloaders/base.py
bytes_total = 0
class-attribute
instance-attribute
¶
Total bytes for current download.
bytes_transferred = 0
class-attribute
instance-attribute
¶
Bytes transferred in current download.
current_file
instance-attribute
¶
Name of file currently being processed.
download_percent
property
¶
Current file download percentage.
files_completed
instance-attribute
¶
Number of files completed.
files_total
instance-attribute
¶
Total number of files to process.
message = ''
class-attribute
instance-attribute
¶
Human-readable status message.
percent_complete
property
¶
Overall percentage completion.
phase
instance-attribute
¶
Current operation phase.
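The two percentage properties are not spelled out in this reference. A plausible reading, given as an assumption, is that `download_percent` is the byte ratio for the current file and `percent_complete` counts finished files plus the fraction of the file in flight. The class name `FetchProgressSketch` is hypothetical, and `phase` is modelled as a plain string.

```python
from dataclasses import dataclass


@dataclass
class FetchProgressSketch:
    """Hypothetical stand-in mirroring the documented FetchProgress fields."""

    phase: str
    current_file: str
    files_completed: int
    files_total: int
    bytes_transferred: int = 0
    bytes_total: int = 0
    message: str = ""

    @property
    def download_percent(self) -> float:
        # Guard against an unknown size (bytes_total left at its default of 0).
        if self.bytes_total <= 0:
            return 0.0
        return 100.0 * self.bytes_transferred / self.bytes_total

    @property
    def percent_complete(self) -> float:
        if self.files_total <= 0:
            return 0.0
        # Finished files plus the fraction of the file currently in flight.
        in_flight = self.download_percent / 100.0
        done = min(self.files_completed + in_flight, self.files_total)
        return 100.0 * done / self.files_total


progress = FetchProgressSketch(
    phase="download", current_file="batch_02.tar",
    files_completed=1, files_total=4,
    bytes_transferred=50, bytes_total=200,
)
print(progress.download_percent, progress.percent_complete)  # 25.0 31.25
```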
FetchResult
dataclass
¶
Result of a connectome fetch operation.
Source code in src/lacuna/io/downloaders/base.py
connectome_name
instance-attribute
¶
Name of the fetched connectome.
download_time_seconds = 0.0
class-attribute
instance-attribute
¶
Time spent downloading.
duration_seconds = 0.0
class-attribute
instance-attribute
¶
Total operation time in seconds.
error = None
class-attribute
instance-attribute
¶
Error message if success=False.
output_dir
instance-attribute
¶
Directory containing processed files.
output_files = field(default_factory=list)
class-attribute
instance-attribute
¶
List of created output files.
processing_time_seconds = 0.0
class-attribute
instance-attribute
¶
Time spent processing.
register_name = None
class-attribute
instance-attribute
¶
Name used for registration, or None if not registered.
registered = False
class-attribute
instance-attribute
¶
Whether the connectome was registered.
success
instance-attribute
¶
Whether the operation completed successfully.
warnings = field(default_factory=list)
class-attribute
instance-attribute
¶
Non-fatal warnings encountered.
summary()
¶
Generate human-readable summary.
Source code in src/lacuna/io/downloaders/base.py
FigshareDownloader
¶
Bases: BaseDownloader
Downloader for Figshare files using authenticated API.
Uses Figshare API with authentication token to get download URLs that bypass AWS WAF protection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | ConnectomeSource | Configuration for the connectome source. | required |
| api_key | str | Figshare API key. If not provided, uses FIGSHARE_API_KEY env var. | None |
Source code in src/lacuna/io/downloaders/figshare.py
download(output_path, progress_callback=None)
¶
Download file from Figshare using authenticated API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Path | Directory to download files to. | required |
| progress_callback | callable | Function called with FetchProgress updates. | None |

Returns:
| Type | Description |
|---|---|
| list[Path] | List of downloaded file paths (single file for Figshare). |

Raises:
| Type | Description |
|---|---|
| DownloadError | If download fails or API key is missing. |
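For orientation, the sketch below builds the kind of authenticated request the public Figshare v2 API accepts for listing an article's files, then extracts the `download_url` entries from a response body. This shows the general API pattern only and is not the downloader's actual code: the helper names, the article ID `12345`, the token, and the sample JSON are all illustrative. No network call is made.

```python
import json
import urllib.request

FIGSHARE_API = "https://api.figshare.com/v2"


def build_files_request(article_id: int, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a request listing the files of an article."""
    if not api_key:
        raise ValueError("A Figshare API key is required")
    return urllib.request.Request(
        f"{FIGSHARE_API}/articles/{article_id}/files",
        headers={"Authorization": f"token {api_key}"},
    )


def pick_download_urls(files_json: str) -> list:
    """Extract the download URL of every file entry in the JSON listing."""
    return [entry["download_url"] for entry in json.loads(files_json)]


request = build_files_request(article_id=12345, api_key="example-token")
urls = pick_download_urls(
    '[{"name": "a.trk.gz", "download_url": "https://example.org/a"}]'
)
```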
Source code in src/lacuna/io/downloaders/figshare.py
GithubReleaseDownloader
¶
Bases: BaseDownloader
Downloader for files hosted on GitHub Releases.
No authentication is required — files are downloaded via plain HTTP GET.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | ConnectomeSource | Configuration for the connectome source. Must have | required |
Source code in src/lacuna/io/downloaders/github.py
download(output_path, progress_callback=None)
¶
Download file from GitHub Releases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_path | Path | Directory to download files to. | required |
| progress_callback | callable | Function called with FetchProgress updates. | None |

Returns:
| Type | Description |
|---|---|
| list[Path] | List of downloaded file paths (single file). |

Raises:
| Type | Description |
|---|---|
| DownloadError | If download fails or download_url is not configured. |
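A plain HTTP GET download with progress reporting can be sketched with the standard library alone. This is not the downloader's implementation: `download_asset` and the local `DownloadError` class are stand-ins, and the callback here receives `(bytes_done, bytes_total)` rather than a FetchProgress object to keep the example self-contained. The demo uses a `file://` URL so it runs without network access.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


class DownloadError(Exception):
    """Local stand-in for the library's DownloadError."""


def download_asset(download_url, output_path, progress_callback=None,
                   chunk_size=1 << 16):
    """Stream one URL into output_path and return the file path in a list."""
    if not download_url:
        raise DownloadError("download_url is not configured")
    output_path = Path(output_path)
    output_path.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path component of the URL.
    target = output_path / Path(unquote(urlparse(download_url).path)).name
    try:
        with urllib.request.urlopen(download_url) as response, \
                target.open("wb") as out:
            total = int(response.headers.get("Content-Length") or 0)
            transferred = 0
            while chunk := response.read(chunk_size):
                out.write(chunk)
                transferred += len(chunk)
                if progress_callback is not None:
                    progress_callback(transferred, total)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        raise DownloadError(f"Download failed: {exc}") from exc
    return [target]


with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "asset.bin"
    asset.write_bytes(b"x" * 1000)
    updates = []
    files = download_asset(asset.as_uri(), Path(tmp) / "out",
                           lambda done, total: updates.append((done, total)))
    downloaded_name = files[0].name
    downloaded_size = files[0].stat().st_size
```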
Source code in src/lacuna/io/downloaders/github.py
get_api_key(cli_key=None)
¶
Get API key using priority order: CLI > env var > config file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cli_key | str | API key provided via CLI argument. | None |

Returns:
| Type | Description |
|---|---|
| str or None | The API key, or None if not found. |
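The priority order CLI > env var > config file can be sketched as a simple fall-through. The function name `resolve_api_key` is hypothetical, and the config file is assumed to be a plain-text file containing only the key; the reference does not describe the real file's location or format. The env var name `DATAVERSE_API_KEY` is the one documented for FetchConfig.

```python
import os
import tempfile
from pathlib import Path


def resolve_api_key(cli_key=None, env=None, config_file=None,
                    env_var="DATAVERSE_API_KEY"):
    """Return the first key found: CLI argument, then env var, then config file."""
    if cli_key:
        return cli_key
    env = os.environ if env is None else env
    value = env.get(env_var)
    if value:
        return value
    if config_file is not None and Path(config_file).is_file():
        # Assumption: the file holds nothing but the key itself.
        text = Path(config_file).read_text(encoding="utf-8").strip()
        return text or None
    return None


with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "api_key.txt"
    key_file.write_text("from-file\n", encoding="utf-8")
    env_with_key = {"DATAVERSE_API_KEY": "from-env"}
    from_cli = resolve_api_key("from-cli", env=env_with_key, config_file=key_file)
    from_env = resolve_api_key(None, env=env_with_key, config_file=key_file)
    from_file = resolve_api_key(None, env={}, config_file=key_file)
    missing = resolve_api_key(None, env={})
```

Passing `env` explicitly keeps the lookup testable without touching the real process environment.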