
update extract.py #182

Merged
urialon merged 1 commit into tech-srl:master from lidiancracy:update_extract_py
Sep 20, 2023

Conversation

@lidiancracy
Contributor

When I used process.sh to extract project data, the project was too large to be extracted in one pass, so I modified extract.py to read the dataset paths in batches. In addition, some datasets contain errors that prevent them from being parsed: the extraction does not throw an error but simply hangs, which was quite perplexing. I therefore added a timeout, so that anything that exceeds a set duration without finishing is skipped. I hope this helps users who work with large volumes of data.
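For readers who want a picture of the approach described above, here is a minimal sketch of batched processing with a per-item timeout. It is not the code merged in this PR: the function names (`batched`, `run_with_timeout`, `extract_in_batches`), the `build_command` callback, and the default batch size and timeout are all hypothetical, and the sketch assumes each path is handled by an external command launched as a subprocess.

```python
import subprocess
from itertools import islice


def batched(paths, batch_size):
    """Yield successive lists of at most batch_size paths (hypothetical helper)."""
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_with_timeout(command, timeout_seconds):
    """Run command; return its stdout, or None if it hangs or exits non-zero."""
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed once this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def extract_in_batches(paths, build_command, batch_size=100, timeout_seconds=60):
    """Process paths batch by batch, skipping any path that times out or fails.

    build_command is an assumed callback that maps a path to an argv list.
    """
    processed, skipped = [], []
    for batch in batched(paths, batch_size):
        for path in batch:
            output = run_with_timeout(build_command(path), timeout_seconds)
            if output is None:
                skipped.append(path)
            else:
                processed.append((path, output))
    return processed, skipped
```

Because `batched` consumes any iterable lazily, the paths can come from a generator or a file handle rather than a fully loaded list. Returning the skipped paths lets the caller log or retry them later.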

…ng of projects instead of loading them all at once. During batch processing, I also incorporated timeout handling.
@lidiancracy changed the title from "update exteacr.py" to "update extract.py" on Sep 20, 2023
@urialon merged commit 77637c5 into tech-srl:master on Sep 20, 2023
@urialon
Collaborator

urialon commented Sep 20, 2023

Great, thank you @lidiancracy !
