Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


hpr4135 :: Mining the web

In this episode I talk a bit about a project I have been working on to index the web.

Hosted by Cedric De Vroey on 2024-06-07. Flagged as Explicit and released under a CC-BY-SA license.
Tags: docker, redis, hacking, mongodb, scraping, dns, certificate-transparency. Comments: 2.

Duration: 00:15:14

Series: general.

I don't have shownotes for this one. Sorry for that.

Please refer to the transcript for more information.


Comments

Comment #1 posted on 2024-06-12 12:00:27 by norrist

Clever use of transparency data

Using the transparency logs is a clever way to get a list of active domains. Let's Encrypt certs expire every 90 days, so the domains are more likely to be active. I don't have any suggestions for managing the data, but it sounds like a solvable problem. Can you post more about how you are parsing the transparency logs?

Comment #2 posted on 2024-06-14 11:34:02 by Henrik Hemrin

Amazing project

What an amazing hacker project. I'm impressed, and it was nice to learn about.
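
The idea raised in comment #1, that recently issued certificates point to domains that are probably still active, can be sketched in a few lines. This is only an illustration and not the host's actual pipeline, which the page does not describe. The record shape (a dict with `not_before` and `dns_names`), the function name `recent_domains`, and the 90-day window used as the cutoff are all assumptions made for the example; real transparency log entries would first have to be fetched and decoded into something like this.

```python
from datetime import datetime, timedelta, timezone


def recent_domains(entries, now, max_age_days=90):
    """Return sorted unique hostnames from certificates issued recently.

    `entries` is a hypothetical, already-decoded list of dicts with:
      - "not_before": timezone-aware datetime the certificate became valid
      - "dns_names":  list of hostnames named in the certificate
    """
    cutoff = now - timedelta(days=max_age_days)
    seen = set()
    for entry in entries:
        # Skip certificates issued before the cutoff (assumed stale).
        if entry["not_before"] < cutoff:
            continue
        for name in entry["dns_names"]:
            # Normalise case and drop a trailing dot.
            name = name.lower().rstrip(".")
            # A wildcard entry still tells us the parent domain exists.
            if name.startswith("*."):
                name = name[2:]
            if name:
                seen.add(name)
    return sorted(seen)


if __name__ == "__main__":
    now = datetime(2024, 6, 12, tzinfo=timezone.utc)
    sample = [
        {
            "not_before": datetime(2024, 6, 1, tzinfo=timezone.utc),
            "dns_names": ["*.Example.org", "example.org"],
        },
        {
            "not_before": datetime(2023, 1, 1, tzinfo=timezone.utc),
            "dns_names": ["old.example.net"],
        },
    ]
    print(recent_domains(sample, now))
```

Deduplicating through a set keeps the output small when many certificates name the same host, which is one simple way to limit how much data has to be stored.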
