Sourcing with inurl. Search with inurl to research data and insights.
inurl is a basic foundational sourcing technique. It unlocks so many secrets of the internet. In this article I am going to focus using inurl to crawl mail archive websites.
Basics of sourcing with inurl
Go to google and type this:
The results might depend on your location but most likely you get the same.
The first result says
Email - Wikipedia in which the url has the word
The second result says
Sign in - Google Accounts and the url also has the word
The third result says
Webmail with a url
This strategy of “word in a url” is often used for SEO. One of the results says
Email marketing made simple and the url contains this string
beginners-guide-to-email-marketing with the word
That was all the theory you have to know about
inurl. Now let’s find the secrets of the internet.
Let me introduce you to my old friend
GNU Mailman. It’s an open source free software for managing mailing lists. These mailing lists are often published on the web. I hope I caught your attention by now.
The results show
1 - 100 of 5059213 matches and the top of the page shows links to a few mailing lists. One of them says
code-quality. Let’s explore.
It shows a list of
Messages by Thread and a search box. Can I type
Let’s go back and click on a thread
Disable QA for part of file.
It shows the back and forth conversation about this topic. You don’t see the email addresses but you can see their names. If you click on a name, it will show you all threads sent by that person.
I clicked on that person and clicked on one of their threads. I don’t quite understand the context of their conversation but I see part of their email address.
The replies back and forth show part of the
from header. It has their full name, their beginning and end of their email address, such as:
2017-09-08 First Last ([email protected]).
Let’s go back to the
Mail Archive search box and type
kubernetes. Oh yeah everyone is looking for someone that knows this. I studied this a bit. It’s not rocket science. Don’t know what the big fuss is about it.
I clicked on the first link. Another thread, another rabbit hole.
It says that they fixed some documentation about Kubernetes.
It has the full name and part of the email of the person that started the thread.
The url has these words
camel.apache.org. Not sure what they mean. Let’s see.
Google search: inurl:commits inurl:apache
It shows 258,000 results.
The first one is from
github. OK, I assume a lot of results are going to come from Github if they use the word
commits in the url. Let’s remove github. This shows 121,000 results.
inurl:commits inurl:apache -github.com
Now all the results have the word
apache in the url. It looks like they are all the same website
lists.apache.org and many say
Pony Mail or
Apache Mail Archives.
I opened the first result just to see. Looks like the threads are from a bot.
The top left of the website has a logo that says
Pony Mail. Clicked there and it says
Welcome to Pony Mail...pick a mailing list domain. Mmm so many, which one to pick.
It has a directory so I clicked on the letter
S. I am looking to see if there is a
Spark mailing list.
Yes there is one. It’s full of thread messages.
I clicked on the thread with the most replies.
Similar content with back and forth messages, including the header of the author that sent the message. It has their full name and partial email address.
The full address is displayed when someone replies and part of the previous content is shown. Such as this:
This website has so many lists. It shows
665 active lists with almost 5,000 people online.
From our previous experiment I saw that the Apache website has the word
lists in the url.
Try these experiments:
- inurl:lists inurl:archives
- inurl:lists gmail
- inurl:lists kubernetes
Let’s try this Google search
There are mixed results, not all of them are mailing lists but we can filter the noise. Change search to
inurl:lists kubernetes -stackoverflow.com -github.com.
One of the results is the mailing list for
Cloud Native Computing Foundation. Mixed results in this one.
Let’s modify the search to
inurl:lists kubernetes (archive | mailing)
Found a result that says
Mailing Lists - Jenkins. Jenkins is a technology used by DevOps. Navigating this result I found the mailing list is kept on Google Groups using this url
groups.google.com/g/jenkinsci-users. This one has 19,000 threads.
Similar to other results, each thread shows a conversation between users, and it displays their full name, followed by either a partial email or full email.
Keep searching with inurl
Keep trying out queries with
inurl. I found a mailing list from MIT, an old list from Apple developers, and many other interesting results.