<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="https://urlscan.io/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://urlscan.io/blog/" rel="alternate" type="text/html" /><updated>2024-01-16T15:03:10+01:00</updated><id>https://urlscan.io/blog/feed.xml</id><title type="html">Blog - urlscan.io</title><subtitle>urlscan.io Blog - Announcements, Product News, Tutorials, Service Incidents</subtitle><author><name>urlscan.io</name></author><entry><title type="html">urlscan Pro — Inline Matching, System-Labels, User-Tags</title><link href="https://urlscan.io/blog/2024/01/10/inline-matching-system-labels-user-tags/" rel="alternate" type="text/html" title="urlscan Pro — Inline Matching, System-Labels, User-Tags" /><published>2024-01-10T00:00:00+01:00</published><updated>2024-01-10T00:00:00+01:00</updated><id>https://urlscan.io/blog/2024/01/10/inline-matching-system-labels-user-tags</id><content type="html" xml:base="https://urlscan.io/blog/2024/01/10/inline-matching-system-labels-user-tags/"><![CDATA[<p>As we welcome the year 2024, we wanted to update you on what we have been
working on in the second half of 2023 and announce the new features that are
launching today. These changes will have a profound impact on our customers'
workflows and our own detection and classification abilities.</p>
<h3 id="saved-searches--a-success-story">Saved Searches — A success story</h3>
<p>When we launched <em>Saved Searches</em> in 2022 for our <code class="language-plaintext highlighter-rouge">scans</code> and <code class="language-plaintext highlighter-rouge">hostnames</code>
feeds, we did not envision how popular this feature would turn out to be.
Initially, Saved Searches were meant as a convenient way to bookmark a search
term within the urlscan Pro platform. The <em>Subscriptions</em> feature allowed
customers to receive notifications for any new items that matched their Saved
Searches.</p>
<p>Over the past year, the value of Saved Searches to customers has become
abundantly clear. Right now we manage more than 3000 Saved Searches and almost
1000 Subscriptions that have been created by our customers. Our subscription
notification system sends out over 5000 emails a day.</p>
<p>Saved Searches and Subscriptions became even more important when we launched
our <a href="/blog/2022/11/22/newly-observed-domains-and-hostnames/">Newly Observed Domains & Hostnames
Feed</a> in late 2022 and
<a href="/blog/2023/07/18/launching-urlscan-observe/">urlscan Observe</a> in July
2023. Since then, many of our customers have set up Saved Searches to look for
domains impersonating their brand or targeting their workforce. Our feed
captures 2.5 million new domains and hostnames every day, so having an
expressive search ability to find and alert on interesting hits is crucial.</p>
<p>Today we are launching <em>major improvements</em> for Saved Searches, Subscriptions and
collaboration within the urlscan Pro platform.</p>
<!--more-->
<h3 id="saved-searches-before">Saved Searches Before</h3>
<p>Until now, Saved Searches and Subscriptions were basically just stored entries
in a database. In order to alert on a Saved Search, we would run its full query
term against the Search API every time we wanted to compile a list of hits.
While this approach would yield the correct results, it was not great for
multiple reasons:</p>
<ul>
<li>As the number of Saved Searches grew, we had to run thousands of queries,
sometimes as often as every few minutes.</li>
<li>Our own Visual Detects and Deletion rules used the same timed method for
querying, resulting in a delay of 1-2 minutes before a Visual Detect was
applied or a scan was deleted.</li>
<li>Customers could not easily determine whether a specific result returned by the
Search API matched any of their Saved Searches.</li>
</ul>
<p>We knew that we had to improve the implementation of Saved Searches
and Subscriptions if we wanted to continue scaling our platform and user base
and offer more advanced features.</p>
<p><img src="/blog/assets/images/saved-searches-old.png" alt="Saved Searches - Old Implementation" />
<em>Our old saved search workflow</em></p>
<h3 id="saved-searches-now--inline-matching">Saved Searches Now — Inline matching</h3>
<p>As part of this release, we have changed Saved Searches so that they are now
executed <strong>inline</strong> against new scans, hostnames and domains entering our
feeds. Elasticsearch calls this process <em>“percolation”</em>. Whenever urlscan
performs a new scan or finds a new domain, the new item is matched against
various types of stored rules in two passes:</p>
<ul>
<li>During the first pass, urlscan applies its
own block-and-delete rules, its brand-and-phishing detections, and its new
System Labels (see below).</li>
<li>During the second pass, we
execute the thousands of Saved Searches by our customers. Matching searches are
recorded in a special <code class="language-plaintext highlighter-rouge">meta</code> field in our Elasticsearch index. Customers can
then query this <code class="language-plaintext highlighter-rouge">meta</code> field with the ID of the search (or subscription) they
want to retrieve new results for.</li>
</ul>
<p><img src="/blog/assets/images/saved-searches-new.png" alt="Saved Searches - Inline Matching" />
<em>New inline matching pipeline</em></p>
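<p>The two passes described above can be pictured with a small, self-contained sketch. It is only a toy model and not our production pipeline: the <code class="language-plaintext highlighter-rouge">Rule</code> class, the rule IDs and the sample item are hypothetical, and plain Python predicates stand in for stored Elasticsearch queries.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Item = Dict[str, object]


@dataclass
class Rule:
    rule_id: str                     # hypothetical identifier of the stored rule
    matches: Callable[[Item], bool]  # stands in for a stored query
    label: str = ""                  # only used by first-pass (system) rules


def percolate(item: Item, system_rules: List[Rule], saved_searches: List[Rule]) -> Item:
    """Match one new item against stored rules in two passes."""
    result = dict(item)
    # Pass 1: platform-managed rules attach labels to the item.
    result["labels"] = [r.label for r in system_rules if r.matches(result)]
    # Pass 2: customer searches run afterwards, so they can refer to those labels.
    result["meta"] = [s.rule_id for s in saved_searches if s.matches(result)]
    return result


system_rules = [
    Rule("sys-1", lambda i: "Index of /" in str(i.get("title", "")), "content.opendir"),
]
saved_searches = [
    Rule("search-a", lambda i: "examplebank" in str(i["hostname"])),
    Rule("search-b", lambda i: "content.opendir" in i["labels"]),
]

scan = {"hostname": "login.examplebank.test", "title": "Index of /files"}
matched = percolate(scan, system_rules, saved_searches)
print(matched["labels"], matched["meta"])
```

<p>Because the second pass sees the output of the first, a Saved Search such as the hypothetical <code class="language-plaintext highlighter-rouge">search-b</code> can match on a label rather than on raw scan data.</p>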
<p>As part of a Saved Search, customers can also apply their own <em>custom tags</em> to
matching items. These can be arbitrary tags, and customers can control the
visibility of these tags within the urlscan Pro platform. User-supplied tags
will appear in the <code class="language-plaintext highlighter-rouge">usertags</code> fields in the Search API and Result API.</p>
<h4 id="new-customer-capabilities">New customer capabilities</h4>
<p>Executing and matching rules inline has a couple of interesting implications:</p>
<ul>
<li>Customers can query for all items that match multiple specific searches,
or that match one search but don’t match another one.</li>
<li>Customers can query for all items that match any search within a specific
subscription with a single query term, or any of their searches, period.</li>
<li>When looking at search results, customers can determine whether any
particular result matched any of their Saved Searches.</li>
<li>Customers can use labels from the first matching pass in their
Saved Searches, e.g. to filter by system labels or brand detections.</li>
<li>Customers can share results of their Saved Searches with other users on the
urlscan Pro platform without exposing their actual search terms.</li>
<li>Complex queries could easily take multiple seconds to execute. With inline
matching, the expensive matching step is done during ingestion and the
customer can run a very efficient keyword query to retrieve all results.</li>
<li>Visual Searches can also be run inline, making them much more efficient and
instant. Inline Visual Searches will be more accurate than using the Visual
Search API which relies on <em>approximate</em> nearest-neighbour searches.</li>
</ul>
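<p>The first capability in the list above boils down to set logic over the recorded matches. The sketch below applies that logic locally to a handful of result dictionaries; it does not show the actual query syntax of the Search API, and the search IDs and hostnames are made up for illustration.</p>

```python
from typing import Dict, Iterable, List

Item = Dict[str, object]


def select(items: Iterable[Item], all_of: Iterable[str] = (), none_of: Iterable[str] = ()) -> List[Item]:
    """Keep items whose meta hits include every ID in all_of and no ID in none_of."""
    wanted, unwanted = set(all_of), set(none_of)
    kept = []
    for item in items:
        hits = set(item.get("meta", []))
        if wanted.issubset(hits) and hits.isdisjoint(unwanted):
            kept.append(item)
    return kept


# Hypothetical results, each carrying the IDs of the Saved Searches it matched.
results = [
    {"hostname": "a.test", "meta": ["search-a", "search-b"]},
    {"hostname": "b.test", "meta": ["search-a"]},
    {"hostname": "c.test", "meta": []},
]

both = select(results, all_of=["search-a", "search-b"])
only_a = select(results, all_of=["search-a"], none_of=["search-b"])
print([r["hostname"] for r in both], [r["hostname"] for r in only_a])
```

<p>The expensive part, deciding which searches an item matches, has already happened during ingestion; what remains is a cheap membership test on a list of IDs.</p>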
<h4 id="system-labels">System Labels</h4>
<p><strong>System Labels</strong> are classifications applied to scans and hostnames by dynamic
rules managed by urlscan. Labels will be returned in the <code class="language-plaintext highlighter-rouge">labels</code> field which
is a different namespace than user-defined tags. The idea for labels is that
these are stable and exhaustively documented, whereas user tags can be
arbitrary, short-lived and in some cases imprecise.</p>
<p>Going forward, we will curate labels covering common classification objectives.
Our customers can use these labels to include items in, or exclude items from,
their search results. Here are some examples of labels for scans we plan on introducing:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">content.mature</code> - Page likely contains mature content.</li>
<li><code class="language-plaintext highlighter-rouge">content.opendir</code> - Page is showing an open directory.</li>
<li><code class="language-plaintext highlighter-rouge">site.parkeddomain</code> - Domain is currently parked / for sale.</li>
<li><code class="language-plaintext highlighter-rouge">site.takedown</code> - Site is showing a takedown / disabled-account notice.</li>
<li><code class="language-plaintext highlighter-rouge">tech.captcha</code> - Site employed a captcha prompt.</li>
<li><code class="language-plaintext highlighter-rouge">tech.captcha.waiting</code> - Site is showing an unsolved Captcha prompt.</li>
</ul>
<p>These are just some initial ideas and we know our customers will come up with
all kinds of creative ways they want to consume our data.</p>
<h4 id="additional-benefits">Additional Benefits</h4>
<p>With Inline Matching in place, we were able to improve various aspects of our
platform. For customers, the most visible changes include:</p>
<ul>
<li><em>Saved Searches</em> are sanity-checked before we allow them to be saved. Trying
to search fields which don’t exist will now throw an error.</li>
<li><em>Account-protection for scans</em> — Upon request, we can restrict access
to scan results to the owner of that account. (Feature available in
Enterprise and Ultimate).</li>
<li><em>Improved Brand Detection</em> — We will use the expressiveness of our
inline matching to craft more encompassing detection rules for our brand- and
phishing-feed.</li>
<li><em>Instant Visual Detections</em> — Visual Detects for phishing pages are now
performed instantly. Previously these would have a small 1-2 minute delay.</li>
<li><em>Instant Deletion</em> — Once we add a block-and-delete rule, it is
effective immediately for scan results.</li>
<li><em>Unified Blocking Logic</em> — URLs that are blocked from scanning are
maintained by us in a unified way along with delete rules. These blocking
rules also contain more context about why a certain URL or hostname might be
blocked.</li>
</ul>
<h4 id="next-steps">Next steps</h4>
<p>We are launching inline matching today, but for us this is just the first step
of many. Over the coming weeks and months we will examine how our customers are
adopting the new features and what they might be missing. We will also expose
these new capabilities gradually via our urlscan Pro UI. Make sure to keep an
eye on our changelog!</p>
<h3 id="api-changes">API Changes</h3>
<p>On top of the new features launching with this release, we also made small
improvements to the platform in various places. This is a summary of changes
to API behaviour within this release:</p>
<ul>
<li><strong>Result API</strong>: If a scan has finished but has since been deleted, it will now
return an <strong>HTTP/410</strong> error code. If you receive this code from the Result
API, you can stop trying to retrieve the result.</li>
<li><strong>Result API</strong>: Introduction of the following new fields to achieve more uniformity with the Search API:
<ul>
<li><code class="language-plaintext highlighter-rouge">labels</code>: System Labels (see above) - Only in urlscan Pro</li>
<li><code class="language-plaintext highlighter-rouge">usertags</code>: User Tags (see above) - Only in urlscan Pro</li>
<li><code class="language-plaintext highlighter-rouge">metatags</code>: Meta hits for this item - Only in urlscan Pro (<strong>Attention</strong>: This field is called <code class="language-plaintext highlighter-rouge">meta</code> in the Search API)</li>
<li><code class="language-plaintext highlighter-rouge">page.apexDomain</code>: The registered second-level domain of the page hostname</li>
<li><code class="language-plaintext highlighter-rouge">page.mimeType</code>: Page MIME type</li>
<li><code class="language-plaintext highlighter-rouge">page.redirected</code>: Whether the page was redirected</li>
<li><code class="language-plaintext highlighter-rouge">page.status</code>: HTTP response code for primary page</li>
<li><code class="language-plaintext highlighter-rouge">page.title</code>: Title of the website</li>
<li><code class="language-plaintext highlighter-rouge">page.tlsAgeDays</code>: Age of the TLS certificate at the time of scanning</li>
<li><code class="language-plaintext highlighter-rouge">page.tlsIssuer</code>: TLS issuer name for the TLS cert of the page</li>
<li><code class="language-plaintext highlighter-rouge">page.tlsValidDays</code>: Validity period of the TLS certificate in days</li>
<li><code class="language-plaintext highlighter-rouge">page.tlsValidFrom</code>: ISO 8601 timestamp of valid-from date for page TLS certificate</li>
<li><code class="language-plaintext highlighter-rouge">page.umbrellaRank</code>: Cisco Umbrella rank of the page hostname</li>
<li><code class="language-plaintext highlighter-rouge">task.apexDomain</code>: The registered second-level domain of the task hostname</li>
</ul>
</li>
<li><strong>Search API</strong>: Introduction of the following new fields:
<ul>
<li><code class="language-plaintext highlighter-rouge">labels</code>: System Labels (see above) - Only in urlscan Pro</li>
<li><code class="language-plaintext highlighter-rouge">usertags</code>: User Tags (see above) - Only in urlscan Pro</li>
<li><code class="language-plaintext highlighter-rouge">meta</code>: Meta hits for this item - Only in urlscan Pro (<strong>Attention</strong>: This field is called <code class="language-plaintext highlighter-rouge">metatags</code> in the Result API)</li>
</ul>
</li>
<li><strong>Search API</strong>: It will now respond with <strong>HTTP/503</strong> instead of <strong>HTTP/400</strong>
if our search cluster is over capacity. You should wait a few seconds before
attempting to run your search again.</li>
</ul>
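<p>A client could react to the two status-code changes roughly as follows. This is a sketch under assumptions: the <code class="language-plaintext highlighter-rouge">fetch</code> callable is a hypothetical stand-in for whatever HTTP client you use, it handles both codes in one function for brevity, and the attempt count and wait time are illustrative values rather than recommendations from us.</p>

```python
import time
from typing import Callable, Optional, Tuple

Response = Tuple[int, Optional[dict]]  # (HTTP status code, parsed body or None)


def fetch_with_policy(
    fetch: Callable[[], Response],
    max_attempts: int = 5,
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Optional[dict]]:
    """Call fetch() until it succeeds, is known to be pointless, or attempts run out."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return "ok", body
        if status == 410:
            # The scan was deleted: retrying will never produce a result.
            return "deleted", None
        if status == 503:
            # The search cluster is over capacity: wait a little, then retry.
            sleep(wait_seconds)
            continue
        # Any other status is left to the caller in this sketch.
        return "unexpected", None
    return "gave_up", None


# Simulated responses instead of real network calls.
responses = iter([(503, None), (503, None), (200, {"total": 0})])
outcome = fetch_with_policy(lambda: next(responses), sleep=lambda seconds: None)
deleted = fetch_with_policy(lambda: (410, None))
print(outcome, deleted)
```

<p>Passing <code class="language-plaintext highlighter-rouge">sleep</code> as a parameter keeps the sketch testable without real delays.</p>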
<h3 id="availability">Availability</h3>
<p>Inline Matching, User-Defined Tagging and System Labels are available starting
today and are included for all customers on our <em>Professional</em>, <em>Enterprise</em> and
<em>Ultimate</em> plans.</p>
<p>If you want to learn about the urlscan Pro platform and how it might be valuable
for your organisation, feel free to reach out to us! We offer free trials with
no strings attached. We would be happy to give you a passionate demo of what
our platform can do for you. Reach out to us at
<a href="mailto:sales@urlscan.io">sales@urlscan.io</a>.</p>]]></content><author><name>Johannes Gilger</name></author><category term="changelog" /><category term="product" /><summary type="html"><![CDATA[As we welcome the year 2024, we wanted to update you on what we have been working on in the second half of 2023 and announce the new features that are launching today. These changes will have a profound impact for our customer workflows and our own detection and classification abilities. Saved Searches — A success story When we launched Saved Searches in 2022 for our scans and hostnames feeds, we did not envision how popular this feature would turn out to be. Initially, Saved Searches were meant as a convenient way to bookmark a search term within the urlscan Pro platform. The Subscriptions feature allowed customers to receive notifications for any new items that matched their Saved Searches. Over the past year, the value of Saved Searches to customers has become abundantly clear. Right now we manage more than 3000 Saved Searches and almost 1000 Subscriptions that have been created by our customers. Our subscription notification system sends out over 5000 emails a day. Saved Searches and Subscriptions became even more important when we launched our Newly Observed Domains & Hostnames Feed in late 2022 and urlscan Observe earlier this year. Since then, many of our customers have set up Saved Searches to look for domains impersonating their brand or targeting their workforce. Our feed captures 2.5 million new domains and hostnames every day, so having an expressive search ability to find and alert on interesting hits is crucial. 
Today we are launching major improvements for Saved Searches, Subscriptions and collaboration within the urlscan Pro platform.]]></summary></entry><entry><title type="html">Announcing urlscan Observe</title><link href="https://urlscan.io/blog/2023/07/18/launching-urlscan-observe/" rel="alternate" type="text/html" title="Announcing urlscan Observe" /><published>2023-07-18T00:00:00+02:00</published><updated>2023-07-18T00:00:00+02:00</updated><id>https://urlscan.io/blog/2023/07/18/launching-urlscan-observe</id><content type="html" xml:base="https://urlscan.io/blog/2023/07/18/launching-urlscan-observe/"><![CDATA[<p>urlscan.io has always been a powerful tool for scanning and investigating
suspicious websites. Our platform is used by hundreds of customers and tens of
thousands of community users to scan suspicious URLs. Up until now, the
majority of these scans were initiated by customers.</p>
<p>Today we are announcing the general availability of <strong>urlscan Observe</strong>, our
new and integrated hands-off monitoring system on the urlscan Pro platform.
urlscan Observe ties together our extensive data collection with our
notification and scanning features to drive fast and automated monitoring of
suspected malicious infrastructure.</p>
<p><a href="/blog/assets/images/urlscan-observe1.png">
<img src="/blog/assets/images/urlscan-observe1.png" title="urlscan Observe" />
</a></p>
<!--more-->
<h3 id="urlscan-observe-the-idea">urlscan Observe: The idea</h3>
<p>urlscan Observe aims to fill two gaps in existing automation workflows:</p>
<ul>
<li>Automatically <em>discovering</em> interesting things such as domains, hostnames, IPs, or URLs.</li>
<li>Automatically <em>monitoring</em> these things for activity and changes.</li>
</ul>
<p>Using the example of domains used for phishing and brand impersonation gives a
good overview of the challenges involved. Proactively looking out for
suspicious domains is something that a lot of our customers are already doing,
and there are a variety of commercial and Open Source tools available for
surfacing these. The easiest way to get started would be to start monitoring
free data sources like <a href="https://certstream.calidog.io/">certstream</a> for TLS
hostnames of interest. While finding suspicious domains might be relatively
easy, the hard part is what happens next: Monitoring these domains to see if
they ever go <em>live</em>.</p>
<p>We have built urlscan Observe to actively <em>monitor</em> observables, scan them
using our web scanning engine, resolve DNS records, capture any observations,
alert you about major changes and show the whole timeline in a dedicated UI.</p>
<h3 id="monitoring-example-suspicious-domains">Monitoring Example: Suspicious domains</h3>
<p>To understand the different stages in the lifecycle of a piece of
infrastructure, let’s have a look at a fictional albeit common timeline of a
suspicious domain and what we can observe about it:</p>
<ul>
<li><strong>1:12am</strong> We observe the domain for the first time in a DNS zonefile. It does not have any DNS A/AAAA records yet.</li>
<li><strong>2:40pm</strong> The domain starts to resolve to an IPv4 address.</li>
<li><strong>2:45pm</strong> We observe the first TLS certificate for this domain in a Certificate Transparency (CT) log.</li>
<li><strong>2:55pm</strong> The domain starts responding to HTTP requests. It only carries an empty landing page.</li>
<li><strong>3:12pm</strong> The domain starts serving a directory index listing on HTTP.</li>
<li><strong>3:36pm</strong> The domain starts serving a phishing site via HTTP.</li>
<li><strong>6:15pm</strong> The domain has been deactivated by the hosting company and is now serving an empty placeholder page.</li>
<li><strong>8:50pm</strong> The domain stops resolving via DNS.</li>
</ul>
<p>This timeline of events tells the story of when and how the malicious domain
was first set up, how quickly it went live and how long it took until it was
taken down again. At every step of this lifecycle one has to monitor the domain
(DNS, HTTP, TLS certs) and compare observations with previous time intervals to
figure out if anything about the domain has recently changed. This is exactly
what urlscan Observe will automatically do for you going forward.</p>
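<p>The comparison step can be sketched as a diff between two snapshots of the same domain. The field names and values below are hypothetical and only mirror the 2:40pm and 2:55pm entries of the fictional timeline above; the data model urlscan Observe actually uses is not shown here.</p>

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, Optional[str]]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[str]:
    """Describe every field whose value differs between two observations."""
    changes = []
    for key in sorted(set(previous) | set(current)):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes.append(f"{key}: {old!r} -> {new!r}")
    return changes


# Hypothetical observations of one domain at two points in time.
at_1440 = {"dns_a": "192.0.2.10", "tls_cert": None, "http": None}
at_1455 = {"dns_a": "192.0.2.10", "tls_cert": "seen in CT log", "http": "empty landing page"}

changes = diff_snapshots(at_1440, at_1455)
for change in changes:
    print(change)
```

<p>An empty list means nothing changed between the two intervals, so only non-empty diffs would need to end up on the timeline or trigger an alert.</p>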
<h3 id="urlscan-observe-workflow">urlscan Observe workflow</h3>
<p>urlscan Observe monitors <em>Observables</em> such as hostnames, domains, IPs and URLs
as part of <em>Incidents</em> within urlscan Pro. There are two ways to create these
incidents:</p>
<ul>
<li>You can manually create an incident by supplying your own hostname, domain, IP, or URL.</li>
<li>You can set up <em>Saved Searches</em> and a <em>Subscription</em> within urlscan Pro to automatically create incidents for new observables.</li>
</ul>
<div class="row bottom10">
<div class="col col-md-6">
<a href="/blog/assets/images/urlscan-observe3.png">
<img src="/blog/assets/images/urlscan-observe3.png" title="urlscan Observe" />
</a>
</div>
<div class="col col-md-6">
<a href="/blog/assets/images/urlscan-observe2.png">
<img src="/blog/assets/images/urlscan-observe2.png" title="urlscan Observe" />
</a>
</div>
</div>
<p>Saved Searches are a really powerful tool because you can write a query
that matches interesting observables in our <a href="/blog/2022/11/22/newly-observed-domains-and-hostnames/"><strong>Real-Time Newly Observed
Hostnames & Domains
Feed</strong></a>. This feed
captures hundreds of thousands of new domains and millions of new unique
hostnames every day. Using our Search API you can write a query that matches
hostnames of interest, either by strings within the hostname or by certain
infrastructure attributes such as NS, MX or other DNS records.</p>
<p><a href="/blog/assets/images/urlscan-observe5.png">
<img src="/blog/assets/images/urlscan-observe5.png" title="urlscan Observe" />
</a></p>
<p>You can work with incidents from within the urlscan Pro UI, but you can also
set up <em>Alerting Channels</em> to be notified whenever there are new incidents and
changes to existing incidents. As part of urlscan Observe we have overhauled
our notification system and can now send out notifications via E-Mail and
Webhooks.</p>
<h3 id="availability">Availability</h3>
<p>urlscan Observe is available starting today and is included for all customers
on our <em>Professional</em>, <em>Enterprise</em> and <em>Ultimate</em> plans.</p>
<p>If you want to learn about the urlscan Pro platform and how it might be valuable
for your organisation, feel free to reach out to us! We offer free trials with
no strings attached. We would be happy to give you a passionate demo of what
our platform can do for you. Reach out to us at
<a href="mailto:sales@urlscan.io">sales@urlscan.io</a>.</p>]]></content><author><name>Johannes Gilger</name></author><category term="changelog" /><category term="product" /><summary type="html"><![CDATA[urlscan.io has always been a powerful tool for scanning and investigating suspicious websites. Our platform is used by hundreds of customers and tens of thousands of community users to scan suspicious URLs. Up until now, the majority of these scans were initiated by customers. Today we are announcing the general availability of urlscan Observe, our new and integrated hands-off monitoring system on the urlscan Pro platform. urlscan Observe ties together our extensive data collection with our notification and scanning features to drive fast and automated monitoring of suspected malicious infrastructure.]]></summary></entry><entry><title type="html">2022 year in review and new products launching in 2023</title><link href="https://urlscan.io/blog/2023/01/12/2022-year-in-review-and-new-products-launching-in-2023/" rel="alternate" type="text/html" title="2022 year in review and new products launching in 2023" /><published>2023-01-12T00:00:00+01:00</published><updated>2023-01-12T00:00:00+01:00</updated><id>https://urlscan.io/blog/2023/01/12/2022-year-in-review-and-new-products-launching-in-2023</id><content type="html" xml:base="https://urlscan.io/blog/2023/01/12/2022-year-in-review-and-new-products-launching-in-2023/"><![CDATA[<p>If you’re not sick of hearing it yet: Here’s to a happy new year from all of us
at urlscan.io!</p>
<p>We wanted to take the opportunity to revisit major changes that launched in
2022 and to give you a glimpse of our 2023 roadmap at the same time. Some of
the things we have worked on in 2022 represent the foundation for new products
due to launch over the next quarters.</p>
<!--more-->
<h3 id="may-2022-visual-search">May 2022: Visual Search</h3>
<p>In May of 2022 we announced the cutting-edge <a href="/blog/2022/05/02/visual-search/"><strong>Visual
Search</strong></a> feature within urlscan Pro. By
itself, this feature allows customers to hunt for scans of websites using only
the visual appearance of their screenshot. urlscan itself uses the visual
similarity of a website to improve our brand and phishing detection. The fact
that we are able to offer this feature at our scale and with a meaningful
accuracy means that we and our customers are now able to detect and attribute
phishing pages using <em>Visual Similarity</em> instead of only relying on threshold
or rule-based approaches which work on text or HTTP content of websites.</p>
<h3 id="july-2022-saved-searches--subscriptions">July 2022: Saved Searches & Subscriptions</h3>
<p>In July we launched a frequently-requested feature in the form
of <a href="/blog/2022/07/11/urlscan-pro-product-updates-for-q2-2022/"><strong>Saved Searches & Email
Subscriptions</strong></a>.
This feature lets customers <em>bookmark</em> interesting hunt searches from within the
urlscan Pro UI. Furthermore, customers are now able to <em>subscribe</em> to email
alerts whenever there are new hits for their searches.</p>
<h3 id="november-2022-newly-observed-hostnames--domains">November 2022: Newly Observed Hostnames & Domains</h3>
<p>In November of last year we launched the real-time <a href="/blog/2022/11/22/newly-observed-domains-and-hostnames/"><strong>Newly Observed Hostnames
& Domains</strong></a> datasource
in urlscan Pro. As we said back then, this new feed of data means a great deal
for both urlscan and our customers as it will enable multiple use-cases going
forward. Previously we had to rely on someone submitting a URL to urlscan.io in
order for us to know about the URL and its hostname. Now, we ingest millions of
hostnames and domains every day from a variety of sources, and we can observe
hostnames long before they start hosting website content.</p>
<p>Access to the data is available through urlscan Pro, and our customers are
already using it to identify potentially malicious domains and hostnames (e.g.
typosquatting) but also to look at their own infrastructure, for example via
subdomain enumeration.</p>
<h2 id="launching-in-2023-urlscan-observe">Launching in 2023: urlscan <em>Observe</em></h2>
<p>For the next year, the big theme for us will be combining the building blocks
we already have into a more powerful and more convenient hands-off pipeline. A
big recurring ask from our customers is a set-and-forget system that lets them
enter keywords and search expressions and hand off the responsibility for
monitoring any hits to urlscan.</p>
<p>That’s why, starting in Q2 of 2023, we will be offering a new module called
<em>urlscan Observe</em>. urlscan Observe will be our hands-off monitoring system for new
domains, hostnames, and URLs. It will perform the onerous job of watching for
<em>changes</em> to interesting <em>things</em> (like domains and URLs) over <em>time</em>. The end
result for customers is a concise timeline of changes to the <em>thing</em> being
monitored along with frequent alerts.</p>
<p>In a nutshell, urlscan <em>Observe</em> will support:</p>
<ul>
<li>Customer-supplied keywords and search patterns to match against the newly observed domains & hostnames feed</li>
<li>Sending customers alerts about new hits for these keywords</li>
<li>Automatically monitoring for changes related to these hits</li>
<li>Alerting customers about infrastructure changes to these hits: DNS records, new subdomains, active HTTP servers, website content, etc.</li>
<li>Letting customers manually specify domains and URLs to watch</li>
</ul>
<h3 id="new-subscription-options">New Subscription Options</h3>
<p>For 2023 we have also overhauled <a href="https://urlscan.io/pricing/"><strong>our plans &
pricing</strong></a>. Going forward we have one subscription
aimed at customers that are only looking for an automation capability, and we
have three subscription levels for customers that also want to leverage our
powerful urlscan Pro - Threat Hunting platform. We continue to work with
<a href="https://urlscan.io/partners/">various resellers across the globe</a> to
facilitate fast and painless procurement processes.</p>
<h3 id="talk-to-us">Talk to us!</h3>
<p>If you’re interested in learning about our current and upcoming capabilities and
how these might help you automate and reduce your workload, talk to us. For all
of our plans and features we offer a free 30-day trial with no strings
attached. We’d be happy to give you a passionate demo of what our platform can
do for you. Reach out to us at <a href="mailto:sales@urlscan.io">sales@urlscan.io</a>.</p>]]></content><author><name>Johannes Gilger</name></author><category term="changelog" /><summary type="html"><![CDATA[If you’re not sick of hearing it yet: Here’s to a happy new year from all of us at urlscan.io! We wanted to take the opportunity to revisit major changes that launched in 2022 and to give you a glimpse of our 2023 roadmap at the same time. Some of the things we have worked on in 2022 represent the foundation for new products due to launch over the next quarters.]]></summary></entry><entry><title type="html">urlscan Pro - Newly observed domains &amp; hostnames</title><link href="https://urlscan.io/blog/2022/11/22/newly-observed-domains-and-hostnames/" rel="alternate" type="text/html" title="urlscan Pro - Newly observed domains &amp; hostnames" /><published>2022-11-22T00:00:00+01:00</published><updated>2022-11-22T00:00:00+01:00</updated><id>https://urlscan.io/blog/2022/11/22/newly-observed-domains-and-hostnames</id><content type="html" xml:base="https://urlscan.io/blog/2022/11/22/newly-observed-domains-and-hostnames/"><![CDATA[<p>Today we are officially launching our <strong>real-time feed and search index of newly
observed hostnames and domains</strong> on urlscan Pro. This is a huge step forward
since it will allow customers to proactively look for new domains and hostnames
that might be of interest to them, even if these hostnames were not previously
scanned as a full-blown website through urlscan.io.</p>
<div class="row">
<div class="col col-md-12">
<img src="/blog/assets/images/hostnames-search.png" title="urlscan Pro - Newly observed domains & hostnames" />
</div>
</div>
<!--more-->
<p>The <em>hostnames</em> data source tracks newly
observed domains and hostnames sourced from multiple data sources including
Certificate Transparency, passive DNS monitoring, existing scan data,
zonefiles, etc. This data source lets customers:</p>
<ul>
<li>Search for new domains and hostnames even if they were not scanned on urlscan.io.</li>
<li>Search for new domains and hostnames using keywords.</li>
<li>Search for new domains based on infrastructure (such as A, MX, or NS records).</li>
<li>Enumerate subdomains and hostnames for specific domains.</li>
<li>Enumerate hostnames / domains on a specific IP address.</li>
<li>Perform reconnaissance and lightweight attack surface discovery.</li>
<li>Export results via an API call or as a CSV file via the UI.</li>
<li>Create saved searches and email alerts. (Q1 / 2023)</li>
</ul>
<p>Additionally, the <em>certificates</em> data source contains TLS certificates that have
been collected from various Certificate Transparency (CT) logs. With these
additions we are bringing more of the discovery, alerting and analysis
workflow into urlscan Pro, which customers previously had to assemble from
auxiliary tools and platforms.</p>
<h3 id="hostnames--domains-by-the-numbers">Hostnames & Domains by the numbers</h3>
<p>Currently our <em>hostnames</em> data source contains <strong>1.5 billion unique
hostnames</strong> across <strong>310 million unique registered domains</strong>. More than half of
these domains already have a DNS resolution attached to them.</p>
<p>The <em>certificates</em> data source contains <strong>more than 5 billion TLS certificates</strong>
collected from Certificate Transparency logs.</p>
<p>Both of these data sources operate in <em>real time</em>, with new records taking no
more than a minute before they appear in the search results.</p>
<h3 id="next-steps">Next Steps</h3>
<p>Making these data sources available for searching is just the first step. Over
the next year, these new additions will come together to form our fully
automated hands-off monitoring pipeline. This new pipeline will allow our
customers to set up keywords of interest within urlscan Pro and we will take care of the rest:</p>
<ul>
<li>We will watch for and alert on newly observed hostnames matching these keywords.</li>
<li>We will try to resolve these hostnames via DNS until they have a DNS record.</li>
<li>We will scan for potential websites on these hosts and alert customers when a site goes live.</li>
<li>We will monitor the page content for changes, such as when a page goes from a landing page to a fully set-up website.</li>
</ul>
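<p>The first two steps of that pipeline can be sketched roughly as follows. This is
only an illustration and not the urlscan Pro implementation: the function names,
the plain substring matching and the retry list are all assumptions, and DNS
resolution is delegated to the standard library.</p>

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def matches_keywords(hostname: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs in the hostname (case-insensitive)."""
    lowered = hostname.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def resolve(hostname: str) -> Optional[List[str]]:
    """Resolve a hostname to its IP addresses, or None if it has no record yet."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return None
    return sorted({info[4][0] for info in infos})


def triage(
    hostnames: Iterable[str],
    keywords: Iterable[str],
    resolver: Callable[[str], Optional[List[str]]] = resolve,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split keyword matches into resolved hosts and hosts to retry later."""
    keywords = list(keywords)
    resolved: Dict[str, List[str]] = {}
    pending: List[str] = []
    for hostname in hostnames:
        if not matches_keywords(hostname, keywords):
            continue  # not of interest to this customer
        addresses = resolver(hostname)
        if addresses:
            resolved[hostname] = addresses  # ready to be scanned
        else:
            pending.append(hostname)  # no DNS record yet, try again later
    return resolved, pending
```

<p>Hostnames in the <code>pending</code> list would be retried on a schedule until they
resolve, at which point a scan could be triggered.</p>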
<p>Furthermore, these data sources will be more tightly integrated into the urlscan
Pro platform. We will add the same set of convenience features customers are
used to, including <em>Saved Searches</em> and <em>E-Mail Subscriptions</em> for new
hits.</p>
<h3 id="pricing--availability">Pricing & Availability</h3>
<p>Access to these data sources and the upcoming alerting pipeline is free of
charge for customers on active subscriptions. New customers and customers
renewing a subscription can choose to add this functionality as a
separate subscription module.</p>]]></content><author><name>Johannes Gilger</name></author><category term="product" /><category term="announcement" /><summary type="html"><![CDATA[Today we are officially launching our real-time feed and search index of newly observed hostnames and domains on urlscan Pro. This is a huge step forward since it will allow customers to proactively look for new domains and hostnames that might be of interest to them, even if these hostnames were not previously scanned as a full-blown website through urlscan.io.]]></summary></entry><entry><title type="html">Internet-Wide IPv4 Scan Data</title><link href="https://urlscan.io/blog/2022/09/28/internet-wide-scans/" rel="alternate" type="text/html" title="Internet-Wide IPv4 Scan Data" /><published>2022-09-28T00:00:00+02:00</published><updated>2022-09-28T00:00:00+02:00</updated><id>https://urlscan.io/blog/2022/09/28/internet-wide-scans</id><content type="html" xml:base="https://urlscan.io/blog/2022/09/28/internet-wide-scans/"><![CDATA[<p>We are now offering raw download access to the following datasets to interested customers:</p>
<ul>
<li>Weekly Internet-wide scans of the whole IPv4 space on ports tcp/80 and tcp/443.</li>
<li>JSON output containing TLS certificates and HTTP responses.</li>
<li>More than 200GB of compressed raw data available per week.</li>
<li>More than 40 million HTTP responses on tcp/80 and more than 35 million on tcp/443.</li>
</ul>
<!--more-->
<p>The HTTP scan results contain the following pieces of data and metadata:</p>
<ul>
<li>Observation timestamp</li>
<li>GeoIP and ASN network annotation for the IP which was scanned</li>
<li>HTTP Status code & string</li>
<li>HTTP Response Headers & content</li>
<li>Response Body Size & SHA256</li>
<li>Response Body (up to 64kB)</li>
<li>TLS Certificate Subject, Issuer, Serial Number and MD5/SHA1/SHA256 fingerprints</li>
</ul>
<p>The certificate scan results contain certificates that responded to HTTPS requests:</p>
<ul>
<li>Raw unparsed TLS certificate data</li>
<li>Parsed certificate data</li>
</ul>
<h3 id="sample-data">Sample Data</h3>
<p>Each sample contains 10,000 lines from the respective dataset and should give you an idea of what the data contains.</p>
<h4 id="http-scans-on-tcp80---http-responses">HTTP Scans on tcp/80 - HTTP Responses</h4>
<ul>
<li><a href="https://urlscan.io/share/internetscan-samples/small-sample-slash80.json" target="_blank">Preview 10 result samples</a></li>
<li><a href="https://urlscan.io/share/internetscan-samples/sample-slash80.json.gz">Download 10k result samples (Gzipped)</a></li>
</ul>
<h4 id="http-scans-on-tcp443---http-responses">HTTP Scans on tcp/443 - HTTP Responses</h4>
<ul>
<li><a href="https://urlscan.io/share/internetscan-samples/small-sample-slash443.json" target="_blank">Preview 10 result samples</a></li>
<li><a href="https://urlscan.io/share/internetscan-samples/sample-slash443.json.gz">Download 10k result samples (Gzipped)</a></li>
</ul>
<h4 id="http-scans-on-tcp443---tls-certificates">HTTP Scans on tcp/443 - TLS Certificates</h4>
<ul>
<li><a href="https://urlscan.io/share/internetscan-samples/small-sample-slash443-certs.json" target="_blank">Preview 10 result samples</a></li>
<li><a href="https://urlscan.io/share/internetscan-samples/sample-slash443-certs.json.gz">Download 10k result samples (Gzipped)</a></li>
</ul>
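<p>A quick way to get a feel for one of the gzipped samples is to count the records
and the top-level keys they contain. The sketch below assumes that each line of
the file holds one JSON object, which is an assumption on our part; no field
names are hard-coded because the exact schema is not described here.</p>

```python
import gzip
import json
from collections import Counter


def summarize_sample(path, limit=10000):
    """Count records and top-level keys in a gzipped file of JSON lines."""
    records = 0
    keys = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            records += 1
            if isinstance(record, dict):
                keys.update(record.keys())
            if records >= limit:
                break
    return records, keys
```

<p>Calling <code>summarize_sample("sample-slash80.json.gz")</code> on a downloaded sample
returns the number of parsed records and a counter showing how often each
top-level key appears.</p>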
<h3 id="access-to-data">Access to Data</h3>
<p>If you are interested in access to these data-sets then please reach out to us at support@urlscan.io. We offer free 30-day trial access.</p>]]></content><author><name>Johannes Gilger</name></author><category term="product" /><category term="announcement" /><summary type="html"><![CDATA[We are now offering raw download access to the following datasets to interested customers: Weekly Internet-wide scans of the whole IPv4 space on ports tcp/80 and tcp/443. JSON output containing TLS certificates and HTTP responses. More than 200GB of compressed raw data available per week. More than 40 million HTTP responses on tcp/80 and more than 35 million on tcp/443.]]></summary></entry><entry><title type="html">Scan Visibility Best Practices</title><link href="https://urlscan.io/blog/2022/07/27/scan-visibility-best-practices/" rel="alternate" type="text/html" title="Scan Visibility Best Practices" /><published>2022-07-27T00:00:00+02:00</published><updated>2022-07-27T00:00:00+02:00</updated><id>https://urlscan.io/blog/2022/07/27/scan-visibility-best-practices</id><content type="html" xml:base="https://urlscan.io/blog/2022/07/27/scan-visibility-best-practices/"><![CDATA[<p>This post talks about the different scan visibilities available on urlscan.io,
which visibility you should use for different purposes and how to review your
submission results on urlscan.io to detect and prevent inadvertent information
leaks.</p>
<p><strong>tl;dr</strong>: Understand the different scan visibilities, review your own scans for non-public
information, review your automated submission workflows, enforce a maximum scan
visibility for your account and work with us to clean non-public data from urlscan.io!</p>
<!--more-->
<h3 id="scan-visibilities---introduction">Scan Visibilities - Introduction</h3>
<p>Every time you submit a URL to urlscan.io you can select the <em>visibility</em> for
the scan result. The visibility controls which parties will be able to see the
URL you submitted and retrieve the scan results.</p>
<ul>
<li><strong>Public</strong> means that the scan will be visible on the front page and in the
public search results and info pages. It will be visible to any visitor on
urlscan.io and search engines as well.<br />
You should only use <em>Public</em> scans if there are no concerns that the URLs you are
submitting contain any personal or proprietary information, either in the URL
itself or in the content of the page. This could be because you sourced these
URLs from another public data set, or because you discovered these URLs
yourself via crawling or keyword monitoring.</li>
<li><strong>Unlisted</strong> means that the scan will <em>not</em> be visible on the public page or
search results, but will be visible to customers of the urlscan Pro platform.
We only admit customers to urlscan Pro who are either vetted security
researchers or reputable corporations.<br />
You should use <em>Unlisted</em> scans if you think that there might be personal or
proprietary information within the websites, but you still want to document
the URLs to the audience of urlscan Pro so that they can take action
accordingly (automated takedowns, research, improving their products).</li>
<li><strong>Private</strong> means that the scan will only be visible to yourself and not to
any of our customers or partners. If you are part of a team account and have
the team set as “Active”, then your private scans will also be visible to
other team members on that team account.<br />
You should use <em>Private</em> scans if you don’t want to share the scans you
perform with anyone else. The downside is that unique URLs that you might
submit will not be seen by anyone else, and potential malicious activity
might go unnoticed by the community and downstream security companies on the
urlscan Pro platform.</li>
</ul>
<p>There are different reasons for choosing specific visibility levels, and picking
the right one very much depends on the source of the data you are analysing.
Customers might have different streams of URLs they want to analyse with
urlscan.io, and each stream might have its unique set of privacy
considerations.</p>
<p>We encourage users to use <em>Public</em> or <em>Unlisted</em> scans whenever possible since
it helps the whole security community keep track of and understand threats rather
than siloing that information. But we understand that there are use-cases which
don’t allow anything but the <em>Private</em> visibility level.</p>
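<p>For automated submissions it helps to set the visibility explicitly on every
request instead of relying on a default. The sketch below only builds the
request and does not send it. The endpoint, the <code>API-Key</code> header and the
<code>visibility</code> field reflect our reading of the public Submission API and should
be checked against the current API documentation; the helper name is
hypothetical.</p>

```python
import json
import urllib.request

VISIBILITIES = ("public", "unlisted", "private")


def build_scan_request(url, visibility, api_key):
    """Build (but do not send) a scan submission with an explicit visibility."""
    if visibility not in VISIBILITIES:
        raise ValueError(f"unknown visibility: {visibility!r}")
    body = json.dumps({"url": url, "visibility": visibility}).encode("utf-8")
    return urllib.request.Request(
        "https://urlscan.io/api/v1/scan/",  # assumed endpoint, verify in the docs
        data=body,
        headers={"API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

<p>Passing the returned object to <code>urllib.request.urlopen</code> would submit the scan.
Rejecting unknown visibility values up front keeps a typo in an automation
script from silently falling back to a broader default.</p>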
<h3 id="reviewing-your-submission-results">Reviewing your submission results</h3>
<p>Whether you use urlscan.io via the UI or the API you should frequently review
your submission results to ensure that you are not submitting inappropriate
URLs at any visibility level. To see a list of your own submissions first make
sure you are logged in to urlscan.io before executing the following searches:</p>
<ul>
<li><a href="https://urlscan.io/search/#user%3Ame%20OR%20team%3Ame">Search: <strong>All Scans</strong> submitted by yourself and your teams</a></li>
<li><a href="https://urlscan.io/search/#(user%3Ame%20OR%20team%3Ame)%20AND%20task.visibility%3Apublic">Search: <strong>Public Scans</strong> submitted by yourself and your teams</a></li>
</ul>
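<p>The same searches can be scripted as part of a regular review. The sketch below
only constructs the query strings and a Search API URL; the endpoint path and
the <code>size</code> parameter are assumptions based on the public Search API, and whether
the <code>user:me</code> shorthand behaves identically for authenticated API requests
should be verified.</p>

```python
from urllib.parse import quote

OWN_SCANS = "user:me OR team:me"


def own_scans_query(visibility=None):
    """Return the query for your own scans, optionally limited by visibility."""
    if visibility is None:
        return OWN_SCANS
    return f"({OWN_SCANS}) AND task.visibility:{visibility}"


def search_api_url(query, size=100):
    """Percent-encode the query and build an (assumed) Search API URL."""
    return f"https://urlscan.io/api/v1/search/?q={quote(query, safe='')}&size={size}"
```

<p>For example, <code>search_api_url(own_scans_query("public"))</code> yields a URL for the
second search in the list above.</p>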
<p>Make sure you understand <em>where</em> submissions are originating from: This might
be your employees or your automated tools such as SOAR platforms! You should
watch out for submissions for the following types of websites:</p>
<ul>
<li>Hosted invoice pages</li>
<li>DocuSign or other document signing requests</li>
<li>Google Drive / Dropbox links</li>
<li>Email unsubscribe links</li>
<li>Password reset or create links</li>
<li>Web service, meeting and conference invite links</li>
<li>URLs including PII (email addresses) or API keys</li>
</ul>
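<p>A simple pre-submission check can catch many of these cases before a URL ever
reaches urlscan.io. The heuristics below are illustrative assumptions, not a
complete or official list: they flag email addresses in the URL, a handful of
keywords, and a few file-sharing or signing hosts.</p>

```python
import re
from urllib.parse import unquote, urlsplit

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
RISKY_WORDS = ("unsubscribe", "reset", "invoice", "invite", "token=", "apikey=", "api_key=")
RISKY_HOSTS = ("docusign.", "drive.google.com", "dropbox.com")


def sensitive_reasons(url):
    """Return the reasons a URL looks unsuitable for a Public scan."""
    decoded = unquote(url)  # catch percent-encoded email addresses too
    host = (urlsplit(decoded).hostname or "").lower()
    reasons = []
    if EMAIL.search(decoded):
        reasons.append("email address")
    reasons += [f"keyword {word}" for word in RISKY_WORDS if word in decoded.lower()]
    reasons += [f"host {marker}" for marker in RISKY_HOSTS if marker in host]
    return reasons
```

<p>A URL with a non-empty result could be submitted as <em>Private</em> or held back for
manual review. Expect false positives and false negatives; the check is meant
to complement the review searches above rather than replace them.</p>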
<h3 id="setting-and-enforcing-a-default-visibility">Setting and enforcing a default visibility</h3>
<p>urlscan.io allows you to set a default visibility and even to enforce this as
the maximum visibility for all future scans. Both settings can be found in your <em>Settings</em> window on your user dashboard.</p>
<p>Team account owners can change these settings team-wide and have them be
applied to every active team member. This is done on the <em>Settings</em> page for
the team account.</p>
<div class="row">
<div class="col col-md-6">
<img src="/blog/assets/images/scan-visibility.png" title="urlscan Scan Visibility" />
<p class="help-block">The scan visibility settings dialog in your user dashboard</p>
</div>
</div>
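<p>One way to think about a maximum visibility is as a clamp: a scan may be less
visible than the maximum, but never more. The sketch below shows that
interpretation. It is our assumption about the semantics for illustration and
does not describe how the platform enforces the setting internally.</p>

```python
# Ordered from least to most visible.
ORDER = {"private": 0, "unlisted": 1, "public": 2}


def effective_visibility(requested, maximum):
    """Clamp the requested visibility so it is never broader than the maximum."""
    return requested if ORDER[requested] <= ORDER[maximum] else maximum
```

<p>With a maximum of <em>Unlisted</em>, a request for <em>Public</em> would be reduced to
<em>Unlisted</em>, while a request for <em>Private</em> would stay <em>Private</em>.</p>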
<h3 id="what-we-are-doing-to-prevent-information-leaks">What we are doing to prevent information leaks</h3>
<p>We are aware of the fact that non-public information is being scanned on
urlscan.io and are taking a number of steps to mitigate this issue.</p>
<ul>
<li>We have domain and URL pattern blocklists in place which prevent scanning of certain websites.</li>
<li>We have deletion rules in place which delete past and future scans for certain keywords and patterns.</li>
<li>We have recently made the <em>Scan Visibility</em> setting in our user dashboard more visible and easier to understand.</li>
<li>We have reached out to customers who we identified as submitting a significant amount of Public scans.</li>
<li>We allow immediate takedown of single scans via the <em>Report</em> button on each scan page.</li>
<li>We work with customers and third parties to facilitate bulk-delete via our deletion rules.</li>
<li>We are reviewing popular third-party integrations such as SOAR tools to ensure they respect the user intent with regards to visibility.</li>
</ul>
<h3 id="security-researchers">Security Researchers</h3>
<p>If you are a <em>security researcher</em> and have discovered a large number of scans
with non-public information we would ask that you reach out to
<a href="mailto:security@urlscan.io">security@urlscan.io</a> and work with us to get the
offending scans removed and to investigate the source of these scans.</p>]]></content><author><name>Johannes Gilger</name></author><category term="knowledge" /><summary type="html"><![CDATA[This post talks about the different scan visibilities available on urlscan.io, which visibility you should use for different purposes and how to review your submission results on urlscan.io to detect and prevent inadvertent information leaks. tl;dr: Understand the different scan visibilities, review your own scans for non-public information, review your automated submission workflows, enforce a maximum scan visibility for your account and work with us to clean non-public data from urlscan.io!]]></summary></entry><entry><title type="html">urlscan Pro - Product Updates for Q2 / 2022</title><link href="https://urlscan.io/blog/2022/07/11/urlscan-pro-product-updates-for-q2-2022/" rel="alternate" type="text/html" title="urlscan Pro - Product Updates for Q2 / 2022" /><published>2022-07-11T00:00:00+02:00</published><updated>2022-07-11T00:00:00+02:00</updated><id>https://urlscan.io/blog/2022/07/11/urlscan-pro-product-updates-for-q2-2022</id><content type="html" xml:base="https://urlscan.io/blog/2022/07/11/urlscan-pro-product-updates-for-q2-2022/"><![CDATA[<p>Today marks the last day of major features releases we had planned for Q2. This
post will cover the highlights of new functionality in our urlscan Pro
platform.</p>
<h4 id="saved-searches--subscriptions">Saved Searches & Subscriptions</h4>
<p>You can now save a search in urlscan Pro to be able to run it again later. On
top of that, you can also receive an email alert whenever there are new hits
for your saved searches. This allows you to create a number of hunt queries
which might only trigger occasionally and automatically receive notifications
when there are new hits.</p>
<div class="row">
<div class="col col-md-7">
<img src="/blog/assets/2022-07-11-urlscan-pro-product-updates-for-q2-2022/urlscanpro-searches.png" title="Saved Searches" />
</div>
<div class="col col-md-5">
<img src="/blog/assets/2022-07-11-urlscan-pro-product-updates-for-q2-2022/urlscanpro-email-alert.png" title="Saved Searches" />
</div>
</div>
<h4 id="search-ui-improvements">Search UI Improvements</h4>
<p>The Search page in urlscan Pro was significantly improved:</p>
<ul>
<li><em>Filters</em> are a convenient way to add, remove and invert pre-defined common
search filters. We have a list of pre-defined filters that you can work with.</li>
<li>In the Search view, the new <em>Aggregations</em> list shows you aggregate
information from your search results and allows you to further filter your
results by adding another facet to your filters.</li>
<li>The new quick filter dialog also contains completions for the <em>Brand Names</em>
that we track in urlscan Pro.</li>
<li>Whenever you have created an interesting search, you can now save it as a
<em>Saved Search</em> directly from the search UI.</li>
<li>You can also use the new <em>CSV Export</em> feature to retrieve the results as a
CSV file.</li>
</ul>
<h4 id="file-downloads">File Downloads</h4>
<p>In the process of scanning websites, urlscan.io will sometimes encounter file
downloads triggered by the website. If we are able to successfully download the
file, we will store it, hash it and make it available for downloading by our
customers.</p>
<p>To highlight this stream of data, we have created a separate <em>Downloads</em> section
which contains the most recent file downloads and highlights the information we
store for each downloaded file. There is a dedicated Help Section on
Downloads which talks about API use and known limitations of this feature.</p>
<h4 id="live-scanning">Live Scanning</h4>
<p>The following features were added to the Live Scanning UI:</p>
<ul>
<li>Additional devices available for device emulation (iPhone 12, 13, etc)</li>
<li>Scanners can be selected via a new Select All button</li>
<li>Scanner Details can be shown, such as the current exit IP, AS and VPN provider</li>
<li>Scan Results have been cleaned up to give a better overview</li>
<li>Outgoing Links can now be scanned with a dedicated button</li>
<li>Available Live Scan Quotas are shown within the scanning UI</li>
</ul>
<h3 id="urlscan-pro-trial">urlscan Pro Trial</h3>
<p>If you would like to take <strong>urlscan Pro</strong> for a spin just reach out to
<a href="mailto:sales@urlscan.io">sales@urlscan.io</a>. We offer 30-day free trials with
no strings attached.</p>]]></content><author><name>Johannes Gilger</name></author><category term="changelog" /><category term="product" /><summary type="html"><![CDATA[Today marks the last day of major features releases we had planned for Q2. This post will cover the highlights of new functionality in our urlscan Pro platform. Saved Searches & Subscriptions You can now save a search in urlscan Pro to be able to run it again later. On top of that, you can also receive an email alert whenever there are new hits for your saved searches. This allows you to create a number of hunt queries which might only trigger occasionally and automatically receive notifications when there are new hits. Search UI Improvements The Search page in urlscan Pro was significantly improved: Filters are a convenient way to add, remove and invert pre-defined common search filters. We have a list of pre-defined filters that you can work with. In the Search view, the new Aggregations list shows you aggregate information from your search results and allows you to further filter your results by adding another facet to your filters. The new quick filter dialog also contains completions for the Brand Names that we track in urlscan Pro. Whenever you have created an interesting search, you can now save it as a Saved Search directly from the search UI. You can also use the new CSV Export feature to retrieve the results as a CSV file. File Downloads In the process of scanning websites, urlscan.io will sometimes encounter file downloads triggered by the website. If we are able to successfully download the file, we will store it, hash it and make it available for downloading by our customers. To highlight this stream of data, we have created a separate Downloads section which contains the most recent file downloads and highlights the information we store for each downloaded file. There is a dedicated Help Section on Downloads which talks about API use and known limitations of this feature. 
Live Scanning The following features were added to the the Live Scanning UI: Additional devices available for device emulation (iPhone 12, 13, etc) Scanners can be selected via a new Select All button Scanner Details can be shown, such as the current exit IP, AS and VPN provider Scan Results have been cleaned up to give a better overview Outgoing Links can now be scanned with a dedicated button Available Live Scan Quotas are shown within the scanning UI urlscan Pro Trial If you would like to take urlscan Pro for a spin just reach out to sales@urlscan.io. We offer 30-day free trials with no strings attached.]]></summary></entry><entry><title type="html">Visual Search and Live Scanning APIs GA</title><link href="https://urlscan.io/blog/2022/05/30/visual-search-livescan-api/" rel="alternate" type="text/html" title="Visual Search and Live Scanning APIs GA" /><published>2022-05-30T00:00:00+02:00</published><updated>2022-05-30T00:00:00+02:00</updated><id>https://urlscan.io/blog/2022/05/30/visual-search-livescan-api</id><content type="html" xml:base="https://urlscan.io/blog/2022/05/30/visual-search-livescan-api/"><![CDATA[<p>As of today, our <strong>Live Scanning</strong> and <strong>Visual Search</strong> features are no longer
considered <em>Beta</em>. The APIs for these features are now stable and can be used
in production use-cases. Customers on our <em>Professional</em> and <em>Enterprise</em>
subscription tiers will find API documentation for these features in the
urlscan Pro platform.</p>
<h4 id="visual-search">Visual Search</h4>
<p>Visual Search allows users to find historical scans with visually similar
screenshots to a scan of interest. This type of feature is also called
Content-Based Image Retrieval. Check out the <a href="/blog/2022/05/02/visual-search/">accompanying
blog-post</a> to learn more.</p>
<h4 id="live-scanning">Live Scanning</h4>
<p>Live Scanning allows you to scan websites quickly, from different locations,
and with different browser options. Scan results are not automatically saved to
urlscan.io, but you can use Store Scan if you want to archive a particular scan
result.</p>
<p>Live Scanning is a very versatile capability that can be used for a number of
common scenarios, including <em>Reconnaissance</em>, <em>Change Monitoring</em> and <em>Remote
File Retrieval</em>.</p>
<h3 id="urlscan-pro-trial">urlscan Pro Trial</h3>
<p>If you would like to take <strong>urlscan Pro</strong> for a spin just reach out to
<a href="mailto:sales@urlscan.io">sales@urlscan.io</a>. We offer 30-day free trials with
no strings attached.</p>]]></content><author><name>Johannes Gilger</name></author><category term="changelog" /><category term="product" /><category term="api" /><summary type="html"><![CDATA[As of today, our Live Scanning and Visual Search features are no longer considered Beta. The APIs for these features are now stable and can be used in production use-cases. Customers on our Professional and Enterprise subscription tiers will find API documentation for these features in the urlscan Pro platform. Visual Search Visual Search allows users to find historical scans with visually similar screenshots to a scan of interest. This type of feature is also called Content-Based Image Retrieval. Check out the accompanying blog-post to learn more. Live Scanning Live Scanning allows you to scan websites quickly, from different locations, and with different browser options. Scan results are not automatically saved to urlscan.io, but you can use Store Scan if you want to archive a particular scan result. Live Scanning is a very versatile capability that can be used for a number of common scenarios, including Reconnaisance, Change Monitoring and Remote File Retrieval. urlscan Pro Trial If you would like to take urlscan Pro for a spin just reach out to sales@urlscan.io. We offer 30-day free trials with no strings attached.]]></summary></entry><entry><title type="html">Visual Search</title><link href="https://urlscan.io/blog/2022/05/02/visual-search/" rel="alternate" type="text/html" title="Visual Search" /><published>2022-05-02T00:00:00+02:00</published><updated>2022-05-02T00:00:00+02:00</updated><id>https://urlscan.io/blog/2022/05/02/visual-search</id><content type="html" xml:base="https://urlscan.io/blog/2022/05/02/visual-search/"><![CDATA[<p>Today we are launching <strong>Visual Search</strong> which is a powerful new search feature
available through our <strong>urlscan Pro - Threat Hunting</strong> platform.</p>
<p><img src="/blog/assets/images/visualsearch.png" alt="urlscan Pro - Visual Search" /></p>
<h3 id="use-cases">Use-Cases</h3>
<p>Visual Search allows users to find historical scans with visually similar
screenshots to a scan of interest. This type of feature is also called
<em>Content-Based Image Retrieval</em>. Instead of querying for historical scans using
a structured textual query (such as search for a hostname or an IP address),
Visual Search uses an existing screenshot image as the query. Visual Search
works similarly to popular <em>Reverse Image-Search</em> engines like Google’s <em>Search
by Image</em> and the <em>TinEye Reverse Image Search</em>. Customers will be able to
leverage the Visual Search feature to discover previously undetected cases of brand
impersonation or similar phishing pages based on the visual appearance of those
sites.</p>
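<p>To make the idea of content-based image retrieval more concrete, here is a toy
sketch using an average hash, a common perceptual-hashing technique. It is a
stand-in chosen for illustration and says nothing about how Visual Search is
implemented. Images are represented as small grids of grayscale values, and all
names are hypothetical.</p>

```python
def average_hash(pixels):
    """Hash a small grayscale image (rows of 0-255 ints) into a bit string."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def most_similar(query, candidates):
    """Return candidate names ordered by hash distance to the query image."""
    target = average_hash(query)
    return sorted(
        candidates,
        key=lambda name: hamming(target, average_hash(candidates[name])),
    )
```

<p>The query is an image rather than text: the screenshot of interest is hashed
and historical screenshots are ranked by how few bits differ.</p>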
<h3 id="availability">Availability</h3>
<p>Visual Search is available today through the urlscan Pro portal. The feature is
currently in Beta until its API is finalized over the next few weeks. Further
information about Visual Search is available to customers on the urlscan Pro
platform.</p>
<h3 id="urlscan-pro-trial">urlscan Pro Trial</h3>
<p>If you would like to take <strong>urlscan Pro</strong> for a spin just reach out to
<a href="mailto:sales@urlscan.io">sales@urlscan.io</a>. We offer 30-day free trials with
no strings attached.</p>]]></content><author><name>Johannes Gilger</name></author><category term="changelog" /><category term="product" /><summary type="html"><![CDATA[Today we are launching Visual Search which is a powerful new search feature available through our urlscan Pro - Threat Hunting platform. Use-Cases Visual Search allows users to find historical scans with visually similar screenshots to a scan of interest. This type of feature is also called Content-Based Image Retrieval. Instead of querying for historical scans using a structured textual query (such as search for a hostname or an IP address), Visual Search uses an existing screenshot image as the query. Visual Search works similar to popular Reverse Image-Search engines like Google’s Search by Image and the TinEye Reverse Image Search. Customers will be able to leverage Visual Search feature to discover previously undetected cases of brand impersonation or similar phishing pages based on the visual appearance of those sites. Availability Visual Search is available today through the urlscan Pro portal. The feature is currently in Beta until its API is finalized over the next few weeks. Further information about Visual Search is available to customers on the urlscan Pro platform. urlscan Pro Trial If you would like to take urlscan Pro for a spin just reach out to sales@urlscan.io. 
We offer 30-day free trials with no strings attached.]]></summary></entry><entry><title type="html">Search: New searchable attributes</title><link href="https://urlscan.io/blog/2022/04/21/search-new-searchable-attributes/" rel="alternate" type="text/html" title="Search: New searchable attributes" /><published>2022-04-21T00:00:00+02:00</published><updated>2022-04-21T00:00:00+02:00</updated><id>https://urlscan.io/blog/2022/04/21/search-new-searchable-attributes</id><content type="html" xml:base="https://urlscan.io/blog/2022/04/21/search-new-searchable-attributes/"><![CDATA[<p>Today we are launching a major overhaul to our search index powering our
urlscan.io and urlscan Pro platforms. This release will offer new functionality
to community and paid users. We have gathered customer feedback and
internal use-cases and have come up with a list of additional attributes that would
be helpful to search on. This post outlines the highlights of the newly available
search attributes. All of the new searchable fields have been
integrated in a <strong>backward compatible</strong> fashion, which means that any search
which previously worked on urlscan.io will continue to work.</p>
<p><strong>The full list of searchable fields is available on the <a href="/docs/search/">Search API
Reference</a> page.</strong></p>
<!--more-->
<h3 id="new-searchable-attributes">New searchable attributes</h3>
<p>Over the past two years we recognized various additional fields that we would
like to be able to search. The <strong>title of the page</strong> was an obvious addition,
and we have also gone ahead and added fields like the <strong>age of the TLS
certificate at the time the page was scanned</strong> or the <strong>Cisco Umbrella rank for
the primary hostname of the page</strong>. We hope that these new fields will allow
hunting for more interesting scans on urlscan.io.</p>
<h3 id="wildcard-search">Wildcard Search</h3>
<p>We have changed the way that certain fields are indexed so that these fields
can more effectively be searched using regular expressions and wildcard
expressions. Especially for fields containing arbitrary information like URLs
or hostnames, it is often crucial to search using complex expressions, something
which was hard or outright impossible to do before. With the wildcard fields,
users can search for single characters in a field like the page URL very
quickly.</p>
<h3 id="content-search">Content Search</h3>
<p>Customers on our <em>Professional</em> and <em>Enterprise</em> plans can now find historical
scans by searching for strings in the <strong>text of the website</strong>. We currently
index the first 20kB of visible text content per site.</p>
<p>Customers can also search for structured information from the page, such as the
names and types of <strong>input fields</strong>, the name of <strong>global JavaScript variables</strong>,
and the <strong>detected technologies</strong> employed by a website.</p>
<p>Outgoing links from the website are now indexed by <strong>domain</strong> and <strong>full URL</strong>.
These fields can be searched to find incoming links from other websites.</p>
<h3 id="verdicts--brand-search">Verdicts & Brand Search</h3>
<p>We have always allowed customers on our <em>Professional</em> and <em>Enterprise</em> plans
to search for detections of malicious websites on our platform by means of our
brand detection system. Now we also incorporate <strong>community verdicts</strong> into our
search index and combine them with our verdicts to form a global verdict and
score. These attributes are grouped under the <em>verdicts</em> key.</p>
<h3 id="whats-next">What’s next?</h3>
<p>As next steps we will integrate pivoting via the additional attributes into
urlscan.io and our <strong>urlscan Pro</strong> threat hunting platform. Changes to these
platforms will be announced on this blog and on the urlscan Pro platform.</p>
<p>You can reach out to us with any questions via
<a href="mailto:support@urlscan.io">support@urlscan.io</a>.</p>
<p><strong>Editor’s Note</strong>: An earlier version of this blog-post was erroneously published
early, we apologise for any confusion this might have caused!</p>]]></content><author><name>Johannes Gilger</name></author><category term="changelog" /><category term="product" /><category term="api" /><summary type="html"><![CDATA[Today we are launching a major overhaul to our search index powering our urlscan.io and urlscan Pro platforms. This release will offer new functionality to community and paid users. We have gathered customer feedback and internal use-cases and came up with a list of additional attributes that would be helpful to search on. This post outlines the highlights of new available search attributes. All of the new searchable fields have been integrated in a backward compatible fashion, which means that any search which previously worked on urlscan.io will continue to work. The full list of searchable fields is available on the Search API Reference page.]]></summary></entry></feed>