English Google SEO office-hours from July 1, 2022
SPEAKER: Hello, hi, and welcome
to today's Google Search Central SEO Office
Hours session. It's nice that you could
join us here today, at least virtually, on YouTube. My name is John Mueller. I'm a search advocate on
the Search Relations team here at Google in Switzerland. And part of what we do is answer
questions around web search and websites– maybe questions that
you submitted, as well. Maybe starting off,
I have one topic that has come up
repeatedly recently, and I thought I would
just try to answer it in the form of a question
while we're at it here. So, first of all, when I check my PageSpeed Insights score on my website, I see a simple number. Why doesn't this match what I see in Search Console and the Core Web Vitals report? Which one of these
numbers is correct? Ooh, yeah. I think maybe, first of all,
to get the obvious answer out of the door, there is no correct
number when it comes to speed, when it comes to understanding
how your website is performing for your users.
In PageSpeed
Insights, by default, I believe we show a single
number that kind of is a score from 0 to 100,
something like that, which is based on a number
of assumptions where we assume that different
things are a little bit faster or slower for users. And, based on that,
we calculate a score. In Search Console, we have the Core Web Vitals information, which is based on three numbers for loading speed, interactivity, and visual stability. And these numbers are
slightly different, of course, because it's three numbers,
not just one number. But, also, there's a big
difference in the way that these numbers
are determined. Namely, there's a difference
between so-called field data and the lab data. Field data is what
users have actually seen when they go to your website. And this is what we
use in Search Console. That's what we use
for search, as well, whereas lab data is kind
of a theoretical view of your website, like
where our systems have certain assumptions
where they think, well, the average user is
probably like this, using this kind of device, and with
this kind of a connection, perhaps. And, based on those
assumptions, we will estimate what those numbers
might be for an average user.
And, obviously, you can imagine
those estimations will never be 100% correct. And, similarly, the data
that users have seen, that will change
over time, as well, where some users might have
a really fast connection or a fast device
and everything goes really fast when they visit your website– and others might not have that. And, because of
that, this variation can always result in
different numbers. Our recommendation is
generally to use the field data, the data you would
see in Search Console, as a way of understanding
what is kind of the current situation
for your website, and then to use the
lab data, namely, the individual tests that you
can run directly yourself, to optimize your website
and try to improve things.
And when you are pretty
happy with the lab data that you're getting with your
new version of your website, then over time, you can
collect the field data, which happens automatically,
and double-check that users actually
see it as being faster or more responsive, as well. So in short, again, there is
no absolutely correct number when it comes to any
of these metrics. There is no kind of absolutely
correct answer where you'd say, this is what it
should actually be. But, rather, there's different
assumptions and different ways of collecting data, and each
of these are subtly different.
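As a side note, one minimal way to see both kinds of data for a URL in one place is the PageSpeed Insights API; as far as I recall the response layout, it returns the field data and the lab data side by side– the URL here is just an example:

```
# Hypothetical example: query the PageSpeed Insights API for one page
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://example.com/"

# The JSON response contains, roughly:
#   "loadingExperience" – field data from real Chrome users (CrUX)
#   "lighthouseResult"  – lab data, including the 0-100 performance score
```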
So, let's see. First question, otherwise, that I have on my list here, which– all of these questions
were submitted on our YouTube community page. And we'll try to announce
the next ones there, as well. So you can add more
questions there. You can also, of course, post
in our community help forum, where other experts are
able to jump in and give you some advice, as well. So, first up, we have a few
customer pages using Next.js without a robots.txt
or a sitemap file.
Simplified,
theoretically, Googlebot can reach all of these pages,
but why is only the homepage getting indexed? There are no errors or
warnings in Search Console. Why doesn't Googlebot
find the other pages? So maybe taking a
step back, Next.js is a JavaScript
framework, which means that the whole page is kind
of generated with JavaScript. But kind of as a
general answer, as well, for all of these kinds
of questions– like, why is Google not
indexing everything– it's important to first say
that Googlebot will never index everything across a website. I don't think it
happens to any kind of non-trivial-sized
website, that Google would go off and index
completely everything. It's just, from a
practical point of view, it's not possible
to index everything across the whole web.
So that kind of assumption
that the ideal situation is everything is indexed– I would leave that
aside and say you really want Googlebot to focus
on the important pages. The other thing, though, which
became a little bit clearer when, I think, the person
contacted me on Twitter and gave me a little
bit more information about their website, was that
the way that the website was generating links to the
other pages was in a way that Google was not
able to pick up. So, in particular,
with JavaScript you can take any
element on an HTML page and say, if someone
clicks on this then execute this
piece of JavaScript. And that piece of
JavaScript can be to navigate to a different
page, for example. And Googlebot does not
click on all elements to see what happens
but, rather, we go off and look for normal
HTML links, which is the kind of
traditional, normal way that you would link to
individual pages on a website.
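As a rough illustration– the URL here is made up– the difference looks something like this:

```html
<!-- Googlebot won't click this, so the target page may never be discovered -->
<span onclick="window.location.href='/products/page-2'">Next page</span>

<!-- Googlebot can follow this: a normal HTML link with an href attribute -->
<a href="/products/page-2">Next page</a>
```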
And, with this framework,
it didn't generate these normal HTML links. So we could not recognize
that there's actually more to crawl– more pages to actually look at. And this is something that
you can fix in the way that you implement kind
of your JavaScript site. We have a ton of information
on the Search Developer Documentation site around
JavaScript and SEO, in particular, on the topic of
links, because that comes up every now and then. There are lots of creative
ways to create links, and Googlebot really
needs to find those HTML links to make it work. Additionally, we have a bunch of
videos on our YouTube channel. And if you're watching
this, since nobody is here, you must be on the
YouTube channel. If you're watching this
on the YouTube channel, go out and check out those
JavaScript SEO videos on our channel to kind of get
a bit of a sense of what else you could watch out
for when it comes to JavaScript-based websites.
We are able to
process most kinds of JavaScript-based websites
normally, but some things you still have to watch
out for, like these links. Let's see. Next up, does it affect
my SEO score negatively if my page is linking to an
external insecure website? So on HTTP, not HTTPS. So, first off, we don't have
a notion of an SEO score. So you don't really have to
worry about kind of SEO score. But, regardless, I
kind of understand the question is, like, is
it bad if I link to an HTTP page instead of an HTTPS page. And, from our point of
view, it's perfectly fine.
If these pages are on HTTP, then
that's what you would link to. That's kind of what users
would expect to find. There's nothing against
linking to sites like that. There is no kind of
downside for your website to of avoid linking
to HTTP pages because they're kind of
old or crusty and not as cool as on HTTPS.
I would not worry about that. Next up– with semantic search
and voice search, is it better to
use proper grammar or write how people
actually speak? For example, it's
grammatically correct to write, "more than X years," but people
actually say, "over X years," or write a list beginning
with, "such as X, Y, and Z," but people actually
say, "like X, Y, and Z." Good question. So the simple answer is, you
can write however you want. There's nothing holding you back
from just writing naturally. And, essentially,
our systems try to work with the
natural content that we have found on your pages. So if we can crawl and index
those pages with your content, we'll try to work with that. And there's nothing special that
you really need to do there. The one thing I
would watch out for, with regards to how
you write your content, is just to make sure that you're
writing for your audience. So, for example, if you have
some very technical content but you want to reach people
who are non-technical, then write in the
non-technical language and not in a way that only
is understandable to people who are really deep into that
kind of technical information.
So kind of the– I would guess, the traditional
marketing approach of write for your audience. And our systems usually
are able to deal with that perfectly fine. Next up, a question
about links and disavows. Over the last 15
years, I've disavowed over 11,000 links in total. I never bought a link or
did anything unallowed, like sharing. The links that I disavowed may
have been from hacked sites or from nonsense,
auto-generated content. Since Google now claims
that they have better tools to not factor these types
of hacked or spammy links into their algorithms, should
I just delete my disavow file? Is there any risk or upside or
downside to just deleting it? So this is a good question. It comes up every now and then. And disavowing links
is always kind of one of those tricky
topics, because it feels like Google is
probably not telling you the full information. But, from our point of
view, it's actually– we do work really hard to avoid
taking these kind of links into account. And we do that because we know
that the disavow links tool is somewhat a niche tool,
and SEOs know about it, but the average person who runs
a website has no idea about it.
And all of those links
that you mentioned there are the kind of
links that any website gets over the years. And our systems understand
that these are not things that you're trying to do
to game our algorithms. So, from that point
of view, if you're really sure that there's
nothing around a manual action that you had to resolve
with regards to these links, I would just delete the disavow
file and move on with life and leave all of that aside. One thing I would
personally do is just download it and make a
copy so that you have a record of what you deleted. But, otherwise, if you're
sure these are just the normal, crusty
things from the internet, I would just delete
it and move on. There's much more
to spend your time on when it comes to websites
than just disavowing these random things that happen
to any website on the web.
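And just for reference, a disavow file is a plain text list, roughly along these lines– the entries here are made up:

```
# Lines starting with # are comments
# Disavow a single URL
https://hacked-site.example/spammy-page.html
# Disavow a whole domain
domain:auto-generated-nonsense.example
```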
So, let's see. Adding schema markup with Google Tag Manager– is that good or bad for SEO? Does it affect ranking? So, first of all, you can add structured data with Google Tag Manager.
That's definitely an option. Google Tag Manager is a simple piece of JavaScript you add to your pages, and then it loads its container configuration from Google's side. And it can modify your pages
slightly using JavaScript. For the most part, we're able
to process this normally. And the structured
data that you generate like this, that can
be counted, just like any other kind of
structured data on your web pages. And, from our point of
view, structured data– at least the types that
we have documented– is primarily used to help
generate rich results, we call them, which
are these fancy search results with a little bit
more information, a little bit more color or detail
around your pages. And if you add your structured
data with the Tag Manager, that's perfectly fine. From a practical
point of view, I prefer to have the structured
data directly on the page or directly on your
server so that you know exactly what is happening. It makes it a little bit
easier to debug things.
It makes it easier
to test things. So trying it out with Tag
Manager, from my point of view, I think, is
perfectly legitimate. It's an easy way
to try things out. But, for the long
run, I would try to make sure that your
structured data is on your site directly,
just to make sure that it's easier to process for
anyone who comes by to process your structured data,
and it's easier for you to track and debug and maintain
over time, as well, so that you don't have to check all of these
different separate sources.
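As a small sketch, structured data placed directly in the page's HTML is usually just a JSON-LD block like this– the values here are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article headline",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2022-07-01"
}
</script>
```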
Let's see. Simplifying a question a little bit– which is better, blocking with robots.txt
or using the robots meta tag on the page? How do we best prevent crawling? So this also comes
up from time to time.
We actually did a podcast
episode recently about this, as well. So I would check that out. The podcasts are also
on the YouTube channel, so you can click
around a little bit and you'll probably
find that quickly. In practice, there is a
subtle difference here where, if you're
in SEO and you've worked with search
engines, then probably you understand that already. But for people who are
kind of new to the area, it's sometimes unclear exactly
where all of these lines are. And with robots.txt,
which is the first one that you mentioned in the
question, essentially, you can block crawling. So you can prevent
Googlebot from even looking at your pages. And with the robots
meta tag, when Googlebot looks at your pages
and sees that robots meta tag, you can do things like
blocking indexing.
In practice, both of these
kind of result in your pages not appearing in
the search results, but they're subtly different. So if we can't crawl, then we
don't know what we're missing. And it might be that we
say, well, actually, there's a lot of references
to this page. Maybe it is useful
for something. We just don't know. And then that URL could
appear in the search results without
any of its content because we can't look at it. Whereas with the
robots meta tag, if we can look at the page,
then we can look at the meta tag and see if there's a
noindex there, for example. Then we stop indexing
that page and then we drop it completely
from the search results. So if you're trying to block
crawling, then definitely, robots.txt is the way to go. If you just don't want the
page to appear in the search results, then I would pick
whichever one is easier for you to implement. On some sites, it's
easier to kind of set a checkbox saying that I don't
want this page found in Search, and then it adds a
noindex meta tag.
On others, maybe editing the
robots.txt file is easier. Kind of depends on
what you have there.
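To make that concrete, the two options look roughly like this– the path here is just an example:

```
# robots.txt – blocks crawling; the URL can still appear in Search without content
User-agent: *
Disallow: /internal-search/
```

```html
<!-- robots meta tag – the page can be crawled, but gets dropped from indexing -->
<meta name="robots" content="noindex">
```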
Let's see. Are there any negative implications to having duplicate URLs with different attributes in your XML sitemaps? For example, one
URL in one sitemap with an hreflang annotation, and
the same URL in another sitemap without that annotation. So maybe first of all,
from our point of view, this is perfectly fine. This happens every now and then. Some people have hreflang
annotations in sitemap files specifically kind
of separated out, and then they have a normal
sitemap file for everything, as well. And there is some overlap there. From our point of
view, we process these sitemap files
as we can, and we take all of that
information into account. There is no downside
to having the same URL in multiple sitemap files. The only thing I
would watch out for is that you don't have
conflicting information in these sitemap files. So, for example, if with
the hreflang annotations, you're saying, oh, this
page is for Germany and then on the other
sitemap file, you're saying, well, actually this page
is also for France– or in French– then our
systems might be like, well, what is happening here? We don't know what to
do with this kind of mix of annotations.
And then it can happen that
we pick one or the other. Similarly, if you say,
this page has been last changed 20 years ago– which doesn't really
make much sense– but say you say 20 years. And in the other
sitemap file, you say, well, actually, it
was five minutes ago. Then our systems might look at
that and say, well, one of you is wrong. We don't know which one. Maybe we'll follow
one or the other. Maybe we'll just ignore
that last modification date completely. So that's kind of the
thing to watch out for. But otherwise, if it's just
mentioned multiple sitemap files and the information
is either consistent or kind of works together, in that maybe
one has the last modification date, the other has the
hreflang annotations, that's perfectly fine.
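For context, a sitemap entry that carries hreflang annotations looks roughly like this– example.com is a placeholder– and the same URL could also appear in a second, simpler sitemap file, as long as nothing contradicts:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/de/page</loc>
    <lastmod>2022-06-30</lastmod>
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/page"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page"/>
  </url>
</urlset>
```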
A little bit more technical question, I guess– I'm in charge of a
video replay platform, and simplified, our embeds are
sometimes indexed individually.
How can we prevent that? So by embeds, I
looked at the website and basically, these
are iframes that include kind of a
simplified HTML page with a video player
embedded in that. And, from a technical
point of view, if a page has iframe
content, then we see those two HTML pages. And it is possible
that our systems index both of
those HTML pages, because they are
separate HTML pages. One is included in
the other, usually, but they could theoretically
kind of stand on their own, as well. And there's one way
to prevent that, which is a fairly new
combination with robots meta tags that you can do, which is
with the indexifembedded robots meta tag together with a
noindex robots meta tag.
And basically, on the
embedded version– so the HTML file with the
video directly in it– you would add the
combination of noindex plus indexifembedded
robots meta tags. And that would mean that, if
we find that page individually, we would see, oh,
there's a noindex. We don't have to index this. But with the indexifembedded,
it essentially tells us that, well,
actually, if we find this page with the video embedded
within the general website, then we can index that
video content, which means that the individual HTML
page would not be indexed. But the HTML page
with the embed, with the video information,
that would be indexed normally. So that's kind of the setup
that I would use there. And this is a fairly
new robots meta tag, so it's something that
not everyone really needs.
Because this
combination of iframe content or embedded
content is kind of rare. But, for some sites, it just
makes sense to do it like that.
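Concretely, on the embedded HTML page– the one that only contains the video player– that combination would look something like this:

```html
<!-- On the standalone player page that gets iframed into other pages -->
<meta name="googlebot" content="noindex">
<meta name="googlebot" content="indexifembedded">
```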
Let me see what else we can squeeze in here. Let's see. Another question
about HTTPS, maybe. I have a question around
preloading SSL via HSTS. We are running into an issue with implementing HSTS and getting into the Google Chrome preload list. And the question kind of goes
on with a lot of details. But what should
we do for search? So maybe just taking a step
back, when you have HTTPS pages and you have an HTTP
version, usually, you would redirect from the
HTTP version to HTTPS. And the HTTPS version would
then be the secure version, because that has all of the
properties of the secure URLs. And the HTTP version,
of course, would be the one that is kind of open,
or a little bit vulnerable. And if you have this
redirect then, theoretically, an attacker could take that
into account and kind of mess with that redirect.
And with HSTS, basically,
you're telling the browser that once they've
seen this redirect, it should always
expect that redirect and it shouldn't even try
the HTTP version of that URL. And, for users, that has the
advantage that nobody even goes to the HTTP version
of that page anymore, which makes it a
little bit more secure.
And the pre-load list for Google
Chrome is basically a kind of a static list that
is included, I believe, in Chrome– probably in all of
the updates– or I don't know if it's
downloaded separately, not completely sure. But, essentially, this is a
list of all of these sites where we have confirmed that
HSTS is set up properly, and that redirect
to the secure page exists there, so
that no user ever needs to go to the HTTP version
of the page, which makes it a little bit more secure. From a practical point
of view, this difference is very minimal. And I would expect that
most sites on the internet just use HTTPS without
kind of worrying about the pre-load list. Setting up HSTS is
always a good practice, but it's something that
you can do on your server.
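For example, HSTS is just a response header that your server sends on the HTTPS version– something like this, where the preload directive is only needed if you want to submit the site to Chrome's preload list:

```
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
```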
And as soon as the
user has seen that, then their Chrome version keeps
that in mind automatically anyway. So from kind of a
general point of view, I think using the pre-load
list is definitely a good idea if you can do that. But if there are practical
reasons why that isn't feasible or not possible then, from
my point of view, at least, only looking at the
SEO side of things, I would not worry about that. When it comes to SEO, for
Google, what really matters is essentially the URL that
is picked as the canonical. And, for that, it
doesn't need HSTS. It doesn't need
the pre-load list. That has no effect at all on
how we pick the canonical. But rather, for the canonical,
the part that is important is we see that redirect
from HTTP to HTTPS.
And we can kind of
get a confirmation within your website,
through the sitemap file, the internal
linking, all of that, that the HTTPS version
is really the one that should be used in Search. And if we use the HTTPS
version in Search, then that automatically
gets all of those kind of subtle ranking
bonuses from Search. And the pre-load list and HSTS
is not really necessary there. So that's kind of the part
that I would focus on there. Let's see. Maybe one more question here. I don't really have
a great answer, but I think it's important to
at least mention, as well– What are the possible
steps for investigation if a website owner finds
their website is not ranking for their
brand term anymore, and they checked
all of the things, and it doesn't
seem to be related to any of the usual things? So, from my point of
view, I would primarily focus on the Search Console
or the Search Central Health Community, and post all
of your details there.
Because this is kind of the place where all of those escalations go and where the product experts in the Help forum can take a look at that. And they can give you a
little bit more information. They can also give you
their personal opinion on some of these
topics, which might not match 100% what
Google would say, but maybe they're a little
bit more practical, where, for example– probably not relevant
to this site– but you might post
something and say, well, my site is technically correct
and post all of your details. And one of the product
experts looks at it and says, well, it might be
technically correct, but it's still a
terrible website. You need to get your
act together and write– create some better content. And, from our point of
view, we would focus on the technical correctness.
And what you really need is
someone to give you that, I don't know, personal
feedback, almost. But anyway, in the
Help forums, if you post the details of your
website with everything that you've been
seeing, oftentimes, the product experts
are able to take a look and give you some
advice on, specifically, your website and the
situation that it's in. And if they're not
able to figure out what is happening
there, they also have the ability to escalate
these kind of topics to the community manager
of the Help forums.
And the community manager
can also bring things back to the Google Search team. So if there are things
that are really weird– and every now and then,
something really weird does happen with
regards to Search. It's a complex computer system. Anything can break. But the community managers
and the product experts can bring that back
to the Search team. And they can take a look
to see is there something that we need to fix,
or is there something that we need to
tell the site owner, or is this kind of
just the way that it is– which, sometimes, it is. But that's generally
the direction that I would go for
these kind of questions. The other thing that is kind
of subtly mentioned here is, I think, the site not
ranking for its brand name.
One of the things to watch out
for, especially with regards to brand names, is that it can
happen that you say something is your brand name,
but it's not actually a recognized term from users. For example, you might say– I don't know. You might call your website
bestcomputermouse.com. And, for you, that might be
what you call your business or what you call your website– Best Computer Mouse. But when a user goes
to Google and enters "best computer mouse,"
that doesn't necessarily mean that they want to go
directly to your website. It might be that they're
looking for a computer mouse. And, in cases like
that, there might be a mismatch of what we show
in the search results with what you think you would like to
have shown for the search results for those queries,
if it's something more of a generic term, actually. And these kind of things
also play into search results overall. The product experts see
these all the time, as well. And they can recognize that
and say, well, actually, just because you call your
website bestcomputermouse.com– I hope that site doesn't exist.
But, anyway, just because
you call your website this doesn't necessarily
mean it will always show on top of
the search results when someone enters that query. But that's kind of
something to watch out for. But, in general,
I would definitely go to the Help
forums here, and I would include all
of the information that you know that
might play a role here. So if there was a manual action
involved and you're kind of, I don't know, ashamed of that–
which, it's kind of normal. But all of these
informations, they help the product
experts to better understand your situation and
to give you something actionable that you can do to kind
of take as a next step or to understand the
situation a little bit better. So the more information you can
give them from the beginning, the more likely they'll be able
to help you with your problem. Cool. All right. And, with that, I think we're
pretty much out of time. At least, my meeting room
time here is running out.
So thank you for
watching so far. I'll see if we can add
the next office hours sessions to the
YouTube community page, as well, to make sure that
we stay on top of the things that people are asking. If you liked this, feel free
to subscribe to the channel. We have lots more SEO
content on the channel, as well, including
the JavaScript SEO things that I mentioned.
Yeah. And, otherwise, thank you
for watching, and wishing you a great time. Bye.