English Google SEO office-hours from April 1, 2022
JOHN MUELLER: All right. Welcome, everyone, to today's
"Google Search Central SEO Office Hours Hangout." My name is John Mueller. I'm a Search Advocate here
at Google in Switzerland. And part of what we do are
these office hour sessions where people can jump in
and ask their questions around their websites. We can try to find some answers. I have a bunch of things
submitted on YouTube which we can go through. But it also looks like we have
a ton of people raising hands. So let's get started there. Let's see, Isabel,
you're up first. ISABEL ROMAN: Hello,
good morning, everybody. My question is about the
crawling of our website.
We see a different number of crawls in Search Console than in our server logs. For instance, we see three times as many crawls from Google in our server logs; Search Console only shows about a third of that. Could it be that something is wrong on our side, or that we are not verifying that it is really Googlebot, since the numbers are different? JOHN MUELLER: I think
the numbers would always be different, just because
of the way that, in Search Console, we report
on all crawls that go through the
infrastructure of Googlebot. But that also includes
other types of requests. So, for example, I think AdsBot also uses the Googlebot infrastructure, those kinds of things. And they have
different user agents. So if you look at
your server logs and only look at the
Googlebot user agent that we use for web
search, those numbers will never match what we
show in Search Console. ISABEL ROMAN: But we
have more in our server log than in Google
Search Console.
JOHN MUELLER: OK, if you have
more in your server logs, then to me that would
sound like either you have multiple sites
combined in the server logs, or you're also
counting requests by other people who use
the same user agent. ISABEL ROMAN: OK. JOHN MUELLER: So
the user agent isn't something that is locked and
only limited to Googlebot. Anyone can say, like, "I am Googlebot," even if they're not Googlebot. So there is one
way to double-check that with the IP address.
And I think we have
that in our Help Center, how to kind of double check
that it's really Googlebot. ISABEL ROMAN: OK. OK. Thank you. JOHN MUELLER: I would double check that. ISABEL ROMAN: OK. JOHN MUELLER: Cool.
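[A minimal sketch of the reverse-DNS check described in the Help Center, for anyone who wants to script it against their server logs; the sample IP below is an assumption for illustration:]

```python
# Verify a claimed Googlebot IP: reverse-resolve it to a hostname, check the
# domain, then forward-resolve the hostname and confirm it maps back to the IP.
import socket

def is_real_googlebot(ip: str) -> bool:
    try:
        host = socket.gethostbyaddr(ip)[0]  # e.g. crawl-66-249-66-1.googlebot.com
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward confirmation: the hostname must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False

print(is_real_googlebot("66.249.66.1"))  # assumed example IP from a server log
```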
Gaurav. GAURAV PATIL: Hey, John, I have a question. We have a website, and for the last three to four months we have been working on Google Web Stories. It was going very well until last month, March actually. We were getting around 400 to 500 real-time users coming from Google Discover to our Web Stories, but suddenly we saw an instant drop in visitors from Google Discover. And our Search Console is not even showing any errors. What could be the possible reason for that? JOHN MUELLER: I
don't think there needs to be any specific
reason for that, because especially
with Discover, it's something that
we would consider to be additional
traffic to a website.
And it's something that
can change very quickly. And anecdotally,
I've seen that from– or I've heard that from
sites in these office hours that sometimes they get a
lot of traffic from Discover and then suddenly it goes away. And then it comes back again. So it doesn't necessarily
mean that you're doing anything technically wrong. It might just be that
in Discover, things have slightly changed.
And then suddenly
you get more traffic or suddenly you
get less traffic. We do have a bunch
of policies that apply to content that we
show in Google Discover. And I would double check
those, just to make sure that you're not accidentally
touching on any of those areas. That includes things,
I don't know offhand. But I think something
around the lines of like clickbait-y content, for
example, those kind of things. But I would double check
those guidelines to make sure that you're all in line there. But even if everything is
in line with the guidelines, it can be that suddenly you get
a lot of traffic from Discover and then suddenly you
get a lot less traffic. GAURAV PATIL: No, I didn't mean
that, not suddenly we got it. We were having it constantly
for almost three to four months from when we started working
on Web Stories, 400 to 500 of real-time visitors we had.
We didn't change any part of
our content or what we got on, we didn't try
anything new or that. Everything was as we
had learned before. But we saw an instant drop. JOHN MUELLER: Yeah. GAURAV PATIL: Not even
that it was less and less, instantly, there
were no visitors. JOHN MUELLER: Yeah, that's– anecdotally, that's
what I hear from people, that it goes from you
get a lot of traffic, and then suddenly it
goes to almost nothing. GAURAV PATIL: OK. JOHN MUELLER: Sorry
I don't have any kind of better tip in that regard. I think Discover
is something that is worthwhile trying out and
seeing what you can do there. But it is something,
because of its nature, it's not tied to specific
people searching.
So it's a lot harder
to kind of work on it and have it in a
persistent state. GAURAV PATIL: OK. JOHN MUELLER: Cool. OK, Ritu. RITU NAGARKOTI: Me, so– JOHN MUELLER: Hi. RITU NAGARKOTI: I'm losing you. JOHN MUELLER: I cannot
hear you so well. But maybe it will get better. RITU NAGARKOTI: Well,
now I'm out of it. JOHN MUELLER: A
little bit, yeah. RITU NAGARKOTI: So
I have a question about old domains that are redirected to new domains. The backlinks we are creating for the old domain, are they still auditable? Like, when the old domain redirects to the new domain, will we still be able to audit the backlinks that were created for the old domain? JOHN MUELLER: I think they
would show in Search Console. I mean, in Search Console, you
would see a sample of the links to your website.
And if they redirect
through another domain, then I think we would still
show that in Search Console. Sometimes I think we show
that in-between state, that like you have a link
through this other website. RITU NAGARKOTI: OK, in the Links report. But after we redirect to the new domain, is it possible that we can still view them for the old domain as well? JOHN MUELLER: Sure, sure, yeah. I mean, so sometimes
sites change domains. I would not recommend
doing that regularly. I think every domain
move that you make, it brings challenges
and possible problems. But if you need to change your
domain name, that happens. RITU NAGARKOTI: One more thing about my website: some spammy websites are regularly copying our content, spammy sites which have very low domain authority and are not good websites for blog content. So we are manually getting all of those URLs delisted from the Google SERPs; we are just manually submitting the requests to Google through the form. After all the URLs are delisted from Google, is there a better way to get rid of this type of activity? Or can we go for some alternate option, so that this spammy activity doesn't affect our website? JOHN MUELLER: How do you
mean an alternate process? RITU NAGARKOTI: These are very spammy sites, and pretty toxic backlinks; we are getting so many toxic backlinks. Most of these websites are copying our content; our whole content is copied from our website. And then, to delist them from the Google SERPs, we are submitting requests to Google, and they manually remove those domains from the Google SERPs. So is there a more effective way to get rid of this type of activity? Or can we go for any other alternate option, any paid or free option that is available, that you can suggest? JOHN MUELLER: So I think
if another site is copying your copyrighted content,
then probably the DMCA form, I think that you're using,
is the right way to go there. RITU NAGARKOTI: Yeah, actually, we have looked at a paid plan for that as well. But the plan is very expensive for us, so for now we have decided to do the delisting manually. So we are wondering whether there is any other option for this, like any kind of plan? JOHN MUELLER: I don't think
we have any kind of paid plans or anything like that.
RITU NAGARKOTI: OK. JOHN MUELLER: So
the DMCA process, if that applies to your case,
that should be something that is available for free. You can just use it
directly in Search Console. Or I don't know,
wherever the form is. So that might be one approach. The DMCA process
is a legal process and I can't give
you legal advice. So you might have to
check with a lawyer to see if it's appropriate. The other thing that you
could do if you wanted to is to use the Disavow
backlinks tool, if you're worried about those links. For the most part, if
it's a random spammy site, I would not worry about it. I would not use the
Disavow backlinks tool, because it's just extra work. RITU NAGARKOTI: OK.
So there would be no effect of these toxic backlinks on our website? JOHN MUELLER: No,
I can't imagine. No. RITU NAGARKOTI: So
one more question: our website is going through a migration process. We have followed a checklist of the things we should consider before the migration, and we have worked through it. But, you know, we are worried that we will get a traffic loss for the website. Is there any chance we will get a traffic loss, even a small amount, or– JOHN MUELLER: It's
always possible. So with migrations, when
you're moving from one domain to another, there's
always a period of fluctuation in between. And usually if you're following
the guides that we have, then it's a minimal fluctuation. But it can still happen. RITU NAGARKOTI: Is
it enough to follow the checklists we are using, the ones available on sites like Search Engine Land? Is that enough to avoid a traffic loss, or at least keep any traffic loss to a minimal amount? JOHN MUELLER: Yeah. RITU NAGARKOTI: Or, sorry, what should we do?
JOHN MUELLER: We
have a checklist in the search developer
documentation for site moves. And I would double check that. RITU NAGARKOTI: Thank you so
much, sir, for your answer. MICHAEL LEWITTES: John,
John, I'm sorry to interrupt. But can I offer Ritu another
suggestion here about the DMCA? And much like you, I'm
not offering legal advice. But there is another
path, another remedy that you can take, which is you
can find out who the host is. There's a site that I believe
is called Whoishostingthis.com. You can find who the host is. And they generally
have a department that you can reach out to with
an email address such as legal @ whatever the name
is, or abuse @.
They have these departments. And you can reach out to
them and say that someone is infringing on your content. But you need to give them
back-up documentation. And generally they
will not ignore this. There's an area of law called safe harbor, which recognizes that hosts can't moderate everything that they're hosting. But if they're told enough
times about infringements, and you give them
legitimate documentation, they're almost bound
to take it down. The problem is that sometimes it
becomes a game of whac-a-mole, where people change hosts. But it takes time
and eventually, these people, if they are
infringing on your content, they'll give up and
they'll stop doing it. RITU NAGARKOTI: So thank you
so much for your response. But can you share the link for this, so I can check it? That would be good for me.
JOHN MUELLER: Cool, yeah, that
definitely also makes sense. Let's see, Joseph. JOSEPH TANNOUS:
Yes, hello, John. JOHN MUELLER: Hi. JOSEPH TANNOUS: We have had a content publishing website since 2009. And we experienced a bad migration in 2020, where we saw a huge drop in organic traffic. The thing is, we had a lot of broken links, so we used 301 redirects to point those broken links to the original articles. But in the robots.txt, we disallowed those links, so that the crawl budget wouldn't be spent on crawling these 404 pages. So the main question here: if we fixed all these redirects, redirecting to the same article with the proper name, can we remove those links from the robots.txt, and how much time does it take to actually be considered by Google? JOHN MUELLER: So if the page
is blocked in the robots.txt, we wouldn't be able
to see the redirect.
So if you set up a
redirect, you would need to remove that
block in the robots.txt. With regards to the
time that takes, there is no specific time,
because we don't crawl all pages with the same speed. So some pages we may pick
up within a few hours, and other pages might take
several months to be recrawled. So that's I think
kind of tricky. The other thing I think
worth mentioning here is if this is from a migration
that is two years back now– JOSEPH TANNOUS: Yes. JOHN MUELLER: Then
I don't think you would get much value out of
just making those 404 links kind of show content now. I can't imagine
that that would be the reason why your
website would be getting significantly less traffic. Mostly because it's– I mean, unless these pages
are the most important pages of your website. But then you would
have noticed that. But if these are just generic
pages on a bigger website, then I can't imagine that the
overall traffic to a website would drop because they
were no longer available.
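[A rough illustration of the first point above, that a URL disallowed in robots.txt cannot be recrawled, so its redirect will not be seen; this uses Python's urllib.robotparser, and the domain and path are assumptions:]

```python
# Check whether a redirected URL is still disallowed for Googlebot in robots.txt.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

old_url = "https://www.example.com/old-broken-article"
if not rp.can_fetch("Googlebot", old_url):
    print("Still blocked: the 301 redirect on", old_url, "cannot be seen.")
else:
    print("Crawlable: the redirect can be picked up on the next recrawl.")
```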
JOSEPH TANNOUS: OK. Another question about
the optimal content length on a page. You know, we have encountered
many blog posts that talk about, let's say we need to
have around 100 or 1,000 words per page. So what's the optimal
content length? JOHN MUELLER: I don't
think there is one. Some pages are very short. Some pages are very long. It kind of depends on
the amount of information that you want to give users. JOSEPH TANNOUS: That is
the storm propagating now, the thin content. Is it applicable
by Google or not? JOHN MUELLER: Usually
that applies more to the overall website. So it's not so
much that one page doesn't have enough content.
It's more that the
website overall is very light on
actual information. JOSEPH TANNOUS: Oh, OK. JOHN MUELLER: So I
wouldn't use the word count as a way to recognize that. I think sometimes the word
count is useful for you to look at a larger
website overall, and to try to find
areas where maybe you could be doing better. But I wouldn't use it
as a metric to guide kind of like the specific things
that you do on the website. JOSEPH TANNOUS: OK. OK, thank you so much. JOHN MUELLER: Sure. Rob. ROB HEINEN: Hi, John. Thanks for the time. My question pertains to– it's a crawling
question, pertaining to discovered not indexed. We've run a two-sided marketplace since 2013 that's fairly well established.
We have about 70,000 pages,
and about 70% of those are generally in the index. And then there's kind of
this budget that crawls the new pages that get created. And those, we see movement on
that, so that old pages go out, new pages come on. At the same time, we're
also writing blog entries from our editorial team. And to kind of get those
to the top of the queue, we always use this
request indexing on those, so they'll go quicker. We add them to the
site map as well. But we find that we
write them and then we want them to get on to the
Google as quick as possible. As we've kind of been
growing over the last year, and we have more
content on our site, we've seen that that
sometimes doesn't work as well for the
new blog entries, and they also sit in this
discovered not indexed queue for a longer time.
Is there anything we can do to– like internal
links or something? Is it content-based, or do we
just have to live with the fact that some of our blogs might
not make it into the index? JOHN MUELLER: Yeah,
I think overall it's kind of normal that we don't
index everything on a website. So that can happen to kind of the entries you have on the site, and also the blog posts on the site. It's not tied to a
specific kind of content. I think using the inspect URL
tool to submit them to indexing is fine. It definitely doesn't
cause any problems. But I would also
try to find ways to make those pages kind
of as clear as possible, that you care about that. So essentially, with internal
linking is a good way to do that, to really make
sure that, from your home page, you're saying like here are
the five new blog posts. And you link to them directly,
so that it's easy for Googlebot when we crawl and index
your home page to see, oh, there's something new and
it's linked from the home page.
So maybe it's important. Maybe we should go
off and look at that. ROB HEINEN: OK, so if it's
linked from the home page, it's more likely that Google
sees it as important than if we just add it and it kind of
gets added on a sub-blog page, is that it? JOHN MUELLER: Yeah. ROB HEINEN: OK. JOHN MUELLER: Definitely. ROB HEINEN: OK,
that helps, yeah. JOHN MUELLER: Just making
it as obvious as possible for us to figure out. ROB HEINEN: OK.
To see that– JOHN MUELLER: There's also– usually if you have a
blog section on your site you also have RSS feeds. And if you have that
set up, I would also submit those to Google
in Search Console, just because RSS feeds tend
to focus more on the newer content, and that kind
of helps us to pick those up a little bit faster. So we use them similar
to sitemap files, but sometimes RSS feeds are a
bit easier for us to pick up.
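[A quick way to sanity-check the internal-linking advice above is to confirm that the newest posts really are linked from the home page; a small sketch, with assumed URLs and paths:]

```python
# List the links on the home page and check that the newest blog posts appear.
import requests
from bs4 import BeautifulSoup

home_html = requests.get("https://www.example.com/", timeout=10).text
hrefs = {a["href"] for a in BeautifulSoup(home_html, "html.parser").find_all("a", href=True)}

new_posts = ["/blog/new-post-1", "/blog/new-post-2"]  # assumed paths
for path in new_posts:
    linked = any(path in href for href in hrefs)
    print(path, "-> linked from home page" if linked else "-> NOT linked from home page")
```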
ROB HEINEN: OK,
that's a good hint. I can set that up in Google Search Console. JOHN MUELLER: Cool. ROB HEINEN: Cool. Thanks for your time. JOHN MUELLER: Sure. Rohan. ROHAN CHAUBEY: All
right, hey, John, thank you so much for
taking on my question here. So I'm one of the
moderators at Reddit. And ever since GPT-3-based
AI writing tools started to get advertised,
our community's having a debate over
whether or not to use them. Our stance is mostly against it, but we are trying, or maybe I should say struggling, to see what Google's official position on this is. Someone had posted on Reddit asking whether they should be using these AI content writing tools, to which you responded, no. But you gave no further explanation. And then you had also posted
some cryptic tweets about it.
So I'm just trying to understand more how Google reacts to websites hosting AI-written content, and what your suggestion is for all of us. JOHN MUELLER: Now, so for
us, these would essentially still fall into the category
of automatically generated content, which is
something that we've had in the Webmaster Guidelines
since almost the beginning, I think. And people have been
automatically generating content in lots
of different ways. And for us, if you're using
machine learning tools to generate your
content, it's essentially the same as if you're just,
I don't know, shuffling words around or looking up
synonyms, or doing kind of the translation
tricks that people used to do, those kind of things. My suspicion is that maybe
the quality of content is a little bit better than
the really old school tools. But for us, it's still
automatically generated content.
And that means, for us, it's
still against the webmaster guidelines. So we would consider
that to be spam. So that's– ROHAN CHAUBEY: Just to
cut in on that, are you saying that Google
is able to understand the difference between
human and AI content? JOHN MUELLER: I
can't claim that. But, I mean, so for us,
if we see that something is automatically generated,
then the web spam team can definitely take
action on that. And I don't know how the
future will evolve there. But I imagine like with any
of these other technologies, there will be a little bit
of a cat and mouse game, where sometimes people
will do something and they get away with it.
And then the web spam team
catches up and kind of solves that issue on a broader scale. But from our
recommendation, we still see it as automatically
generated content. I think, I don't
know, over time, maybe this is something
that will evolve, in that it will become
more of a tool for people, kind of like you would use
machine translation as a basis for creating a translated
version of a website. But you still essentially
work through it manually. And maybe over
time these AI tools will evolve in that
direction, that you use them to be more
efficient in your writing. Or to make sure that you're
writing in a proper way, kind of like the spelling
and the grammar checking tools, which are also
based on machine learning. But I don't know what
the future brings there. ROHAN CHAUBEY: So you're saying
currently it's not there yet. JOHN MUELLER: I mean currently
it's all against the webmaster guidelines. So from our point
of view, if we were to run across something
like that, if the web spam team were to see it, they
would see it as spam.
ROHAN CHAUBEY: OK, got it. Thank you so much. JOHN MUELLER: Sure. All right, Viola. VIOLA BONGINI:
Yes, good morning. I would like to ask you, can using an HTML tag such as a span with a class inside an H1 affect my website from an SEO point of view? JOHN MUELLER: I don't think so. I mean, my understanding
is that we would still see that as a heading, and as
long as it's a valid heading, we would be able to use that. I don't know from a technically
valid HTML point of view if that's still the correct
way to do a heading. But from our point of
view, we would probably just see that as a heading. VIOLA BONGINI: OK,
so it's not a problem if I want to make
animation for my H1, I use a span class, because we
think that our website go down in the rankings for this
reason and others, reality.
So I don't know if
maybe it's better to have a clean H1 without
anything inside, or– JOHN MUELLER: I think– I can't imagine that
a website would drop in visibility because of that. VIOLA BONGINI: OK. And my second question
is, could our low mobile ratings on Google PageSpeed, for metrics like LCP and FID, have affected our website's ranking after the introduction of the new algorithm last summer? Because we were around fourth in my city, OK, if I search for a web agency or a keyword like that. After the introduction of this algorithm, going into Google Search Console, we found that these parameters like LCP and FID have a bad rating for mobile, like 48, but not for desktop, where it is 90, so that's OK. So could that be the problem? JOHN MUELLER: Could be. It's hard to say
just based on that. So I think that there is maybe
two things to watch out for. The number that you gave me
sounds like the PageSpeed Insights score that is generated,
I think, on desktop and mobile, kind of that number
from 0 to 100, I think. VIOLA BONGINI: Yeah. JOHN MUELLER: Yeah. We don't use that in
Search for the rankings. We use the Core Web
Vitals, where there is LCP, FID, and CLS, I think. And those are based on– or the metrics that
we use are based on what users actually see. So if you go into
Search Console, there's the Core
Web Vitals report. And that should show
you those numbers, if it's within good or bad,
kind of in those ranges. VIOLA BONGINI: But if I go
on Google Search Console on Core Web Vitals,
OK, it tells me that the data collected over the last 90 days is not enough for this type of device.
JOHN MUELLER: Yeah,
OK, then in that case, it's not due to that. So we have to have enough data
that we can collect from users to be able to understand the
status of the Core Web Vitals for the site, but especially
for smaller sites, we might not have enough data. VIOLA BONGINI: OK. JOHN MUELLER: And then we would
not use that as a factor there. So even if the PageSpeed
Insights score is bad, that isn't something
that we would use.
We would only use
it if we really have data from users that
tell us that it's actually slow in practice. So I would use
that score as a way to kind of prepare for
when you have more traffic, and have those numbers. But it wouldn't be the reason
why your website would drop. VIOLA BONGINI: OK, so we have to look for the reason somewhere else.
OK, I understand. OK, thank you very much. JOHN MUELLER: Cool. Sure. Let me go through some of
the submitted questions, and I have more time
afterwards to go through like all of the
people with the raised hands as well here. So stick around. Let's see, the first
question I have here is, does it make a difference
if a navigational link is visible on desktop
but hidden on mobile? Can eventually nonvisible
navigation links be considered less
relevant as visible ones? So I suspect by
hidden on mobile it means it's not removed
on mobile, but rather just not visible directly. And as long as the link is
within the HTML of the page, for us that's fine.
For crawling, we can still
pass signals to those links. All of that is fine. If on the other hand,
the link is not even in the HTML on mobile,
and we crawl the site with mobile-first
indexing, then we would not see that link at all. So that's, I think, the
important distinction to think about there. You can check this
in Search Console with the Inspect URL tool,
to try to fetch that page and to see if, in the HTML, the
link is actually still there.
If the link is there, then that should be fine.
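[If you want to script that check outside of Search Console, a rough sketch like this fetches the page with a smartphone user agent and looks for the link in the served HTML; the user agent string, URL, and href are assumptions:]

```python
# Fetch the page as a mobile browser would and confirm the nav link is in the
# HTML, even if CSS hides it on small screens.
import requests

mobile_ua = "Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 Chrome/99.0 Mobile Safari/537.36"
html = requests.get("https://www.example.com/",
                    headers={"User-Agent": mobile_ua}, timeout=10).text
print("nav link present in mobile HTML:", 'href="/category/widgets"' in html)
```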
We have some shortcuts on our website. And so these are some German
shortcuts like IDR or UA, or ZBSP. So awkward to read these
German shortcuts in English. Is this readable for
Google, or should we avoid shortcuts on our website? So from a practical
point of view, if this is something that
you're trying to rank for, then I would make it as
clear as possible what you're trying to rank for. But these are essentially
just shortcuts that you would use in
writing on a normal page. And the content that
you want to rank for is essentially
not the shortcuts, but kind of like what's
around the shortcuts. And from that point of view,
I would not worry about this. If this is normal language
usage, then it is what it is. It is what users expect. They can read it. They can understand it. And that's essentially fine
from our point of view. We're seeing
competitors showing up in search results
with aggregate rating and high price, low
price rich results used on e-commerce product
category pages. Based on the documentation,
it looks like this is not an approved use case.
So I wanted to check and
verify if I'm understanding the documentation correctly,
and if this is something that we should be
working on as well. Yeah, unfortunately not everyone
implements things the way that we have it
documented, whether that's on purpose that they're
doing things in ways that we don't want them to do,
or whether it's accidentally, just implemented incorrectly. It's sometimes hard to tell. But if you're looking
at the documentation, and it's clear to you how
you should be doing it, then I would try to do it
that way, and not the way that we say not to use it.
So especially for
category pages, I think our guidance is
still that you should not be using this kind
of structured data there, because it's
not the same product. It's a bunch of
different products. So I would try to avoid that. We recently started a chat
tool as a help service. The button is a small bubble
in the bottom left corner on mobile. It covers a bit of
content because the bubble is an overlay. Will we have a ranking problem
because of a usability problem? Should we have an
additional close button? The bubble is
visible on all pages, also when users change pages. So I don't know about
the usability side. I can't really give
you advice there. But with regards to
kind of searches, general ranking
guidelines there, I think there are two aspects
that could come into play. And it's something
where you probably have to make a judgment
call on your side.
On the one hand, there is
the intrusive interstitial guidelines that we
have, where, if you have an interstitial
on your pages that is intrusive for
users, then that's something that we would
recommend avoiding. That's a part of the page
experience ranking factor. And it's something
where we don't have like a fixed number
of pixels or anything like that, which we would
say would be intrusive. And my guess is if you have a
chat bubble in the corner, then that would not be considered
intrusive by your users. The other aspect is around
the Core Web Vitals, so in particular, I
think the LCP and maybe the content layout shift, so the
time it takes to load the page. and if the content shifts around
while that page is loading.
Depending on how you have
implemented this kind of chat bubble in the corner,
that might be something that would be playing a role. And that's something which
probably you can test, where you turn that on and off. And you try it out
in your browser to see what is the impact there. The effect that you
see in your browser is kind of a different
effect than users might see, but it gives you
a bit of guidance. So in particular,
for Core Web Vitals, we use the metrics
that users see, which we call field data, in the
Chrome User Experience Report. And if you test it
yourself, we would consider that to
be lab data, which might be slightly different than
what an average user would see.
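[For reference, field data of this kind, what real users saw, can also be pulled programmatically; a minimal sketch, assuming access to the Chrome UX Report API, with the API key and origin as placeholders:]

```python
# Query the Chrome UX Report API for an origin's p75 Core Web Vitals field data.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
resp = requests.post(
    f"https://chromeuxreport.googleapis.com/v1/records:queryRecord?key={API_KEY}",
    json={"origin": "https://www.example.com", "formFactor": "PHONE"},
    timeout=10,
)
metrics = resp.json().get("record", {}).get("metrics", {})
for name in ("largest_contentful_paint", "first_input_delay", "cumulative_layout_shift"):
    p75 = metrics.get(name, {}).get("percentiles", {}).get("p75")
    print(name, "p75:", p75)
```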
Primarily this is due to maybe
the different connections that users have or different
devices, the capabilities, how fast they are, how
much RAM they have, those kind of differences. But usually, if you
turn it on and off, you will quickly see is there
an actual difference or not. And if there is a
difference, then you kind of have to make a judgment call. Is this something I want
to worry about or not? Or maybe it's something
that you just track for a while to double check. So those are kind of
the two primary things that come into play. Another one that we've
seen in the past, which is kind of a weird edge case,
specifically around these chat things that sometimes run
with JavaScript in the corner, is that oftentimes they will
use the page title as a way to signal that actually there's
a chat message waiting for you. It'll add something like
a 1 to the page title if some chat operator
is waiting for you to chat with them, essentially.
And if you use JavaScript
to change the page title, then that is something
that we could pick up when we render the page. And we have seen cases where
suddenly all pages of a website have a 1 attached to
the title, which comes from one of these chat tools. So that's something to
watch out for and maybe turn off if you have a chance
to kind of adjust that there. Let's see, in Search Console,
we have search appearance filter that shows AMP articles
that appear in search.
That's all fine. But the URLs that appear
under AMP article filter are non-AMP URLs. Does Search Console show the
canonical for those URLs, or is this a reporting issue? So in general, in
Search Console, we do try to show the canonical
URLs in the performance report. It's not 100% perfect, because
there are certain kinds of URLs that we report on
slightly differently. But we do try to show, for the
most part, the canonical URL. So I would not be
surprised if you see the canonical URL in
the general performance report on AMP. I believe there's also a
separate AMP report, which you can double check.
But I'm not sure if that's
there or not, I don't know. Confused right now. But that might also
be one place to check. But in any case, I
would not see that as kind of an
issue on your side. If we're showing the
canonical URLs there that's kind of
the way we want it to do that in Search Console. And, again, it's not 100%
consistent because some types of search appearances, we show
the way that we show them.
Sometimes they also
include the hash in the URL, if we're
kind of including a link to a specific
section of a page. So it's– I would say it's
in large part canonical, but not 100% sure. Let's see, question
about crawling. I recently redesigned
my website and changed the way I list my blog
posts and other pages from pages 1, 2, 3, 4,
to a View More button.
Can Google still crawl
the ones that are not shown on the main blog page? What is the best practice? If not, let's say those
pages are not important when it comes to
search and traffic, would the whole
site as a whole be affected when it comes
to how relevant it is for the topic for Google? So on the one hand,
it depends a bit on how you have
that implemented. A View More button could be
implemented as a button that does something with JavaScript. And those kind of
buttons we would not be able to crawl
through and actually see more content there. On the other hand,
you could also implement a View More
button essentially as a link to kind of page 2 of
those results, or from page 2 to page 3. And if it's
implemented as a link, we would follow
it as a link, even if it doesn't have like a
label that says page 2 on it.
So that's, I think, the
first thing to double check, is it actually something
that can be crawled or not? And with regards to like
if it can't be crawled, then usually what
would happen here is we would focus
primarily on the blog posts that would be linked
directly from those pages. And I mean, it's something
where we probably would keep the old blog
posts in our index, because we've seen them and
indexed them at some point. But we will probably
focus on the ones that are currently there. One way you can help
to mitigate this is if you cross-link
your blog posts as well. So sometimes that is done with
category pages or these tag pages that people add. Sometimes blogs have a mechanism
for linking to related blog posts. And all of those
kind of mechanisms add more internal
linking to a site, and make it possible that
even if we initially just see the first page of the
results from your blog, we would still be able to crawl
to the rest of your website.
And one way you can double-check
this is to use a local crawler. There are various third party
crawling tools available. And if you crawl your
website and you see that, oh, it only picks
up five blog posts, then probably those are the five
blog posts that are findable. On the other hand, if it goes
through those five blog posts and then finds a bunch
more and a bunch more, then you can be pretty
sure that Googlebot will be able to crawl the
rest of the site as well.
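[A very small local crawl along those lines might look like this; it follows only plain links, and the domain, blog path, and page cap are assumptions:]

```python
# Starting from the blog index, follow plain <a href> links (no JavaScript)
# and list which blog URLs are discoverable by crawling alone.
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

start = "https://www.example.com/blog/"
seen, queue = set(), [start]
while queue and len(seen) < 200:               # small safety cap
    url = queue.pop()
    if url in seen or not url.startswith(start):
        continue
    seen.add(url)
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    for a in soup.find_all("a", href=True):
        queue.append(urljoin(url, a["href"]).split("#", 1)[0])

print(f"Discovered {len(seen)} URLs via plain links:")
for u in sorted(seen):
    print(" ", u)
```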
To what degree does Google
honor the robots.txt? I'm working on a new
version of my website that's currently blocked
with the robots file. And I intend to use robots.txt
to block indexing of some URLs that are important
for usability, but not for search engines. So I want to understand
if that's OK? That's perfectly fine. So when we recognize disallow
entries in a robots.txt file, we will absolutely follow those. The only kind of situation I've
seen where that did not work is where we were
not able to process the robots.txt file properly. But if we can process the
robots.txt file properly, if it's properly
formatted, then we will absolutely stick to that
when it comes to crawling. Another caveat
there is usually we update the robots.txt files
maybe once a day, depending on the website. So if you change your
robots.txt file now, it might take a day
until it takes effect. With regards to
blocking, crawling– so you mentioned
blocking indexing, but essentially robots.txt
file would block crawling. So if you blocked crawling
of pages that are important for usability but not for search
engines, usually that's fine.
What would happen
or could happen is that we would index the
URL without the content. So if you do a site query
for those specific URLs, you would still see it. But if the content is
on your crawlable pages, then for any normal
query that people do when they search for a
specific term on your pages, we will be able to
focus on the pages that are actually indexed
and crawled and show those in the search results. So from that point of
view, that's all fine. Let's see, for
example, fireworks is one of the
products and services that Google considers dangerous
and that you should avoid in your ads and destinations. Do such terms also affect the
ranking of a business website that legally offers
such products? So from a Google
search point of view, I'm not aware of
a list of products that we would consider
to be problematic.
I do believe, on
the Google ad side, there are certain
products where the ads team, for whatever
policy reason, doesn't want to allow
advertising for. But that's completely
separate from Google Search. And at least from
what I'm aware of, if you mentioned these
things on your website, they can appear in the
normal search results. There's nothing
really blocking that. And you definitely
see that if you just try to search for fireworks. You will find sites
that sell you fireworks.
Let's see. We're about to launch
a new e-commerce site and we're debating on how many
pages we should initially go live with. Since it's a new
domain, I'm worried about that Google will
only index a small portion of the website at first. What would be the
best approach here? So I think we have
this covered in our e-commerce documentation. Actually we looked at this
with a variety of teams. And I think everyone
had a different idea of which approach to take. And I think we have
maybe three options or so covered in
the documentation. So I will double check that. It depends a little bit on
what you want to achieve and how that should happen. Also with regards
to e-commerce sites, if you're trying to get these
products into Google Merchant Center for Google Shopping– I think it's still
called Google Shopping– then you might have
different considerations than if you're just launching
a website on its own.
But the documentation that
we have for e-commerce sites, I would definitely
check that out. Let's see. There's just one question
I want to grab quickly before I switch
over to you folks again, because
you're so patient. Thank you. And it's a long question, but
I'll just take the first part to make it even easier. So it's a food blogger
and the first question is, Google said that there's
a maximum of 16 words that you can use
in your alt-text.
And the question is, does Google
read the rest of my alt-text and what does this
mean for usability? And I think the
important part here is we don't have any guidelines
with regards to how long your alt-text can be. So from a Google
search point of view, you can put a lot of things
in the alt-text for an image if that's relevant for
that particular image.
When it comes to the
alt-text, we primarily use that to better
understand the image. So if someone is searching for,
I don't know, in Google Images for something that kind
of matches the alt-text, then we can use
that to understand that your image is
relevant for that alt-text on that specific page. That's kind of the primary
use case of the alt-text. We do also use the alt-text
as a part of the page. But to me, that's
usually something that is already visible
on the page anyway. So it's less something that is
critical to the page itself.
So I would really
use it as something that applies to the image. And I would use it for usability
reasons, and for Google Images to better understand
that specific image. And I think what might
also be worth mentioning is when it comes
to Google Images, you don't necessarily need
to describe exactly what is in the image, but rather kind
of like what this image means for your particular page. So if you have a
picture of a beach, you could use an alt-text
and say, oh, this is a beach.
But you could also say
this is the beach in front of our hotel. Or this is the
beach that we took a photo of when we were
doing a chemical cleanup. And those intents
are very different and people would be searching in
different ways in Google Images to find more information there. And kind of giving
that extra context also always makes sense,
in my opinion. OK, lots of questions left. I'll try to add some comments
as replies there as well. But maybe we'll switch over
to some live questions. And I have a bit more time
afterwards, too, if any of you want to stick around. Let's see, Shun. I think you're up next. AUDIENCE: Hi. John. JOHN MUELLER: Hi. AUDIENCE: Hi. I have a question that will be
a little bit similar to the one you answered earlier, but it's
regarding HTML and JavaScript, the classic one.
Let's say I have HTML where some parts are hidden via the CSS display property, for a UI reason, to simplify the UI, when a user first lands on the page. That hidden content can only be seen when the user clicks a button; then the JavaScript click event runs, JavaScript changes the CSS, and the user sees the content. So my question is, it looks like a bit of a gray zone to me, since Google can never run any JavaScript events. I wonder whether content that is hidden by CSS when a user first lands on the page can be part of Google's evaluation or not? JOHN MUELLER: It can
still be indexed. So that's something, if
it's in the HTML itself, in the DOM when
the page is loaded, then we can use
that for indexing.
If it's something that needs a
JavaScript event to then fetch something from the server
and then display that, then that's not something
that we would recognize. But if it's in the DOM, if it's in the HTML and it just goes from being hidden to visible, that's perfectly fine.
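[A rough way to check which of those two situations applies is to look at the raw HTML response before any JavaScript runs; this simplified sketch only tests whether the hidden text is served in the initial HTML, and the URL and snippet are assumptions:]

```python
# Is the collapsed content already in the server-rendered HTML, or does it only
# appear after a JavaScript event fetches it from the server?
import requests

raw_html = requests.get("https://www.example.com/product-page", timeout=10).text
snippet = "Detailed specifications"  # assumed text inside the hidden section

if snippet in raw_html:
    print("Present in the initial HTML: it can be used for indexing.")
else:
    print("Not in the initial HTML: it is likely fetched later via JavaScript.")
```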
AUDIENCE: OK, so its valuation would be the same as if it were not hidden, or– JOHN MUELLER: Probably. I mean, it's always
hard to compare like how things will rank in the end. But my assumption is that it
would be pretty much the same. I think it's also something
from a user point of view you might want to watch out for. And that's more like when
someone goes to your web page, and after it's being
shown in search.
And that is essentially that
if you're promising the user something, it's a
good idea to show it, show the user that when
they go to your web page. So we apply that when
it comes to things like intrusive interstitials
or kind of like too many ads on a page. If they go to your
page, they should be able to find what
they were looking for. And if this is additional
content that you're providing, which is not the primary reason
why they're going to your page, then that's fine. That's kind of like
a usability way to give more
functionality to a page. However, if the primary
content is blocked like this, then that's something where I
would expect the users to be a little bit unhappy. If they go there and
they don't realize, oh, this is how I get
that piece of information. AUDIENCE: OK, so
what could the case that only when we see
the contents to the user from the PC, and never
can see the contents from the smartphone? With actually the element
itself is embedded on HTML, but hidden.
And the user can never have
a chance to undo the hidden. JOHN MUELLER: I don't think
we would separate that out, because sometimes that's
also just a usability mechanism that sites use,
where you have essentially a responsive design. And within the responsive design
setup for certain screen sizes, you hide something
like the sidebar. And from our point
of view, that's fine. AUDIENCE: OK, thank you. OK, John. Thank you JOHN MUELLER: Cool, OK. Joy. AUDIENCE: Hi, John. I have two questions. Hopefully, it won't
take up too much time. So the first is there's
a portion of content. JOHN MUELLER: It's
really hard to hear you.
Sorry. Can you perhaps be
closer to a microphone? AUDIENCE: Is that better? JOHN MUELLER: Yes, perfect. AUDIENCE: Sorry about that. I have two questions that I
hope won't take too much time. So does the proportion of content created by a publisher matter? And I mean that in
the sense of affiliate or maybe even sponsored content. Context is there's a "Digiday"
newsletter that went out today that mentioned that
publishers were concerned that if you have, let's
say, 40% of your traffic or content as
commerce or affiliate, your website will become, or be considered by Google, a deals website. And then your authority
may be dinged a little bit.
Is there such a thing
that's happening in the ranking systems
algorithmically? JOHN MUELLER: I
don't think we would have any threshold like
that, partially because it's really hard to determine
a threshold like that. You can't, for example, just
take the number of pages and say this is
this type of website because it has 50%
pages like that, because the pages can be
visible in very different ways. So sometimes you have a lot
of pages that nobody sees. And it wouldn't make
sense to judge a website based on something
that essentially doesn't get shown to users. AUDIENCE: And I guess this
may be like a part B to that. How much does it matter if
a publisher is outsourcing content for scalability reasons,
versus having content created by staff or staff writers? JOHN MUELLER: I don't
think for the most part that we would differentiate.
It's more about the quality
of the content overall. So that's something where,
if you outsource the content and then you get
good content back, then you publish
that good content. So from that point
of view, I wouldn't say that outsourced content
versus in-house content is kind of different
by definition when it comes to
the overall quality. ROBB YOUNG: How
would you even know? JOHN MUELLER: I don't know. Yeah. AUDIENCE: I think there are ways
that Google would pick that up, especially if you label that. I think that's pretty
easy to pick up. ROBB YOUNG: I contest
that, if you're talking about Google
understanding an employment contract, at that point. If I've got a staff member
sitting in my office full time, or one sitting at home
being paid by the hour, I don't see how you would ever
know who had written anything. AUDIENCE: Fair enough. ROBB YOUNG: I could be wrong. I have been before. JOHN MUELLER: I
don't think, yeah– I mean, it would be different
if you're aggregating content from other sites. That's something which
we could pick up on.
But if it's really just
someone in-house or someone from an agency writing
the content for you, it's the content that
you're publishing. AUDIENCE: OK, all right. Thank you, appreciate it. JOHN MUELLER: Cool. OK, let me take a break here
and pause the recording. It's been good
having you all here. Thank you all for joining. Thanks for submitting
so many questions. I have a bit more
time afterwards, so we can go through
the millions of people with raised hands as well.
And we can see what
we end up with. Thanks for joining. If you're watching
this on YouTube, you're welcome to join one
of the next "Office Hour Hangouts." We do these about weekly. Usually I announce them a
couple of days ahead of time in the Community section. You can drop your
questions there or you can watch
out for the link and join us live if you'd like. All right. And with that, let me
find the button to pause. [CLICK].